Well, January went by quickly, didn’t it?! That means it must be time for our second newsletter of 2025.
This month's big news is the launch of JSONBench, a benchmark suite for JSON analytics. Ryadh Dahimene tells us about agent-facing analytics, Shahar Gvirtz explains why he likes ClickHouse, Tom Schreiber dives into the join improvements in 25.1, and more.
Featured community member: Chris Lawrence
This month's featured community member is Chris Lawrence, Dev Lead and Senior Software Engineer at AMP.
Chris previously co-founded ReSync Digital, successfully launching over 30 products for early-stage startups, and has experience in machine vision and IoT solutions through his work with Skip-Line, LLC.
Chris Lawrence spoke at the ClickHouse meetup in Melbourne in August 2024. He shared how AMP’s implementation of ClickHouse Cloud has helped them transform their data pipeline from batch processing to real-time streaming, improving their analytics platform's speed and reliability. Chris also elaborated on his talk in a recent blog post.
Upcoming events
Global events
- v25.2 Community Call - Feb 27
Free training
- ClickHouse Fundamentals - Feb 26 and Mar 19
- Formation ClickHouse en présentiel, Paris - Mar 4
- In-Person ClickHouse Developer Fast Track - Seattle - Mar 5
- ClickHouse Query Optimization Workshop - Mar 12
- ClickHouse Admin Workshop - Mar 12
- In-Person ClickHouse Developer - Sydney - Mar 24-25
- In-Person ClickHouse Developer - Melbourne - Mar 27-28
- In-Person ClickHouse Developer Fast Track - Bangalore - Apr 1
Events in AMER
- ClickHouse Meetup with LA DevOps - Feb 20
- ClickHouse Meetup in Seattle - Mar 5
- Scale 22x, Pasadena - Mar 6 - Mar 9
- Game Developers Conference, San Francisco - Mar 17
- ClickHouse Meetup @ Cloudflare, San Francisco - Mar 19
- ClickHouse Meetup @ Klaviyo, Boston - Mar 25
- ClickHouse Meetup @ Braze, New York - Mar 26
- Google Next, Las Vegas - Apr 9
- Open House User Conference, San Francisco - May 28
Events in EMEA
- ClickHouse Meetup @ Nexton, Paris - Mar 4
- KubeCon 2025, London - Apr 1-4
- AWS Summit 2025, Paris - Apr 9
- AWS Summit 2025, Amsterdam - Apr 16
- AWS Summit 2025, London - Apr 30
Events in APAC
- ClickHouse Singapore Meetup - Feb 25
- ClickHouse Shanghai Meetup, China - Mar 1
- Data & AI Summit NSW, Australia - Mar 18
- Current Bengaluru, India - Mar 19
- ClickHouse Delhi Meetup, India - Mar 22
- Latency Conference, Australia - Apr 3-4
- TEAMZ Web3/AI Summit, Japan - Apr 16-17
Introducing JSONBench: The billion docs JSON Challenge vs MongoDB, Elasticsearch, and more
The November newsletter mentioned the new JSON data type and explained its performance benefits. To test these claims, we developed JSONBench, a benchmark suite for JSON analytics.
Tom Schreiber has published a comprehensive blog post comparing how different databases handle JSON data. The analysis covers performance benchmarks and storage approaches across multiple systems, including ClickHouse, MongoDB, and Elasticsearch.
His findings detail how each database performs with analytical queries on JSON data and explore their underlying JSON storage mechanisms.
Shahar Gvirtz: 7 Reasons why I like ClickHouse
It’s always fun to come across a blog post by a community member enjoying their time with ClickHouse!
I won’t go through all of Shahar’s reasons for liking ClickHouse, but I did want to highlight one of the things that he likes, which is an underrated feature of ClickHouse - its ability to compress data. In Shahar’s words:
"Logs stored in ClickHouse take up only 28% of the space they occupy in Elasticsearch."
If you ever need to tell a friend or colleague why you like ClickHouse, you could do worse than point them to this blog post!
Agent-Facing Analytics
Ryadh Dahimene has written a (IMHO) brilliant blog post explaining a new user persona for real-time analytics databases - AI agents!
Ryadh first takes us on a brief tour of AI developments since the launch of ChatGPT in 2022, including the "sense-think-act" loop, the introduction of support for tools by LLMs, and the recent evolution of reasoning models like OpenAI o1 and DeepSeek-R1.
He then explores the role of real-time analytics databases in agentic workflows and introduces the ClickHouse MCP Server. This is our implementation of the server side of Anthropic’s Model Context Protocol, which means you can easily converse with a ClickHouse database from Claude Desktop.
ClickHouse and Cribl: A Powerful Data Ingestion and Analysis Duo
Cribl Stream is a data processing platform that works with various data sources, including telemetry data, like logs, metrics, and trace data. It can preprocess, filter, and transform events before forwarding them to destinations, helping optimize storage utilization and query efficiency. Support for ClickHouse was recently added to its list of supported outputs.
David Maislin has written a detailed guide showing how to set up and use this integration. The guide includes step-by-step instructions for creating ClickHouse tables, configuring Cribl Stream destinations, and using Cribl Search to query the data. It also demonstrates how to use ClickHouse alongside Cribl's data processing features, complete with examples using Cribl's Datagen feature to generate test data.
ClickHouse Cloud evolution: compute-compute separation, improved autoscaling, and more!
ClickHouse Cloud was built in record time and brought to market in December 2022. Since then, over a thousand companies have onboarded their workloads into our managed service, and every day, they now collectively run 5.5 billion queries, scanning 3.5 quadrillion records on top of 100PB of data!
Over the past two years, we've gained valuable insights from working closely with our users and have significantly evolved our cloud architecture. This blog describes the latest improvements, including compute-compute separation, high-performance machine types (moving to Graviton in AWS), single-replica services, and more reactive and seamless automatic scaling.
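To put the daily totals quoted above into perspective, here is a rough back-of-the-envelope sketch that converts them into per-second averages. It assumes load is spread evenly across the day, which real traffic almost certainly is not, and the helper name `daily_to_per_second` is purely illustrative.

```python
# Daily totals quoted above, aggregated across all ClickHouse Cloud services.
QUERIES_PER_DAY = 5.5e9            # 5.5 billion queries
RECORDS_SCANNED_PER_DAY = 3.5e15   # 3.5 quadrillion records
SECONDS_PER_DAY = 24 * 60 * 60


def daily_to_per_second(total_per_day: float) -> float:
    """Average rate per second, assuming perfectly even load (a simplification)."""
    return total_per_day / SECONDS_PER_DAY


queries_per_second = daily_to_per_second(QUERIES_PER_DAY)
records_per_second = daily_to_per_second(RECORDS_SCANNED_PER_DAY)
records_per_query = RECORDS_SCANNED_PER_DAY / QUERIES_PER_DAY

print(f"{queries_per_second:,.0f} queries per second on average")
print(f"{records_per_second:,.0f} records scanned per second on average")
print(f"{records_per_query:,.0f} records scanned per query on average")
```

Under that even-load assumption, this works out to roughly 64 thousand queries per second, with each query scanning around 636 thousand records on average.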
25.1 release
In the 25.1 release blog post, Tom Schreiber did a deep dive into the improvements made to the parallel hash join algorithm probe phase. If you’re interested in database internals, that’s worth a read.
This release also introduced MinMax indices at the table level, improved the Merge table engine and table function, added auto-increment functionality, and made some nice CLI usability improvements.
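If the term "probe phase" mentioned above is unfamiliar, the following is a minimal, single-threaded sketch of how a generic hash join splits into a build phase and a probe phase. It is not ClickHouse's parallel implementation, and the function names (`build_hash_table`, `probe`, `hash_join`) and sample rows are made up for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def build_hash_table(rows: Iterable[Row], key: str) -> Dict[Any, List[Row]]:
    """Build phase: index one side of the join (usually the smaller) by join key."""
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def probe(table: Dict[Any, List[Row]], rows: Iterable[Row], key: str) -> Iterator[Row]:
    """Probe phase: look up each row of the other side and emit the matches."""
    for row in rows:
        for match in table.get(row[key], []):
            # Merge the two rows; rows without a match are dropped (inner join).
            yield {**match, **row}


def hash_join(left: Iterable[Row], right: Iterable[Row], key: str) -> List[Row]:
    """Inner join that builds on `right` and probes with `left`."""
    return list(probe(build_hash_table(right, key), left, key))


users = [{"user_id": 1, "name": "Ada"}, {"user_id": 2, "name": "Grace"}]
events = [
    {"user_id": 1, "event": "click"},
    {"user_id": 1, "event": "view"},
    {"user_id": 3, "event": "click"},
]

joined = hash_join(events, users, "user_id")
for row in joined:
    print(row)
```

In this toy version, the probe loop does one hash lookup per row of the larger side, so on large inputs it is typically where most of the time goes. That is why speeding it up is worth the effort.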
Interesting projects
While compiling the newsletter each month, I come across many ClickHouse-based projects, so I thought I’d share some of them this month.
- apitally.io - An API monitoring and analytics tool for Python / Node.js apps. It helps users understand API usage and performance, spot issues early, and troubleshoot effectively when something goes wrong. The founder mentioned on a Hacker News thread that it uses ClickHouse to store data.
- Openpanel - An open-source alternative to Mixpanel for capturing user behavior across web, mobile apps, and backend services. It uses ClickHouse to store events.
- Vigilant - A lightweight tool for managing structured logs. It lets you centralize your logs, search them, and create alerts. It uses ClickHouse under the hood.
- CH-UI - A user interface for interacting with the ClickHouse Server. It has syntax highlighting for queries and lets you see visual metrics about your instance.
Video Corner
- As Benjamin Wootton demonstrates in this hands-on video, splitting workloads between PostgreSQL for transactions and ClickHouse for analytics is becoming increasingly popular. He walks us through two ways to keep these databases in sync—using the open-source PeerDB tool or ClickHouse Cloud's built-in solution.
- I created a video showing how to use the recently released ClickHouse MCP server.
- I also created a video showing how to use the built-in monitoring dashboard to debug some common problems.
- Leon Kozlowski from Flock Safety explains how they transformed their traffic analytics system from a slow, daily-batch Redshift setup to a real-time solution using ClickHouse. The system handles over a billion ML predictions per day from its network of surveillance cameras.
- Derek Chia and Karthikayan Muthuramalingam present a technical overview of integrating ClickHouse with Kafka, showing how these technologies can work together effectively for real-time data processing and analytics. Derek explains ClickHouse's capabilities as an open-source columnar database optimized for analytics, while Karthikayan details Kafka's role as a distributed event streaming platform.
Post of the month
My favorite post this month was by Jacob Wolf, who’s ingesting lots of data into ClickHouse.