
May 2025 Newsletter

Mark Needham
May 12, 2025 - 9 min read

Hello, and welcome to the May 2025 ClickHouse newsletter!

This month, we have a deep dive into how ClickHouse has become "lazier", why the Microsoft Clarity analytics platform chose ClickHouse, an MCP/Real-Time Analytics panel, viewer retention metrics with ClickHouse, and more!

This month's featured community member is Can Tian, Senior Data Platform Engineer at DeepL.

Can Tian has a background in building scalable, cloud-native data systems using Python, C++, and modern infrastructure tools. With experience across DeepL, FactoryPal, and Springer Nature, he has worked on everything from data engineering to analytics and platform design.

Can has made impactful contributions to dbt-clickhouse, including adding support for incremental “microbatch” strategies, implementing schema change handling for distributed incremental models, and fixing critical issues related to ON CLUSTER statements in replicated databases.

➡️ Follow Can on LinkedIn

Upcoming events

It’s only two weeks until Open House, the ClickHouse User Conference in San Francisco on May 29, and the impressive lineup of speakers continues to grow.

Lyft engineers Jeana Choi and Ritesh Varyani will explain how they use ClickHouse for near-real-time and sub-second analytics, enabling swift decision-making.

Register for Open House

Global events

Free training

Events in AMER

Events in EMEA

Events in APAC

25.4 release

It’s difficult to pick my favorite feature in the 25.4 release, but if I must, I’d go for lazy materialization. This optimization defers reading column data until needed, resulting in much faster queries. More on that in the next section!

MergeTree tables on read-only disks can now refresh their state and load new data parts, which effectively lets us create a ClickHouse-native data lake. Also included in this release is CPU slot scheduling, which lets you cap the number of threads running concurrently for a given workload.

Finally, there’s a nice quality-of-life update in clickhouse-local: tables in the default database persist!

➡️ Read the release post

ClickHouse gets lazier (and faster): Introducing lazy materialization

The lazy materialization functionality has been given the Tom Schreiber treatment, i.e., a super in-depth article breaking down how it works and the use cases it will help with.

Tom starts with ClickHouse’s existing building blocks of I/O efficiency and runs a real-world query through them, layer by layer, until lazy materialization kicks in and dramatically optimizes performance.

➡️ Read the blog post

Why Microsoft Clarity chose ClickHouse

Microsoft Clarity is a free analytics tool that helps website and app owners understand user behavior through heatmaps, session recordings, and automated insights.

When Microsoft decided to offer Clarity as a free public service, it needed to revamp its infrastructure. The original proof-of-concept using Elasticsearch and Spark couldn't handle the anticipated scale of millions of projects and hundreds of trillions of events. The system was slow, had low ingestion throughput, and would be prohibitively expensive at scale.

They turned to ClickHouse as a solution, and in the blog, they describe why they made that choice, what problems it has helped solve, and the challenges they encountered along the way.

➡️ Read the blog post

Introducing AgentHouse

Dmitry Pavlov announced AgentHouse, a chat-based demo environment where you can interact with ClickHouse datasets using the Claude Sonnet Large Language Model.

It uses LibreChat under the covers, which means that you can get not only text answers to your questions, but also interactive charts.

➡️ Read the blog post

How we handle billion-row ClickHouse inserts with UUID range bucketing

CloudQuery faced a challenge with ClickHouse when ingesting large batches of data, sometimes exceeding 25 million records per operation. These massive inserts caused out-of-memory errors because ClickHouse materializes the entire dataset in memory before spilling to disk.

To solve this problem, they developed an “Insert-Splitter” algorithm that breaks up large inserts into smaller, manageable chunks based on UUID ranges. This approach required careful implementation due to ClickHouse's UUID sorting behavior.
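The core idea of range bucketing is easy to sketch. The following is a minimal illustration (not CloudQuery's actual implementation), assuming uniformly distributed random UUIDs so that equal-width ranges over the 128-bit space yield roughly balanced buckets; the real splitter also has to account for ClickHouse's non-obvious UUID ordering, which this sketch ignores.

```python
import uuid

def uuid_bucket_bounds(num_buckets: int):
    """Split the 128-bit UUID space into equal-width (lo, hi) ranges.

    Assumes random (v4-style) UUIDs, which are uniform over the space,
    so equal-width ranges give roughly equal row counts per bucket.
    """
    span = 2**128 // num_buckets
    bounds = []
    for i in range(num_buckets):
        lo = uuid.UUID(int=i * span)
        hi_int = (i + 1) * span - 1 if i < num_buckets - 1 else 2**128 - 1
        bounds.append((lo, uuid.UUID(int=hi_int)))
    return bounds

def bucket_of(u: uuid.UUID, num_buckets: int) -> int:
    """Map a UUID to its bucket index by its position in the 128-bit space."""
    return min(u.int * num_buckets // 2**128, num_buckets - 1)
```

Each bucket's `(lo, hi)` pair can then drive a separate `INSERT ... SELECT ... WHERE id BETWEEN lo AND hi`, keeping the memory footprint of each individual insert bounded.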

It worked well, though! Splitting a single 26-million-row insert into four balanced buckets reduced peak memory usage by 75% without sacrificing processing speed.

➡️ Read the blog post

MySQL CDC connector for ClickPipes is now in Private Preview

We recently announced the private preview of the MySQL Change Data Capture (CDC) connector in ClickPipes.

This lets customers replicate their MySQL databases to ClickHouse Cloud in just a few clicks and leverage ClickHouse for blazing-fast analytics. It works for continuous replication and one-time migration from MySQL, no matter where it's running.

➡️ Read the blog post

Bootstrapping with ClickHouse

William Attache from AB Tasty wanted to speed up some statistical algorithms that use bootstrapping data by implementing them directly in ClickHouse SQL.

The blog walks us through his trial-and-error process with ClickHouse's native functions, explaining why initial random number strategies failed and how he eventually solved the problem using SQL-based workarounds and Python user-defined functions.
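For readers unfamiliar with the underlying statistical technique: bootstrapping estimates the uncertainty of a statistic by repeatedly resampling the data with replacement. A minimal Python sketch (not AB Tasty's implementation) of a bootstrap confidence interval for the mean:

```python
import random

def bootstrap_mean_ci(data, n_resamples=1000, alpha=0.05, seed=42):
    """Estimate a (1 - alpha) confidence interval for the mean by
    drawing many resamples of the data with replacement and taking
    percentiles of the resampled means."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(
        sum(rng.choices(data, k=n)) / n
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

The challenge the blog explores is expressing exactly this resample-with-replacement loop in ClickHouse SQL, where per-row random number generation behaves differently than one might expect.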

➡️ Read the blog post

Vimeo: behind viewer retention analytics at scale

This article fascinated me as a video creator. While view counts provide basic feedback, understanding viewer retention (the percentage of viewers still watching at each moment) offers deeper insights into content performance.

Vimeo's blog post reveals how they've built a sophisticated retention analytics system using ClickHouse. Rather than storing absolute view counts, they track viewing patterns by recording changes (+1 when a viewer starts watching a segment, -1 when they stop) and use window functions to calculate cumulative views at each second.
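The delta-plus-running-sum idea is simple to illustrate. Here is a hedged Python sketch (not Vimeo's actual ClickHouse schema or queries), where each view contributes +1 at its first watched second and -1 after its last, and a prefix sum recovers concurrent viewers per second, just as a SQL window function would:

```python
def retention_curve(views, duration):
    """views: list of (start_sec, end_sec) watched intervals, end exclusive.
    Returns the number of concurrent viewers at each second, computed as a
    running sum of +1/-1 deltas."""
    deltas = [0] * (duration + 1)
    for start, end in views:
        deltas[start] += 1   # viewer starts watching this segment
        deltas[end] -= 1     # viewer stops watching
    curve, running = [], 0
    for sec in range(duration):
        running += deltas[sec]
        curve.append(running)
    return curve
```

Storing deltas rather than absolute counts is what makes the system cheap to update: a new view appends two small rows, and the cumulative picture is reconstructed at query time.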

They have also built an AI-powered insights layer, pre-processing retention data through window averaging and run-length encoding to prevent overwhelming the AI's context window. Combined with carefully crafted prompt engineering, they can generate concise, actionable insights about viewer engagement patterns.
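Run-length encoding for this kind of pre-processing can be sketched in a few lines (a hypothetical illustration, not Vimeo's code): consecutive equal retention values collapse into (value, run length) pairs, so a long flat stretch of the curve costs one token-friendly pair instead of hundreds of numbers.

```python
def run_length_encode(values):
    """Collapse consecutive duplicate values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]
```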

➡️ Read the blog post

Video Corner

© 2025 ClickHouse, Inc. Headquartered in the Bay Area, California, and Amsterdam, the Netherlands.