Hello, and welcome to the January 2026 ClickHouse newsletter!
This month, we learn how chDB achieved true zero-copy integration with Pandas DataFrames, how WKRP migrated their RuneScape tracking plugin from TimescaleDB to ClickHouse, how ClickHouse's Kafka engine replaced Apache Flink in a real-time pipeline, and more!
Featured community member: lgbo #
This month's featured community member is lgbo.
lgbo works at BIGO, where they use ClickHouse in their real-time data pipeline that processes tens of billions of messages daily.
lgbo has submitted several pull requests to address performance issues, including reducing memory usage for window functions, reducing cache misses during hash table iteration, and optimizing CROSS JOINs.
lgbo also improved short-circuit execution performance by avoiding unnecessary operations on non-function columns, added a new stringCompare function for lexicographic comparison of substring portions, and fixed a bug where named tuple element names weren't preserved correctly during type derivation.
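To give a flavor of those contributions, here's a minimal sketch of the new stringCompare function. The signature and return convention below are assumptions based on the description above, not taken from the merged pull request:

```sql
-- Signature assumed: stringCompare(s1, s2, offset1, offset2, num_bytes),
-- comparing num_bytes of each string from the given offsets and
-- returning -1, 0, or 1 like memcmp.
SELECT stringCompare('clickhouse', 'clickstack', 0, 0, 5) AS cmp; -- 0: 'click' = 'click'
```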
25.12 release #
ClickHouse 25.12 delivers significant improvements in query performance across the board. We have faster top-N queries through data skipping indexes, a reimagined lazy reading execution model that's 75 times faster, and a more powerful DPsize join reordering algorithm.
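To illustrate the first of those, here's the kind of query the new optimization targets; the table, column names, and minmax index below are hypothetical, and the release post covers exactly when the optimization applies:

```sql
-- A minmax data skipping index on `views` lets the engine discard
-- granules whose maximum value can't make it into the current top 10.
CREATE TABLE pageviews
(
    url String,
    views UInt64,
    INDEX views_idx views TYPE minmax GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY url;

SELECT url, views
FROM pageviews
ORDER BY views DESC
LIMIT 10;
```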
The Journey to Zero-Copy: How chDB Became the Fastest SQL Engine on Pandas DataFrame #
chDB v4.0 achieves true zero-copy integration with Pandas DataFrames. By eliminating serialization steps and implementing direct memory sharing between ClickHouse and NumPy, queries that previously took 30 seconds now complete in under a second.
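In practice this is surfaced through chDB's Python API, where a DataFrame in the calling scope can be queried in place. A minimal sketch, assuming chDB's Python() table function and an in-scope DataFrame named df (the schema here is invented):

```sql
-- Run via chdb.query(...) from Python; `df` is a Pandas DataFrame in the
-- calling scope, read directly from its NumPy buffers rather than copied.
SELECT category, avg(price) AS avg_price
FROM Python(df)
GROUP BY category
ORDER BY avg_price DESC;
```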
A small-time review of ClickHouse #
WKRP migrated their RuneScape tracking plugin from TimescaleDB to ClickHouse with impressive results. Storage usage improved dramatically: location data compressed from 4.28 GiB to 592 MiB (87% reduction), while XP tracking data went from 872 MiB to 168 MiB (81% reduction). Beyond storage, the migration simplified operations - upgrades now happen through the package manager without coordinated downtime.
The verdict: "Timescale worked well, but ClickHouse has provided better performance and made running the service easier."
Solving the "Impossible" in ClickHouse: Advent of Code 2025 #
Yes, we're still talking about Christmas in mid-January, but Zach Naimon's deep dive into solving Advent of Code 2025 entirely in ClickHouse SQL is worth the delayed celebration.
Following strict rules (pure SQL only, raw inputs, single queries), Zach tackled all 12 algorithmic puzzles that typically require Python, Rust, or C++. The solutions showcase ClickHouse's versatility through recursive CTEs for pathfinding, arrayFold for state machines, and specialized functions like intervalLengthSum for geometric problems.
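To give a flavor of those building blocks, here are two self-contained one-liners; they're illustrative, not lifted from Zach's solutions:

```sql
-- arrayFold threads an accumulator through an array, like a tiny state machine.
SELECT arrayFold((acc, x) -> acc + x, [1, 2, 3, 4], toUInt64(0)) AS total; -- 10

-- intervalLengthSum measures the union of (possibly overlapping) intervals.
SELECT intervalLengthSum(lo, hi) AS covered
FROM values('lo UInt64, hi UInt64', (1, 5), (3, 8)); -- 7
```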
Proof that with the right tools, "impossible" problems become just another data challenge.
Seven Companies, One Pattern: Why Every Scaled ClickHouse Deployment Looks the Same #
Luke Reilly explains why Uber, Cloudflare, Instacart, GitLab, Lyft, Microsoft, and Contentsquare all build the same four-layer abstraction stack over ClickHouse: it changes cost curves.
Platform teams absorb schema optimization knowledge once through semantic layers and query translation engines, enabling sublinear scaling - headcount grows with data volume instead of user count.
Now AI is becoming the fifth layer, with tools like ClickHouse.ai and MCP servers adding natural language interfaces on top of these semantic definitions.
Your AI SRE needs better observability, not bigger models #
Drawing on his experience increasing Confluent's availability from 99.9% to 99.95%, Manveer Chawla explains why AI SRE copilots should prioritize investigation over auto-remediation. Most AI SRE tools fail because they're built on observability platforms with short retention, dropped high-cardinality dimensions, and slow queries.
The solution? Rethink the observability architecture. Manveer details a ClickHouse-based reference architecture built on the premise that effective AI copilots need better data foundations, not larger models.
Simplifying real-time data pipelines: How ClickHouse replaced Flink for our Kafka Streams #
Ashkan Goleh Pour provides a detailed walkthrough of replacing Apache Flink with ClickHouse's native Kafka integration for real-time streaming.
The architecture uses ClickHouse's Kafka table engine to consume events directly, Materialized Views for continuous SQL transformations, and MergeTree tables for persistent state, thereby eliminating the need for external stream processors.
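The pattern itself is compact. Here's a minimal sketch with a placeholder broker, topic, and schema (a running Kafka broker is assumed):

```sql
-- The Kafka engine table is a consumer: reading from it pulls messages.
CREATE TABLE events_queue
(
    ts DateTime,
    user_id UInt64,
    event String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'broker:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse-consumer',
         kafka_format = 'JSONEachRow';

-- Persistent state lives in an ordinary MergeTree table.
CREATE TABLE events
(
    ts DateTime,
    user_id UInt64,
    event String
)
ENGINE = MergeTree
ORDER BY (user_id, ts);

-- The materialized view runs continuously, moving each consumed
-- batch from the queue into storage (transformations go here).
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT ts, user_id, event
FROM events_queue;
```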
Quick reads #
- Georgii Baturin has written a multi-part series of posts showing how to use dbt with ClickHouse.
- Shuva Jyoti Kar demonstrates how to build an autonomous AI agent system by connecting ClickHouse with Google's Gemini CLI using the Model Context Protocol (MCP).
- Gulled Hayder shows how to set up ClickHouse on a Linux system with Python, install and configure the database with proper authentication, generate 1 million rows of synthetic web traffic data, and load it into a MergeTree table for analytics (a pure-SQL take on the data generation step is sketched after this list).
- ByteBoss builds a real-time cryptocurrency market data pipeline that connects to exchange WebSockets (Binance/Coinbase), normalizes their different data formats, streams through Kafka/Redpanda, automatically processes with ClickHouse materialized views, and visualizes live trading data in Grafana dashboards.
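On that data generation theme, ClickHouse can also fabricate the synthetic rows itself with the generateRandom table function; this is a SQL-only alternative to the Python approach in Gulled's post, with an invented stand-in schema:

```sql
-- A stand-in web traffic table.
CREATE TABLE web_traffic
(
    ts DateTime,
    url String,
    status UInt16,
    bytes UInt32
)
ENGINE = MergeTree
ORDER BY ts;

-- generateRandom produces rows matching the given structure.
INSERT INTO web_traffic
SELECT *
FROM generateRandom('ts DateTime, url String, status UInt16, bytes UInt32')
LIMIT 1000000;
```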
Interesting projects #
- DoomHouse - An experimental "Doom-like" game engine that renders 3D graphics entirely in ClickHouse SQL.
- genezhang/clickgraph - Stateless, read-only graph query engine for ClickHouse using Cypher.
- clickspectre - A spectral ClickHouse analyzer that tracks which tables are actually used and by whom.
Upcoming events #
Virtual training #
- ClickHouse Admin Workshop - 12th February
- ClickHouse Query Optimization Workshop - 19th February
Real-time Analytics #
- Real-time Analytics with ClickHouse: Level 2 - 21st January
- Real-time Analytics with ClickHouse: Level 3 - 28th January
Observability #
- Observability with ClickStack: Level 1 (APJ time) - 27th January
- Observability with ClickStack: Level 2 (APJ time) - 29th January
- Observability with ClickStack: Level 1 - 4th February
- Observability with ClickStack: Level 2 - 5th February
Events in AMER #
- Iceberg Meetup in Menlo Park - 21st January
- Iceberg Meetup in NYC - 23rd January
- New York Meetup - 26th January
- The True Cost of Speed: What Query Performance Really Costs at Scale - 3rd February
- AI Night SF - 11th February
- Toronto Meetup - 19th February
- Seattle Meetup - 26th February
- LA Meetup - 6th March
Events in EMEA #
- Data & AI Paris Meetup - 22nd January
- The Agentic Data Stack: The Future is Conversational (Amsterdam) - 27th January
- ClickHouse Meetup in Paris - 28th January
- Apache Iceberg™ Meetup Belgium: FOSDEM Edition - 30th January
- FOSDEM Community Dinner Brussels - 31st January
- ClickHouse Meetup in Barcelona - 5th February
- ClickHouse Meetup in London - 10th February
- ClickHouse Meetup in Tbilisi Georgia - 24th February
Events in APAC #
- ClickHouse Singapore Meetup - 27th January
- Boot Camp: Real-time Analytics with ClickHouse - 27th January
- ClickHouse Seoul Meetup - 29th January
Speaking at ClickHouse meetups #
Want to speak at a ClickHouse meetup? Apply here!
Below are some upcoming calls for papers (CFPs):
- Iceberg Summit SF - April 6-8
- GrafanaCon Barcelona - April 20-22
- AI Council SF - May 12-14
- Observability Summit Minneapolis - May 21-22
- Kubecon India Mumbai - June 18-19
- We Are Developers Berlin - July 8-10