Welcome to the September ClickHouse newsletter, which will round up what’s happened in real-time data warehouses over the last month. This month, we have the much-awaited JSON data type, our first ClickHouse research paper, a Private Preview of BYOC on AWS, better PyPI stats with Ibis, and more!

Featured community member #

beehiiv is a newsletter platform that helps creators, publishers, and businesses build and grow their email audiences. They collect events capturing every time an email is processed, every time it lands in an inbox, every time it’s deferred, every time it’s bounced, every time you open it, every time you click a link, and so on.

Eric has worked at beehiiv for just over a year and was responsible for moving data operations from Postgres to ClickHouse Cloud. There’s a user story on the work he and his team did, and he also presented at the New York meetup in the summer.

Eric previously worked as a Tech Lead at Arthur.ai, where he architected and built the company's data ingestion pipeline, storage, and much of the backend infrastructure.

VLDB 2024: First ClickHouse research paper #

It’s been almost a year in the making, and at the end of August, we presented our first research paper at VLDB 2024.

VLDB (the International Conference on Very Large Data Bases) is widely regarded as one of the leading conferences in data management. It typically accepts only ~20% of the hundreds of papers submitted.

The paper concisely describes the most interesting architectural and system design components that make ClickHouse so fast. We’ve embedded the PDF of the paper in the blog post linked below.

How Reco leverages advanced analytics to detect sophisticated SaaS threats #

Reco is a full-lifecycle SaaS security solution that uses ClickHouse as the foundation of its advanced analytics system. Nir Barak explains how ClickHouse gives them a holistic view of data across multiple layers and allows them to detect outliers and anomalies.

24.8 LTS release #

The 24.8 release is here, and it has an exciting feature that I (and many of you) have been waiting for: the new JSON data type!

It’s in experimental mode, but that didn’t stop us from putting it through its paces while exploring structured data of events in football/soccer matches.

This release also introduces the TimeSeries table engine, which can store Prometheus data, and a new Kafka table engine that supports exactly-once event processing.

Better PyPI stats with Ibis, ClickHouse, and Shiny #

ClickPy is a ClickHouse-backed application that analyzes downloads of Python packages published on PyPI. In addition to the front-end application, you can also query the underlying data, which is exactly what Cody Peterson has done.

Cody shows how to connect to ClickPy using Ibis and then explores the seasonality of downloads of the clickhouse-connect package by day of the week and month. The results are visualized using plot.ly, and Cody then puts everything together into a Shiny application.

ClickHouse Cloud: BYOC AWS in Private Preview #

ClickHouse Cloud has been running for almost two years and supports all the major cloud platforms: AWS, Azure, and GCP. So far, it’s been a SaaS offering that runs entirely on ClickHouse’s cloud account, which made it a non-starter for users with strict data residency and compliance requirements.

We’re therefore happy to announce the Private Preview release of Bring Your Own Cloud (BYOC) on AWS. BYOC is a fully managed ClickHouse Cloud service deployed to your AWS account.

The waiting list is now open, so be sure to sign up, and we’ll contact you to set you up.

Quick reads #

Heng Ma shows how to build a system that enriches shopping cart events with product details. Using RisingWave, a Kafka event data stream is joined with a product catalog, and the enriched events are written to ClickHouse using the RisingWave-ClickHouse connector.

Auxten released a new version of chDB, the in-process embedded version of ClickHouse, that can query Pandas DataFrames 87 times faster than the initial version.

I loved this video of Jess Archer’s talk at Laracon US 2024. It is an excellent introduction to ClickHouse and shows where it’s better than MySQL.

Sai Srirampur shares his tips for ClickHouse data modeling aimed at Postgres users. He explains various strategies to handle duplicates when using the ReplacingMergeTree table engine, how to handle null values, and the importance of ordering keys.
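
If you haven’t met ReplacingMergeTree before, here is a rough Python sketch of the deduplication rule it applies: among rows that share the same ordering key, the row with the highest version survives, and a later insert wins a tie. This is only a conceptual illustration and not how ClickHouse implements it. The `Row` type, the `collapse_duplicates` function, and the sample rows are all made up for this example and are not taken from Sai's post.

```python
from typing import Iterable, NamedTuple


class Row(NamedTuple):
    key: str        # hypothetical stand-in for the table's ordering key
    version: int    # hypothetical stand-in for the optional version column
    payload: str


def collapse_duplicates(rows: Iterable[Row]) -> list[Row]:
    """Keep one row per key: highest version wins, later insert wins ties."""
    latest: dict[str, Row] = {}
    for row in rows:  # rows are assumed to arrive in insertion order
        current = latest.get(row.key)
        if current is None or row.version >= current.version:
            latest[row.key] = row
    # Return the survivors sorted by key, as a table ordered by that key would be.
    return sorted(latest.values(), key=lambda r: r.key)


# Made-up sample rows: "user-1" is written three times.
rows = [
    Row("user-1", 1, "created"),
    Row("user-2", 1, "created"),
    Row("user-1", 2, "updated"),
    Row("user-1", 2, "updated again"),
]
deduplicated = collapse_duplicates(rows)
```

In ClickHouse itself, this collapsing happens during background merges, so a query can still see duplicate rows until the relevant parts have been merged. That is why strategies for handling duplicates at query time matter.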



Post of the month #

