
January 2025 newsletter

ClickHouse Team
Jan 19, 2025 - 8 minutes read

Welcome to the first ClickHouse newsletter of 2025. This month, we cover Apache Iceberg REST catalog support and schema evolution in the 24.12 release. We learn how to build a product analytics solution and implement the Medallion architecture with ClickHouse. And we have videos from the All Things Open conference!

 

Inside this issue

 

This month's featured community member is Jason Anderson, Head of Data at Skool, a community platform.

featured-member-202501.png

Jason Anderson is an experienced data and technology professional with a background in leading teams and developing data-driven solutions. He previously served as Head of Data at Mythical Games and Partner at Comp Three, focusing on machine learning, analytics, and cloud architecture. His career includes roles at IBM and PolySat, where he contributed to cloud services and satellite software development.

Jason recently presented his work at Skool at the ClickHouse Los Angeles meetup. Jason explained how they moved from Postgres to ClickHouse to process 100M+ rows daily while delivering lightning-fast queries. There is also a blog post that explains Skool’s use of ClickHouse in more detail.

Follow Jason on LinkedIn


Upcoming events

Global events

Free training

Events in EMEA

Events in APAC

 

24.12 release

release-24.12.png

The final release in 2024 introduced support for the Iceberg REST catalog and schema evolution. Daniel Weeks, co-creator of Apache Iceberg, made a guest appearance in the 24.12 community call, so be sure to check out the recording.

There were also Enum usability improvements, an experimental feature to sort a table by a column in reverse order, JSON subcolumns as a table’s primary key, automatic JOIN reordering, optimization of JOIN expressions, and more!

Read the release post

 

Building a product analytics solution with ClickHouse

building-product-analytics-solution.png

Product analytics involves collecting, analyzing, and interpreting data on how users interact with a product.

Chloé Carasso leads product analytics at ClickHouse and wrote a blog post explaining how we built our in-house product analytics platform.

Chloé explains why we decided to build something ourselves rather than buy an off-the-shelf solution and shares some ideas on designing and operating a ClickHouse-powered analytics solution if you're interested in this path. She also shares common queries that she runs, including cohort analysis, user paths, and measuring retention/churn.
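
As a rough illustration of the retention measurement mentioned above, the sketch below computes weekly cohort retention over a handful of in-memory events. The event tuples, the `retention` function, and the week numbering are hypothetical and not taken from the blog post, which runs its queries in ClickHouse.

```python
from collections import defaultdict

def retention(events):
    """events: iterable of (user_id, week_number) activity records.

    Returns {cohort_week: {weeks_since_first_activity: fraction_of_cohort_active}}.
    """
    # Weeks in which each user was active.
    active_weeks = defaultdict(set)
    for user, week in events:
        active_weeks[user].add(week)

    # A user's cohort is the first week in which they were seen.
    cohorts = defaultdict(set)
    for user, weeks in active_weeks.items():
        cohorts[min(weeks)].add(user)

    result = {}
    for cohort_week, users in cohorts.items():
        counts = defaultdict(int)
        for user in users:
            for week in active_weeks[user]:
                counts[week - cohort_week] += 1
        result[cohort_week] = {
            offset: n / len(users) for offset, n in sorted(counts.items())
        }
    return result

# Hypothetical activity log: (user, week).
events = [("a", 0), ("a", 1), ("b", 0), ("c", 1), ("c", 2), ("a", 2)]
table = retention(events)
```

Each inner dictionary reads as a retention curve for one cohort, and churn at a given offset is one minus the retained fraction.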

Read the blog post

 

Optimizing bulk inserts for partitioned tables

optimizing-bulk-inserts.png

Jesse Grodman, a Software Engineer at Triple Whale, shares some tips for quickly loading data into a highly partitioned ClickHouse table.

He starts by writing data directly into the table from S3 files, but that results in many small parts, which isn’t ideal from a querying standpoint and can result in the "too many parts" error. He explores various ways to work around this problem, including sorting the data by partition key as part of the ingestion query, which results in an out-of-memory error.

Jesse discovers that sorting the data by partition key before writing it into ClickHouse works much better. He also tries first loading the data into an unpartitioned table and then populating the partitioned table afterward, doing the sorting in ClickHouse.
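
The effect of pre-sorting by partition key can be sketched outside ClickHouse. This is a minimal illustration, not Jesse's code: the row shape, the `key` function, and the batch size are assumptions. It counts how many distinct partitions each insert batch touches, on the understanding that an insert writes at least one new part for every partition it touches.

```python
from itertools import islice

def batches(rows, size):
    """Yield consecutive lists of at most `size` rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk

def parts_created(rows, partition_key, batch_size):
    """Count parts written if every batch creates one part per partition it touches."""
    return sum(
        len({partition_key(row) for row in batch})
        for batch in batches(rows, batch_size)
    )

# Hypothetical rows: (shop_id, value), partitioned by shop_id, arriving interleaved.
rows = [(i % 10, i) for i in range(1000)]
key = lambda row: row[0]

unsorted_parts = parts_created(rows, key, batch_size=100)
sorted_parts = parts_created(sorted(rows, key=key), key, batch_size=100)
```

With interleaved input every batch touches all ten partitions, while the sorted input lets each batch land in a single partition, so far fewer and larger parts are written.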

Read the blog post

 

From zero to scale: Langfuse's infrastructure evolution

from-zero-to-scale.png

Langfuse is an open-source LLM observability platform that participated in the Y Combinator Winter 2023 batch. The initial release of their product was written on Next.js, Vercel, and Postgres. This allowed them to get the release out of the door quickly, but they ran into problems when trying to scale the system.

In the blog post, they explain their journey to solving these problems, which involved an extensive infrastructure redesign. A Redis queue was introduced to handle spiky ingestion traffic, and they sped up analytics queries with help from a ClickHouse ReplacingMergeTree table.
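
For readers unfamiliar with ReplacingMergeTree: the engine deduplicates rows that share a sorting key during background merges, keeping the row with the highest version when a version column is configured. The pure-Python sketch below imitates that merge result. The row fields (`id`, `version`, `status`) are hypothetical and unrelated to Langfuse's actual schema.

```python
def replacing_merge(rows, key="id", version="version"):
    """Keep one row per key: the highest version wins, and the last inserted wins ties."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[version] >= current[version]:
            latest[row[key]] = row
    # Return the survivors ordered by the sorting key.
    return sorted(latest.values(), key=lambda r: r[key])

# Hypothetical updates to the same logical record arriving as separate inserts.
rows = [
    {"id": 1, "version": 1, "status": "started"},
    {"id": 2, "version": 1, "status": "started"},
    {"id": 1, "version": 2, "status": "finished"},
]
merged = replacing_merge(rows)
```

Because merges run in the background at an unspecified time, a query issued before the merge may still see both versions of a row, which is worth keeping in mind when designing queries on such tables.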

Read the blog post

 

Building a Medallion architecture with ClickHouse

building-medallion-ch.png

The medallion architecture is a data design pattern that logically organizes data in a lakehouse. It aims to incrementally and progressively improve the structure and quality of data as it flows through each layer of the architecture (from Bronze ⇒ Silver ⇒ Gold layer tables).

The ClickHouse Product Marketing Engineering (PME) team was curious whether the architecture could be applied to a real-time data warehouse like ClickHouse and wrote a blog post describing their experience.

Read the blog post

 

Building a Medallion architecture for Bluesky data

building-medallion-bluesky.png

Following the introductory post to the Medallion architecture, the ClickHouse PME team applied this design pattern to data from the Bluesky social network.

This was a perfect dataset for this experiment, as many of the records had malformed or incorrect timestamps. The dataset also contained frequent duplicates.

The blog post walks through a workflow that addresses these challenges, organizing this dataset into the Medallion architecture’s three distinct tiers: Bronze, Silver, and Gold. The team also makes heavy use of the recently released JSON type.
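
A toy version of such a three-tier flow might look like the following. It is only a sketch: the record fields, the timestamp format, and the daily-count Gold aggregate are assumptions for illustration, and the blog post implements these stages inside ClickHouse rather than in Python.

```python
from collections import Counter
from datetime import datetime

def parse_ts(value):
    """Return a timezone-aware datetime, or None when the timestamp is unusable."""
    try:
        ts = datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return None
    return ts if ts.tzinfo is not None else None

def to_silver(bronze):
    """Silver: drop records with malformed timestamps and duplicate ids."""
    seen, silver = set(), []
    for rec in bronze:
        ts = parse_ts(rec.get("created_at"))
        if ts is None or rec["id"] in seen:
            continue
        seen.add(rec["id"])
        silver.append({"id": rec["id"], "created_at": ts})
    return silver

def to_gold(silver):
    """Gold: an aggregate ready for reporting, here posts per day."""
    return Counter(rec["created_at"].date().isoformat() for rec in silver)

# Bronze: raw records kept exactly as they arrived (hypothetical sample).
bronze = [
    {"id": "p1", "created_at": "2025-01-05T10:00:00+00:00"},
    {"id": "p1", "created_at": "2025-01-05T10:00:00+00:00"},  # duplicate
    {"id": "p2", "created_at": "not-a-timestamp"},  # malformed
    {"id": "p3", "created_at": "2025-01-06T08:30:00+00:00"},
]
silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping the Bronze records untouched means the Silver rules can be changed later and replayed over the raw data.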

Read the blog post

 

Quick reads

 

Video corner

 

Post of the month

Our favorite post this month was by Dmytro Shevchenko:

post-of-month-202501.png

Read the post
