

May 2024 Newsletter

ClickHouse Team
May 23, 2024

Welcome to the May ClickHouse newsletter, which will round up what’s been happening in real-time data warehouses over the last month.

This month, we have recursive CTEs in the 24.4 release, the launch of the ClickHouse Developer Certification, real-time fraud detection at Instacart, and more!

Inside this issue

This month's featured community member is Dan Goodman, Co-Founder and CEO of Tangia, a service for hosting interactive live streams.


Dan has been part of the ClickHouse community for at least 18 months and frequently gives the engineering team feedback on both missing features and how existing features can be improved.

Dan writes a blog about distributed systems, where he’s previously written about topics like range partitioning and building a Fly.io scheduler.

A few weeks ago he wrote a blog post titled Finding Trends With Approximate Embedding Clustering. In the post, he explains the importance of approximation techniques when working with big datasets and walks through how to implement the Dynamic K-Means++ algorithm with ClickHouse.
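Dan's post covers a dynamic variant of K-Means++. The minimal sketch below shows only the standard k-means++ seeding step that such variants build on, written in plain Python rather than ClickHouse SQL. It assumes points are tuples of floats and uses squared Euclidean distance. It is an illustration, not code from his post.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_plus_plus_seeds(points, k, rng=None):
    """Pick k initial centroids with standard k-means++ seeding.

    The first seed is drawn uniformly at random. Each later seed is drawn with
    probability proportional to its squared distance from the nearest seed
    chosen so far, so far-away points are favored and the seeds spread out.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    if rng is None:
        rng = random.Random()
    seeds = [rng.choice(points)]
    # nearest[i] = squared distance from points[i] to its closest seed so far.
    nearest = [squared_distance(p, seeds[0]) for p in points]
    while len(seeds) < k:
        if sum(nearest) == 0:
            # Every point coincides with a seed; fall back to a uniform draw.
            seed = rng.choice(points)
        else:
            # Weighted draw; points sitting on an existing seed have weight 0.
            seed = rng.choices(points, weights=nearest, k=1)[0]
        seeds.append(seed)
        nearest = [min(d, squared_distance(p, seed))
                   for p, d in zip(points, nearest)]
    return seeds
```

Points that already coincide with a seed get zero weight, so the weighted draw never returns them. This is why the method tends to place one seed per well-separated group.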

Follow Dan on LinkedIn

 

Upcoming events

 

24.4 release


The standout feature in the 24.4 release is recursive CTEs (Common Table Expressions), and we made a London Underground-themed example to show you how they work. This release also brings improvements to JOIN performance and adds the QUALIFY clause for filtering on the results of window functions.
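For readers new to recursive CTEs, a `WITH RECURSIVE` query that uses `UNION ALL` runs its anchor part once. It then repeatedly runs the recursive part against the rows produced by the previous iteration, and stops once an iteration returns no rows. The Python sketch below mimics that evaluation loop on a small, made-up network of stations named A to D. It is not the release post's London Underground query. The hop limit plays the role of a `WHERE` condition that keeps the recursion from cycling forever.

```python
def recursive_cte(anchor_rows, step):
    """Evaluate a recursive query the way WITH RECURSIVE ... UNION ALL does.

    anchor_rows: rows produced by the non-recursive (anchor) part.
    step: maps the rows from the previous iteration to new rows (the
          recursive part only ever sees the last iteration's output).
    Iteration stops as soon as the recursive part produces no rows.
    """
    result = list(anchor_rows)
    working = list(anchor_rows)
    while working:
        working = step(working)
        result.extend(working)  # UNION ALL: duplicates are kept
    return result


# Toy, hypothetical network: station -> directly connected stations.
LINKS = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}


def next_hops(rows, max_hops=2):
    # Roughly: SELECT l.to, r.hops + 1 FROM reachable r JOIN links l
    #          ON l.from = r.station WHERE r.hops < max_hops
    return [(nxt, hops + 1)
            for station, hops in rows if hops < max_hops
            for nxt in LINKS.get(station, [])]


rows = recursive_cte([("A", 0)], next_hops)

# Outer query: minimum hop count per station (like GROUP BY station with min).
shortest = {}
for station, hops in rows:
    shortest[station] = min(hops, shortest.get(station, hops))
```

Because `UNION ALL` keeps duplicates, `rows` contains station A twice (at 0 and at 2 hops). The outer aggregation reduces the result to `{"A": 0, "B": 1, "C": 2}`.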

Read the release post

 

Become a ClickHouse Certified Developer


Rich Raposa recently announced the launch of the official ClickHouse Developer Certification Program, the only certification directly from ClickHouse.

This certification program validates developers’ proficiency in using ClickHouse to build robust, high-performance data solutions. This certification will showcase your mastery of ClickHouse and help you distinguish yourself as a trusted database management and analytics expert.

Learn more about certification

 

Real-time Fraud Detection at Instacart


Nick Shieh, Shen Zhu, and Xiaobing Xia have written a blog post where they walk us through Yoda, a decision platform service they built at Instacart to detect fraudulent activities and take action quickly. ClickHouse was chosen as the primary real-time datastore for this system because it can both ingest and query large amounts of data in real time. I especially liked the part of the post where they describe how real-time features fed into the service are derived from ClickHouse SQL queries.

Read the blog post

 

K-Means Clustering with ClickHouse


Recently, while helping a user who wanted to compute centroids from vectors held in ClickHouse, we realized that the same solution could be used to implement K-means clustering. The user wanted to do this at scale, across potentially billions of data points, while keeping memory usage tightly managed. In the post, we show how to implement K-means clustering using just SQL and show that it scales to billions of records while running significantly faster than the equivalent code in scikit-learn.
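The core of K-means maps naturally onto an aggregation. You assign each point to its nearest centroid, then average the points per centroid, much like a `GROUP BY` over the assigned centroid ID. The Python sketch below repeats one such Lloyd iteration until the centroids stop moving. It illustrates the algorithm and is not the SQL from the post. Each cluster keeps only a running sum and a count, so memory depends on the number of clusters and dimensions rather than on the number of points.

```python
def assign(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))


def kmeans_step(points, centroids):
    """One Lloyd iteration expressed as a GROUP BY-style aggregation."""
    dims = len(centroids[0])
    sums = [[0.0] * dims for _ in centroids]  # per-cluster sum vectors
    counts = [0] * len(centroids)             # per-cluster point counts
    for point in points:
        i = assign(point, centroids)
        counts[i] += 1
        for d in range(dims):
            sums[i][d] += point[d]
    # Average per cluster; an empty cluster keeps its previous centroid.
    return [[s / counts[i] for s in sums[i]] if counts[i] else list(centroids[i])
            for i in range(len(centroids))]


def kmeans(points, centroids, max_iter=100, tol=1e-9):
    """Repeat Lloyd iterations until no centroid moves more than tol."""
    for _ in range(max_iter):
        new = kmeans_step(points, centroids)
        shift = max(sum((a - b) ** 2 for a, b in zip(x, y))
                    for x, y in zip(new, centroids))
        centroids = new
        if shift <= tol:
            break
    return centroids
```

For example, start from centroids (0, 0) and (10, 10) with the points (0, 0), (0, 2), (10, 10) and (10, 12). The centroids settle at [0.0, 1.0] and [10.0, 11.0] after two iterations.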

Read the blog post

 

Simplified Kubernetes Logging with Fluentbit and ClickHouse

Logging is one of the hot ClickHouse use cases of the moment, so I was excited to come across this blog post by Muthukumaran. Fluentbit is a lightweight logging and metrics processor and forwarder designed for containerized environments. Muthukumaran walks us through the steps to set up a metrics server to monitor resource utilization in Kubernetes and then shows how to configure Fluentbit to get those metrics into ClickHouse.

Read the blog post

 

The New Building Blocks of Observability


This article focuses on what the author calls the three new elements in the observability periodic table: OpenTelemetry, eBPF, and ClickHouse. OpenTelemetry has emerged as the de facto standard for telemetry data, eBPF makes it possible to generate traces and metrics with zero instrumentation, and ClickHouse is used to ingest and query all this data. The article also covers several observability startups that use ClickHouse: Groundcover, SigNoz, and DeepFlow.

Read the blog post

Using ClickHouse for Financial Charts


After giving a brief crash course on when (and when not) to use ClickHouse, Adis Nezirović demonstrates how to ingest, query, and visualize financial time-series data. Along the way, he shows how to use the Null table engine to transform incoming data and how to use aggregate states to reduce the amount of data kept around. To conclude, Adis creates a candlestick chart using the Grafana QueryBuilder.
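Keeping aggregate states works because partial aggregates can be stored and merged later without the raw rows. In ClickHouse, rows inserted into a Null table are discarded, but materialized views attached to it still process them, so only the aggregated result is persisted. The Python sketch below illustrates the mergeable-state idea for OHLC (open, high, low, close) candles. `CandleState` and `candles` are hypothetical names made up for this example; they are not Adis's schema or a ClickHouse API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandleState:
    """Partial OHLC aggregate for one time bucket, mergeable like an aggregate state."""
    open_ts: float
    open: float
    high: float
    low: float
    close_ts: float
    close: float

    @staticmethod
    def from_tick(ts, price):
        return CandleState(ts, price, price, price, ts, price)

    def merge(self, other):
        # Open comes from the earliest tick, close from the latest.
        first = self if self.open_ts <= other.open_ts else other
        last = self if self.close_ts >= other.close_ts else other
        return CandleState(first.open_ts, first.open,
                           max(self.high, other.high), min(self.low, other.low),
                           last.close_ts, last.close)


def candles(ticks, bucket_seconds=60):
    """Fold (timestamp, price) ticks into one CandleState per time bucket."""
    out = {}
    for ts, price in ticks:
        bucket = int(ts // bucket_seconds) * bucket_seconds
        state = CandleState.from_tick(ts, price)
        out[bucket] = out[bucket].merge(state) if bucket in out else state
    return out
```

Suppose two separate batches of ticks fall in the same minute. Merging the states built from each batch gives the same candle as aggregating all the ticks at once. That property is what lets partial states be stored and combined later.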

Read the blog post

Post of the Month

Our favorite post this month was by ludwig, who was impressed by both the speed of ClickHouse queries and the quality of its data compression.



View the post
