Welcome to the November ClickHouse newsletter, which will round up what’s happened in real-time data warehouses over the last month.
The big news is that Refreshable Materialized Views are production-ready, and we have an official Docker image!
Alexey Milovidov was a guest on Data Talks on the Rocks, we learn how to simplify queries with dictionaries, and there’s a deep dive on the new JSON data type.
Inside this issue
- Featured community member
- Upcoming events
- 24.10 release
- Alexey Milovidov on Data Talks on the Rocks
- Simplifying queries with ClickHouse dictionaries
- Building a financial data pipeline with Alpha Vantage and ClickHouse
- How we built a new powerful JSON data type for ClickHouse
- ClickHouse Cloud Live Update: November 2024
- Quick reads
- Post of the month
Visit us at AWS re:Invent
Are you heading to re:Invent? We are too, and would love to connect with you!
Book a meeting with us beforehand by emailing [email protected], or stop by our booth #1737 for:
- A chance to meet all three of our founders: Aaron, Alexey, and Yury
- Live demos
- Exclusive swag
- And a chat with ClickHouse experts
We’re also hosting a ClickHouse House Party with the Chainsmokers. It’ll be an epic night you won’t want to miss!
Register for the Chainsmokers party
Featured community member
This month's featured community member is Lukas Biewald, co-founder and CEO at Weights & Biases.
Lukas has worked in machine learning for 20 years, previously co-founding Figure Eight with Chris Van Pelt, where they specialized in data labeling for machine learning applications. Appen acquired the company in March 2019.
In 2018, Lukas co-founded Weights & Biases, an MLOps platform designed to assist machine learning practitioners in tracking experiments, managing datasets, and collaborating on model development.
Lukas presented at the ClickHouse San Francisco meetup in September, where he shared his experience building AI applications and explained how Weights & Biases uses ClickHouse as part of its Weave application. The talk was also written up in a blog post published last week.
Upcoming events
Global events
- Release call 24.11 - Nov 28
Free training
- Migrating from Postgres to ClickHouse Workshop - Virtual - Nov 27
- ClickHouse Fundamentals - Virtual - Dec 4
- In-Person ClickHouse Training in Sweden - Sweden - Dec 9
- In-Person ClickHouse Training in Denmark - Denmark - Dec 9
- ClickHouse Developer In-Person Training in New York - Manhattan, NY - Dec 11-12
- ClickHouse Developer Training - Virtual - Dec 18-19
Events in AMER
- Microsoft Ignite - Chicago - Nov 19-22
- AWS re:Invent 2024 - Las Vegas - Dec 2-6
- Meetup in New York - Dec 9
- Meetup in San Francisco - Dec 12
Events in EMEA
- Meetup in Dubai - Nov 21
- Meetup in Paris - Nov 26
- Meetup in Amsterdam - Dec 3
- Meetup in Stockholm - Dec 9
24.10 release
Refreshable Materialized Views are production-ready! That’s the big news in the 24.10 release, but we’ve also simplified table cloning with the CLONE AS clause, and there’s remote file caching, which is super helpful when querying S3 buckets.
Alexey Milovidov on Data Talks on the Rocks
Data Talks on the Rocks is a series of interviews with thought leaders and founders discussing the latest trends in data and analytics. Michael Driscoll, CEO and Co-founder of Rill Data, hosts it.
In episode 4, his guest was none other than Alexey Milovidov, the CTO and Co-founder of ClickHouse. In a wide-ranging conversation, they discussed the importance of hashing functions in database design, how AI might impact database technologies in the future, the development of ClickHouse's new analyzer, and more.
Simplifying queries with ClickHouse dictionaries
Jeffrey Needles, founder of Aggregations.io, has written a blog post explaining how to simplify queries using dictionaries.
Jeffrey takes us through why you’d want to use a dictionary, where data is sourced from, and how to choose the right type of key before demonstrating the performance gain from using them in a query.
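To make the idea concrete, here is a minimal sketch in plain Python rather than ClickHouse SQL. It is not taken from Jeffrey's post: the lookup table, the event rows, and the helper names are all hypothetical. It only illustrates the general pattern, where a small in-memory key-value structure replaces a join against a reference table. In ClickHouse itself, the lookup would be done in SQL with the dictGet family of functions.

```python
# Hypothetical reference data: country code -> country name.
# In ClickHouse this would live in a dictionary, not in Python.
country_names = {"US": "United States", "DE": "Germany", "SE": "Sweden"}

# Hypothetical fact rows that carry only the key.
events = [
    {"user_id": 1, "country_code": "US"},
    {"user_id": 2, "country_code": "SE"},
    {"user_id": 3, "country_code": "FR"},  # no matching entry in the lookup
]


def dict_get_or_default(dictionary, key, default):
    """Return the attribute stored for key, or default when the key is missing."""
    return dictionary.get(key, default)


def enrich(rows):
    """Attach a country name to each row with one keyed lookup per row, no join."""
    return [
        {
            **row,
            "country": dict_get_or_default(
                country_names, row["country_code"], "Unknown"
            ),
        }
        for row in rows
    ]


enriched = enrich(events)
for row in enriched:
    print(row["user_id"], row["country"])
```

Each row is resolved with a single lookup by key, and rows with no match fall back to a default value instead of being dropped.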
Building a financial data pipeline with Alpha Vantage and ClickHouse
Craig Dickson builds a high-performance data pipeline using Alpha Vantage for data acquisition and ClickHouse for data storage and analytics.
After querying the Alpha Vantage API data, Craig cleans it up in Pandas before ingesting it into ClickHouse Cloud. He then shows how to create various data visualizations using Vega-Altair.
How we built a new powerful JSON data type for ClickHouse
The new JSON data type was introduced in version 24.8 in August, and we showed some examples in the release post but didn’t explore it in depth.
That all changes with this blog post, where Tom Schreiber and Pavel Kruglov explain how it works under the hood. They explain how the new data type overcomes challenges like having values of multiple data types in the same JSON path, how to avoid pushing work to query time, and how to prevent an avalanche of column data files on disk.
There are lots of diagrams explaining how it all works. One to read for ClickHouse enthusiasts!
ClickHouse Cloud Live Update: November 2024
Krithika Balagurunathan and Zach Naimon joined us on our latest ClickHouse Cloud live update call. They taught us about Bring Your Own Cloud and Compute-compute separation, respectively.
After giving an overview of the features and a brief demo, Zach and Krithika hosted a detailed Q&A, which included the following questions:
- Does BYOC meet FedRAMP requirements?
- Can horizontal autoscaling be automated based on resource consumption?
- How do you migrate an existing cluster to BYOC?
- Can you have powerful instances for read/write nodes and less powerful instances for read-only nodes?
Check out the full recording to hear the answers to these questions and more!
Quick reads
- It's not really a quick read, but ClickHouse now has an official Docker image!
- Carl Lindesvärd posted a Twitter thread about the things he’s learned from six months of working with ClickHouse.
- Ravindra Elicherla stores tick-by-tick WebSocket data in ClickHouse.
- I came across Trench, an event-tracking system built on Apache Kafka and ClickHouse. It powers Frigade's real-time event-tracking pipeline, handles large event volumes, and provides real-time analytics.
- The MetricFire team explains how to monitor ClickHouse with Telegraf and MetricFire.
- Jesse Grodman, Software Engineer at Triple Whale, shares how to move data from an unsharded ClickHouse cluster to a sharded one without adding load to the existing cluster.
Post of the month
Our favorite post this month was by Steven Tey about ClickHouse’s arrayIntersect function.
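If you haven't used arrayIntersect before, it takes several arrays and returns the elements that are present in all of them. The plain-Python sketch below mimics that behavior under stated assumptions. It is not ClickHouse code and is not taken from Steven's post. The function name is hypothetical, and keeping results in the order of the first array is a choice made for this sketch, not a guarantee about ClickHouse's output order.

```python
def array_intersect(*arrays):
    """Return the distinct elements that appear in every input array."""
    if not arrays:
        return []

    # Start with the first array's elements and narrow down with each other array.
    common = set(arrays[0])
    for arr in arrays[1:]:
        common &= set(arr)

    # Emit each common element once, in first-array order (an assumption of this sketch).
    seen = set()
    result = []
    for item in arrays[0]:
        if item in common and item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(array_intersect([1, 2, 3], [2, 3, 4], [3, 2, 5]))
```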