It’s been an exciting month at ClickHouse. We’ve released Lightweight Deletes, one of the most highly anticipated ClickHouse features of 2022. And we’re getting close to publicly launching ClickHouse Cloud (sign up for early access here).
Read on about snazzy features in our ClickHouse 22.8 LTS (Long Term Support) release, a simple example of the new DELETE query, and a roundup of ClickHouse stories for the last month.
By the way, if you’re reading this on our website, did you know you can receive every monthly newsletter as an email in your inbox as well? Sign up here.
Mark your calendar:
ClickHouse v22.9 Release Webinar
- When? Thursday, September 22 @ 9 am PST / 6 pm CEST
- How do I join? Register here.
Silicon Valley ClickHouse Meetup
- What? Come hang out with other users and hear what PostHog, Grafana and Barracuda are doing with ClickHouse.
- Where? San Jose, CA
- When? Wednesday, September 28 @ 6 pm PST
- How do I join? Register here.
- What? A number of the ClickHouse team are going to be at re:Invent! Interested in meeting up with us, maybe grabbing a beverage, and talking about ClickHouse? Let us know!
- Where? Las Vegas, NV
- When? November 29 - December 3, 2022
ClickHouse v22.8 LTS
Our new Long Term Support release is out with many new features:
- DELETE query It’s finally here! Lightweight deletes were one of the most requested features in our 2022 roadmap and we delivered it with many months to spare. Whenever you are currently using
ALTER TABLE … DELETEyou should switch to
DELETE FROM … WHEREin almost all cases. It is much cheaper to execute, though still asynchronous (set
mutations_sync = 2to wait for the query to complete).
- Extended date ranges
DateTime64now support dates from 1900 to 2299 (1925 to 2283 before).
- Parallel distributed insert from S3 ClickHouse can already insert data very quickly on a single machine (typically millions of rows per second). But if you have a cluster of machines, you can now insert from S3 in parallel on all of them. Maybe even billions of rows a second are possible? Somebody should try it.
- JSON logging ClickHouse can now output its logs in JSON format. This should make it easier to ingest into and query in log management software. You can also ingest into ClickHouse, of course.
- [Infer dates and numbers](https://github.com/ClickHouse/ClickHouse/pull/39186 When using schema inference, you can now tell ClickHouse to try to infer dates and numbers from strings.
- Query parameters can now be set in interactive mode. For example, to define a parameter named
SET param_user = alexey.
- More Pretty formats Here are 7 more, bringing the total to almost 70. I personally like
PrettyMonoBlock, but all formats are really “pretty”.
Take a look at the release webinar slides, the recording and please upgrade - it is a Long Term Support release.
Query of the Month: Deleting data can make you feel better
For this month, let’s keep it simple. And tongue in cheek, just a little bit.
If there is one new feature in ClickHouse 22.8 that you absolutely must try it’s the new DELETE query. Up to now, the only way to delete specific rows in ClickHouse was to use an
ALTER TABLE table DELETE WHERE cond statement. It would asynchronously rewrite all data files containing rows matching the condition. Since data files in ClickHouse can by default be up to 150 GB this was very expensive and could lead to significant CPU and memory usage.
The new DELETE query takes a different approach. Instead of physically deleting all data immediately it only marks the specified rows as deleted using a hidden column. The data is still there, but it is transparently filtered out of queries. Effectively, all queries are executed with an additional condition
WHERE _deleted = false. Later, as ClickHouse merges files in the background it will drop any rows marked as deleted during the merge process.
Let’s see how useful the new DELETE query can be.
Just for fun, let’s create a table with a portfolio of stablecoins (cost is cost of purchase per coin on January 1, 2022, price is the value on 26 August 2022):
CREATE TABLE mymoney engine = MergeTree ORDER BY coin AS SELECT 'USDC' AS coin, 1000 AS amount, 1.0002 AS cost, 0.9999 AS price UNION ALL SELECT 'USDT', 1000, 1.0001, 1.00 UNION ALL SELECT 'USTC', 1000, 1.00, 0.0275
And let’s calculate our portfolio return:
SELECT sum(amount * cost) cost, sum(amount * price) value, value - cost gain_loss, 1 - value / cost pct FROM mymoney
We lost 32% of our money! What happened? Turns out TerraUSD (USTC) was not such a “stable” stablecoin after all.
But no matter, the DELETE query will take care of this:
SET allow_experimental_lightweight_delete = 1 DELETE FROM mymoney WHERE coin = 'USTC'
And now our portfolio return looks much better. If only it was that simple…
What we’ve been reading:
Cloudflare Blog: Log analytics using ClickHouse Besides using ClickHouse for serving real-time HTTP and DNS analytics (link, link) Cloudflare is also using ClickHouse for storing internal logs. By moving from Elasticsearch to ClickHouse, they were able to remove sampling, and provide fast querying while saving costs.
ClickHouse Plugin for Grafana - 2.0 Release Version 2.0 of our popular ClickHouse plugin for Grafana, now supports HTTP connections and the JSON data type, among other changes. Check it out!
Exploring massive, real-world data sets: 100+ Years of Weather Records in ClickHouse What loading more than a century of weather data into ClickHouse looks like and how you can use it.
Getting Data Into ClickHouse - Part 1 and Part 2 A walkthrough of getting Hacker News data into ClickHouse and running some fun queries on it.
MySQL CDC to Clickhouse using Decodable's Change Stream Capabilities Decodable shows in this video how to synchronize data from MySQL to ClickHouse.
New ClickHouse Adopters: A fun addition this month, welcome Nintendo emulator Dolphin. Also welcome to open source ad blocker AdGuard and non-profit project OONI. Get yourself added as well!
Thanks for reading. We’ll see you next month!
The ClickHouse Team