Another month goes by, which means it's time for another release!
ClickHouse version 25.7 contains 27 new features, 26 performance optimizations, and 98 bug fixes.
This release brings lightweight UPDATEs, AI-powered SQL generation, faster count() aggregations and JOINs, native support for Geo Parquet types, new geospatial functions, new security features, and more.
New Contributors
A special welcome to all the new contributors in 25.7! The growth of ClickHouse's community is humbling, and we are always grateful for the contributions that have made ClickHouse so popular.
Below are the names of the new contributors:
Alon Tal, Andrey Volkov, Damian Maslanka, Diskein, Dominic Tran, Fgrtue, H0uston, HumanUser, Ilya fanyShu, Joshie, Mishmish Dev, Mithun P, Oleg Doronin, Paul Lamb, Rafael Roquetto, Ronald Wind, Shiv, Shivji Kumar Jha, Surya Kant Ranjan, Ville Ojamo, Vlad Buyval, Xander Garbett, Yanghong Zhong, ddavid, e-mhui, f2quantum, jemmix, kirillgarbar, lan, wh201906, xander, yahoNanJing, yangjiang, yangzhong, ζη»΄
Hint: if you're curious how we generate this list… here.
You can also view the slides from the presentation.
Lightweight updates
Contributed by Anton Popov
ClickHouse now supports standard SQL UPDATE statements at scale, powered by a new lightweight patch-part mechanism. Unlike classic mutations, which rewrite full columns, these updates write only tiny "patch parts" that slide in instantly with minimal impact on query performance.
How it works
Patch parts extend the same principles behind purpose-built engines like ReplacingMergeTree, but in a fully general way, exposed through standard SQL:
UPDATE orders
SET discount = 0.2
WHERE quantity >= 40;
- Inserts are fast.
- Merges are continuous.
- Parts are immutable and sorted.
So instead of locating and modifying rows, ClickHouse simply inserts a compact patch part named for what it does: it patches data parts during merges, applying only the changed data:

Merges were already running in the background, so we made them do more, with near-zero overhead. They now apply patch parts, updating the base data efficiently as parts are merged.
Updates show up right away: not-yet-merged patch parts are matched and applied independently for each data range in each data stream in a surgical, targeted way, ensuring that updates are applied correctly without disrupting parallelism:

This makes declarative updates up to 1,000× faster than before in our benchmarks, with minimal impact on queries before merges.
Whether you're updating one row or a million, it's now fast, efficient, and fully declarative.
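For instance, immediately after running the UPDATE above, a query over the same rows should already return the new discount. A minimal sketch against the same hypothetical orders table:

SELECT quantity, discount
FROM orders
WHERE quantity >= 40
LIMIT 5;

The patch part is applied on the fly at read time, so discount shows 0.2 even before any merge has happened.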
DELETEs: from lightweight to featherweight
For DELETEs with standard SQL syntax like:
DELETE FROM orders WHERE order_id = 1001 AND item_id = 'mouse';
ClickHouse simply creates a patch part that sets _row_exists = 0 for the deleted rows. The row is then dropped during the next background merge:

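To illustrate, querying for the deleted row straight after the DELETE should return nothing, even though the data part it lives in hasn't been rewritten yet. A small sketch against the same hypothetical orders table:

SELECT count()
FROM orders
WHERE order_id = 1001 AND item_id = 'mouse';

The count should come back as 0: rows whose patch sets _row_exists = 0 are filtered out at query time and physically removed at the next merge.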
Under the hood of fast UPDATEs
Want to go deeper? Check out our 3-part blog series on fast UPDATEs in ClickHouse:
- Part 1: Purpose-built engines. Learn how ClickHouse sidesteps slow row-level updates using insert-based engines like ReplacingMergeTree, CollapsingMergeTree, and CoalescingMergeTree.
- Part 2: Declarative SQL-style UPDATEs. Explore how we brought standard UPDATE syntax to ClickHouse with minimal overhead using patch parts.
- Part 3: Benchmarks. See how fast it really is. We benchmarked every approach, including declarative UPDATEs, and got up to 1,000× speedups.
AI-powered SQL generation
Contributed by Kaushik Iska
The ClickHouse client and clickhouse-local now support AI-powered SQL generation. You can enable this by using the ?? prefix, and if you have OPENAI_API_KEY or ANTHROPIC_API_KEY set as an environment variable, it will ask if you want to use it.
For example, let's connect to the ClickHouse SQL Playground:
./clickhouse client -mn \
    --host sql-clickhouse.clickhouse.com \
    --secure \
    --user demo --password ''
And then we'll ask for the most popular repositories in July 2025:
?? what was the most popular github repository in July 2025?;
It will then confirm that we'd like to use our API key:
AI SQL generation will use openai API key from environment variable.
Do you want to continue? [y/N] y
And then off it goes, printing out its thoughts along the way:
• Starting AI SQL generation with schema discovery...
─────────────────────────────────────────────────
thinking.. [INFO] Text generation successful - model: gpt-4o, response_id: chatcmpl-BxBDkKy9sNi1Bw8E26UC8RXNBMrPT
Calling: list_databases [call_IXv...]
✓ list_databases completed
  └─ Found 37 databases: - amazon - bluesky - country - covid - default - dns - en...
thinking.. [INFO] Text generation successful - model: gpt-4o, response_id: chatcmpl-BxBDmOwzsHgBNHtOiT8tEIhjJoMXV
Calling: list_tables_in_database [call_pRq...]
  └─ Args: {"database":"github"}
✓ list_tables_in_database completed
  └─ Found 19 tables in database 'github': - actors_per_repo - actors_per_repo_mv ...
thinking... [INFO] Text generation successful - model: gpt-4o, response_id: chatcmpl-BxBDo0qbEkfvzTl6vgrA49fgjVVfo
Calling: get_schema_for_table [call_NKK...]
  └─ Args: {"database":"github","table":"repo_events_per_day"}
✓ get_schema_for_table completed
  └─ Schema for github.repo_events_per_day: CREATE TABLE github.repo_events_per_da...
thinking. [INFO] Text generation successful - model: gpt-4o, response_id: chatcmpl-BxBDpMp0HpRmFZK2VBZG5y6sWMCsF
─────────────────────────────────────────────────
• ✨ SQL query generated successfully!
Below is the query it came up with:
SELECT
    repo_name,
    sum(count) AS total_events
FROM github.repo_events_per_day
WHERE (created_at >= '2025-07-01') AND (created_at <= '2025-07-31')
GROUP BY repo_name
ORDER BY total_events DESC
LIMIT 1;

┌─repo_name───────────────────┬─total_events─┐
│ freefastconnect/fastconnect │       333215 │
└─────────────────────────────┴──────────────┘
I would have also filtered it by event type to only include WatchEvent, but it's not a bad start. Give it a try and let us know how you get on!
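For the record, here's roughly the query with that extra filter added by hand; it's a sketch that assumes the table has an event_type column, which is a guess on my part:

SELECT
    repo_name,
    sum(count) AS total_events
FROM github.repo_events_per_day
WHERE (created_at >= '2025-07-01') AND (created_at <= '2025-07-31')
    AND event_type = 'WatchEvent'  -- assumed column name
GROUP BY repo_name
ORDER BY total_events DESC
LIMIT 1;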
Speed-up for simple aggregations
Contributed by Amos Bird
This release brings a targeted optimization for count() aggregations that reduces memory and CPU usage, making these queries even faster.
Parallel aggregation in ClickHouse
ClickHouse already runs these queries in a highly parallel way, using all available CPU cores, splitting the work across processing lanes, and pushing the hardware close to its limits.
Here's how it normally works:

① Data streaming
Data is streamed into the engine block by block.
② Parallel processing
Each CPU core processes a disjoint data range: filtering, aggregating, and sorting rows independently.
③ Partial results merge into the ④ final result
Each lane produces partial aggregation states (e.g., a sum and a count for avg()).
These intermediate states are merged into a single final result.
Let's now focus on just one stage of this pipeline: aggregation. This is where each CPU core accumulates partial results, like sums and counts, for its slice of the data, which are later merged into the final answer.
Why partial states are needed
Partial aggregation states are what make this highly parallel processing model possible: they allow each CPU core to work independently while still contributing to a correct final result.
To understand why partial states are used, consider this query calculating average property prices per town:
SELECT town, avg(price) AS avg_price FROM uk_price_paid GROUP BY town;
Suppose we want the average price in London using two CPU cores:

① Parallel processing and ② partial aggregation states
Lane 1 sees two London rows → sum = 500,000, count = 2
Lane 2 sees one London row → sum = 400,000, count = 1
③ Merge into the ④ final state
Final avg = (500,000 + 400,000) / (2 + 1) = 300,000
You must merge sums and counts, not intermediate averages, to avoid errors like:
(250,000 + 400,000) / 2 = 325,000 → incorrect
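You can poke at partial states directly from SQL with the -State and -Merge combinators. Here's a minimal sketch that redoes the London example by hand, with the two lanes simulated by two subqueries and the prices chosen to match the sums above (this isn't how ClickHouse executes the query internally; it just shows the same arithmetic):

SELECT avgMerge(partial_state) AS final_avg
FROM
(
    SELECT avgState(toUInt64(price)) AS partial_state
    FROM (SELECT arrayJoin([200000, 300000]) AS price)  -- "lane 1": sum = 500,000, count = 2
    UNION ALL
    SELECT avgState(toUInt64(price))
    FROM (SELECT 400000 AS price)                       -- "lane 2": sum = 400,000, count = 1
);

This returns 300,000, whereas averaging the two per-lane averages (250,000 and 400,000) would give the incorrect 325,000 shown above.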
How GROUP BY aggregation works under the hood
Now let's zoom in even further, into the internals of the aggregation stage. We'll see how each processing lane maintains its own hash table and builds aggregation states independently.
GROUP BY queries are processed independently by each lane using a hash aggregation algorithm: each lane maintains a hash table where each key (e.g. town) points to an aggregation state.
For our example, when a processing lane sees LONDON in the input:
- If the key doesn't exist in the ① hash table, a new entry is created and an aggregation state is allocated in a ② global memory arena. A pointer to that state is stored in the hash table.
- If it does exist, the pointer is used to look up and update the existing aggregation state in the arena.

Optimization for count()
Now, here's where this release improves things.
For this query:
SELECT town, count() AS count FROM uk_price_paid GROUP BY town;
…ClickHouse skips the memory arena entirely.
Since count() is an additive function, it doesn't require a complex aggregation state to correctly merge partial results across lanes. And because it only needs to store a single 64-bit integer (the count itself), which is no larger than a pointer on 64-bit systems, ClickHouse can:
- Store the count directly in the hash table cell
- Skip allocating space in the memory arena
- Skip pointer indirection

This avoids memory allocation and pointer chasing altogether, reducing CPU and memory overhead.
In our tests, count() aggregations are now 20–30% faster than before, with lower memory usage and fewer CPU cycles.
Why this matters
Count aggregations are everywhere: from dashboards ranking the most active users or most downloaded packages, to alerting systems tracking unusual event frequencies. Whether it's analytics, observability, or search, nearly every system relies on fast, efficient count queries. That's why even small improvements here can have an outsized impact.
Real-world results
We'll demonstrate this optimization using a typical analytical query over web analytics data that returns the 10 most active users by using a count() aggregation:
SELECT UserID, count() FROM hits GROUP BY UserID ORDER BY count() DESC LIMIT 10;
(You can run the query yourself by creating the table and loading the data)
We used an AWS m6i.8xlarge EC2 instance (32 cores, 128 GB RAM) with a gp3 EBS volume (16k IOPS, 1000 MiB/s max throughput) to run the query.
We ran the query 3 times on ClickHouse 25.6, and show the runtime statistics returned by clickhouse-client:
10 rows in set. Elapsed: 0.447 sec. Processed 100.00 million rows, 799.98 MB (223.49 million rows/s., 1.79 GB/s.) Peak memory usage: 2.50 GiB.
10 rows in set. Elapsed: 0.391 sec. Processed 100.00 million rows, 799.98 MB (255.79 million rows/s., 2.05 GB/s.) Peak memory usage: 2.52 GiB.
10 rows in set. Elapsed: 0.383 sec. Processed 100.00 million rows, 799.98 MB (261.21 million rows/s., 2.09 GB/s.) Peak memory usage: 2.51 GiB.
We then ran the same query 3 times on ClickHouse 25.7:
10 rows in set. Elapsed: 0.305 sec. Processed 100.00 million rows, 799.98 MB (327.54 million rows/s., 2.62 GB/s.) Peak memory usage: 2.00 GiB.
10 rows in set. Elapsed: 0.265 sec. Processed 100.00 million rows, 799.98 MB (377.43 million rows/s., 3.02 GB/s.) Peak memory usage: 1.96 GiB.
10 rows in set. Elapsed: 0.237 sec. Processed 100.00 million rows, 799.98 MB (422.34 million rows/s., 3.38 GB/s.) Peak memory usage: 1.96 GiB.
Let's analyse these runs:
ClickHouse 25.6

| Run | Time (s) | Rows/s (M) | GB/s | Memory (GiB) |
|---|---|---|---|---|
| 1 | 0.447 | 223.49 | 1.79 | 2.50 |
| 2 | 0.391 | 255.79 | 2.05 | 2.52 |
| 3 | 0.383 | 261.21 | 2.09 | 2.51 |
Averages:
- Time: 0.407 s
- Rows/s: 246.83 M
- GB/s: 1.98 GB/s
- Memory: 2.51 GiB
ClickHouse 25.7

| Run | Time (s) | Rows/s (M) | GB/s | Memory (GiB) |
|---|---|---|---|---|
| 1 | 0.305 | 327.54 | 2.62 | 2.00 |
| 2 | 0.265 | 377.43 | 3.02 | 1.96 |
| 3 | 0.237 | 422.34 | 3.38 | 1.96 |
Averages:
- Time: 0.269 s
- Rows/s: 375.77 M
- GB/s: 3.01 GB/s
- Memory: 1.97 GiB
Improvements from 25.6 → 25.7

| Metric | 25.6 Avg | 25.7 Avg | Improvement |
|---|---|---|---|
| Query time | 0.407 s | 0.269 s | 1.51× faster (34% faster) |
| Rows/s | 246.83 M | 375.77 M | 1.52× higher |
| GB/s | 1.98 GB/s | 3.01 GB/s | 1.52× higher |
| Memory usage | 2.51 GiB | 1.97 GiB | 21.5% less memory |
Optimizations for JOINs
Contributed by Nikita Taranov.
Over the past few months, improving JOIN performance has been a continuous focus for us. The default JOIN strategy, the parallel hash join, has seen steady optimizations:
- Version 24.7 improved hash table allocation.
- Version 24.12 added automatic join reordering to intelligently choose the optimal table for the build phase.
- Versions 25.1 and 25.2 delivered further low-level enhancements, speeding up the join's probe phase and eliminating thread contention in its build phase, respectively.
These ongoing improvements in parallelism, query planning, and algorithm efficiency have steadily boosted JOIN speeds. ClickHouse 25.7 continues this trend with four additional low-level optimizations for hash joins:
1. Faster single-key joins
Eliminated an internal loop and unnecessary null-checks for joins on a single key column. This reduces CPU instructions and speeds up one-column JOINs.
Each bar in the PR's test screenshot compares old vs. new JOIN performance, showing ~1.37× speedups across the board.

2. Speed-ups for multi-OR-condition JOINs
Extended those single-key optimizations to JOINs with multiple OR conditions in the ON clause. These now benefit from the same lower-level improvements, making JOINs with multiple OR conditions more efficient.
Multi-condition JOINs now run up to 1.5× faster. Each bar in the PR's performance test screenshot shows speedups for queries with OR conditions in the ON clause, thanks to reduced instruction overhead and tighter join loop execution.

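For context, this is the shape of query that benefits; every table and column name below is made up purely for illustration:

SELECT o.order_id, c.name
FROM orders AS o
INNER JOIN customers AS c
    ON o.customer_id = c.id OR o.legacy_customer_id = c.id;

Queries like this now go through the same lower-level single-key improvements described in the previous item.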
3. Lower CPU overhead in join processing
Removed repeated hash computations during join tracking, cutting down on redundant work and boosting JOIN throughput.
Two join queries with large input tables show 1.5× to 1.8× speedups in the PR's performance test screenshot after reducing hash recomputation overhead. These improvements target joins with high match cardinality, where redundant hashing previously dominated CPU cycles.


4. Reduced memory usage for join results
Result buffers are now sized precisely after determining the number of matches, avoiding waste and improving performance, especially for wide rows with repeated or padded columns, such as from JOIN ... USING queries or synthetic columns.
The PR's performance test screenshot shows that JOINs with wide result rows now complete 1.3× to 1.4× faster due to more precise sizing of result buffers. These queries previously over-allocated memory for joined rows, especially when many columns were repeated or padded. The optimization reduces memory usage and speeds up processing for sparse or wide joins.


We're not done yet. JOIN performance remains a top priority, and we'll continue optimizing the parallel hash join and other strategies in future releases.
Coming soon: We're working on a technical deep dive that will walk through all of the JOIN improvements from the past few months, including benchmarks and practical examples. Stay tuned!
Native support for Geo Parquet types
Contributed by Konstantin Vedernikov
ClickHouse 25.5 added support for reading Geo types in Parquet files, but in that release, Parquet's Geo types were read into other ClickHouse types, like tuples or lists.
As of 25.7, ClickHouse reads all Geo types with WKB encoding into the equivalent ClickHouse Geo type.
Let's have a look at how it works. The following query writes each of the ClickHouse Geo types into a Parquet file:
SELECT (10, 10)::Point AS point,
       [(0, 0), (10, 0), (10, 10), (0, 10)]::LineString AS lineString,
       [
           [(0, 0), (10, 0), (10, 10), (0, 10)],
           [(1, 1), (2, 2)]
       ]::MultiLineString AS multiLineString,
       [[(20, 20), (50, 20), (20, 50)], [(30, 30), (50, 50), (50, 30)]]::Polygon AS polygon,
       [
           [[(0, 0), (10, 0), (10, 10), (0, 10)]],
           [[(20, 20), (50, 20), (50, 50), (20, 50)], [(30, 30), (50, 50), (50, 30)]]
       ]::MultiPolygon AS multiPolygon,
       [(0, 0), (10, 0), (10, 10), (0, 10)]::Ring AS ring
INTO OUTFILE 'geo.parquet' TRUNCATE;
We can then read the Parquet file back:
DESCRIBE file('geo.parquet');

┌─name────────────┬─type────────────────────────┐
│ point           │ Point                       │
│ lineString      │ LineString                  │
│ multiLineString │ MultiLineString             │
│ polygon         │ Polygon                     │
│ multiPolygon    │ MultiPolygon                │
│ ring            │ Array(Tuple(                │
│                 │     `1` Nullable(Float64),  │
│                 │     `2` Nullable(Float64))) │
└─────────────────┴─────────────────────────────┘
All the types have been read back as their Geo type, except for Ring. Geo types in Parquet typically use WKB encoding, but this doesn't support the Ring type.
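We can also read the values back with the file table function (format inference picks Parquet up from the file extension); the Geo columns come back as native Geo values:

SELECT point, lineString, polygon
FROM file('geo.parquet')
FORMAT Vertical;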
New Geospatial functions
Contributed by Paul Lamb
Two new functions check whether two polygons intersect: polygonsIntersectCartesian and polygonsIntersectSpherical. polygonsIntersectCartesian uses Cartesian (flat plane) geometry to do the calculation, whereas polygonsIntersectSpherical uses spherical geometry.
Let's have a look at how to use polygonsIntersectSpherical with the help of two polygons covering parts of central London:
SELECT polygonsIntersectSpherical(
    [[[(-0.140, 51.500), (-0.140, 51.510), (-0.120, 51.510), (-0.120, 51.500), (-0.140, 51.500)]]],
    [[[(-0.135, 51.505), (-0.135, 51.515), (-0.115, 51.515), (-0.115, 51.505), (-0.135, 51.505)]]]
);

┌─polygonsInte⋯51.505)]]])─┐
│                        1 │
└──────────────────────────┘
And now polygonsIntersectCartesian with imaginary coordinates of two football/soccer players playing on opposite sides of the pitch:
SELECT polygonsIntersectCartesian(
    [[[(0.0, 0.0), (0.0, 64.0), (45.0, 64.0), (45.0, 0.0), (0.0, 0.0)]]],
    [[[(55.0, 0.0), (55.0, 64.0), (100.0, 64.0), (100.0, 0.0), (55.0, 0.0)]]]
);

┌─polygonsInte⋯0.0)]]])─┐
│                     0 │
└───────────────────────┘

This time the two polygons don't overlap, so the function returns 0.
Security features
Contributed by Artem Brustovetskii / Diskein
This release includes a couple of new security features.
First, you can now create users dynamically using parameterized queries, making user provisioning more flexible:
SET param_username = 'test123';
CREATE USER {username:Identifier};
This feature allows for programmatic user creation with variable usernames, improving automation capabilities for user management workflows.
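As a provisioning sketch, the parameterized identifier should combine with the usual CREATE USER clauses; the username and password below are made up for illustration:

SET param_username = 'analyst_42';
CREATE USER {username:Identifier} IDENTIFIED BY 'changeme';  -- hypothetical credentials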
And we can now configure READ and WRITE grants for external data sources. Previously, external data access was managed with broad permissions:
GRANT S3 ON *.* TO user
In 25.7, you can enable the new read/write grants feature in your configuration:
# config.d/read_write_grants.yaml
access_control_improvements:
    enable_read_write_grants: true
And then grant permissions like this:
GRANT READ, WRITE ON S3 TO user;
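So a reporting user that should only ever read from S3 could presumably be limited to the READ grant alone, and SHOW GRANTS confirms what was applied (report_user is a made-up name):

GRANT READ ON S3 TO report_user;  -- read-only: no WRITE grant
SHOW GRANTS FOR report_user;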