ClickHouse Release 25.4

neutral avatar 400804ae96
May 9, 2025 Β· 11 minutes read

Another month goes by, which means it’s time for another release!

ClickHouse version 25.4 contains 25 new features 🌸 23 performance optimizations πŸ¦‹ 58 bug fixes 🐝

This release brings lazy materialization, Apache Iceberg time travel, correlated subqueries for the EXISTS clause, and more!

New Contributors

A special welcome to all the new contributors in 25.4! The growth of ClickHouse's community is humbling, and we are always grateful for the contributions that have made ClickHouse so popular.

Below are the names of the new contributors:

Amol Saini, Drew Davis, Elmi Ahmadov, Fellipe Fernandes, Grigory Korolev, Jia Xu, John Doe, Luke Gannon, Muzammil Abdul Rehman, Nikolai Ryzhov, ParvezAhamad Kazi, Pavel Shutsin, Saif Ullah, Samay Sharma, Shahbaz Aamir, Sumit, Todd Yocum, Vladimir Baikov, Wudidapaopao, Xiaozhe Yu, arf42, cjw, felipeagfranceschini, krzaq, wujianchao5, zouyunhe

Hint: if you’re curious how we generate this list… here.


You can also view the slides from the presentation.

Lazy materialization

Contributed by Xiaozhe Yu

Lazy materialization is a new query optimization in ClickHouse that defers reading column data until it’s actually needed. In Top N queries with sorting and LIMIT, this means ClickHouse can often skip loading most of the data, cutting down I/O, memory usage, and runtime by orders of magnitude.

Here’s a real-world example: this query finds the Amazon reviews with the highest number of helpful votes, returning the top 3 along with their title, headline, and full text.

We first run it with lazy materialization disabled (and cold filesystem cache):

SELECT helpful_votes, product_title, review_headline, review_body
FROM amazon.amazon_reviews
ORDER BY helpful_votes DESC
LIMIT 3
FORMAT Null
SETTINGS query_plan_optimize_lazy_materialization = false;
0 rows in set. Elapsed: 219.071 sec. Processed 150.96 million rows, 71.38 GB (689.08 thousand rows/s., 325.81 MB/s.)
Peak memory usage: 1.11 GiB.

Then we rerun the exact same query, but this time with lazy materialization enabled (after clearing the filesystem cache again):

SELECT helpful_votes, product_title, review_headline, review_body
FROM amazon.amazon_reviews
ORDER BY helpful_votes DESC
LIMIT 3
FORMAT Null
SETTINGS query_plan_optimize_lazy_materialization = true;
0 rows in set. Elapsed: 0.139 sec. Processed 150.96 million rows, 1.81 GB (1.09 billion rows/s., 13.06 GB/s.)
Peak memory usage: 3.80 MiB.

Boom: a 1,576Γ— speedup! With 40Γ— less I/O and 300Γ— lower memory.

Same query. Same table. Same machine.
All we changed? When ClickHouse reads the data.

Tom Schreiber recently wrote a blog post, ClickHouse gets lazier (and faster): Introducing lazy materialization, in which he explains this feature in detail.

Data Lakes from MergeTree tables

Contributed by Alexey Milovidov

MergeTree tables on read-only disks can now refresh their state and load new data parts, if they appear in the background.

This lets you run an unlimited number of readers on top of externally hosted, continuously updating datasets, which is great for data sharing and publishing.

We can create at most one writer:

CREATE TABLE writer (...) ORDER BY ()
SETTINGS 
  table_disk = true,
  disk = disk(
      type = object_storage,
      object_storage_type = s3,
      endpoint = 'https://mybucket.s3.us-east-1.amazonaws.com/data/',
      metadata_type = plain_rewritable);

And an unlimited number of readers in any locations:

CREATE TABLE reader (...) ORDER BY ()
SETTINGS 
  table_disk = true, 
  refresh_parts_interval = 1,
  disk = disk(
      readonly = true,
      type = object_storage,
      object_storage_type = s3,
      endpoint = 'https://mybucket.s3.us-east-1.amazonaws.com/data/',
      metadata_type = plain_rewritable);

Let's have a look at one we created earlier, so to speak. The following table dataset contains over 40 million posts from Hacker News:

CREATE TABLE hackernews_history UUID '66491946-56e3-4790-a112-d2dc3963e68a'
(
    `update_time` DateTime DEFAULT now(),
    `id` UInt32,
    `deleted` UInt8,
    `type` Enum8(
        'story' = 1, 'comment' = 2, 'poll' = 3, 'pollopt' = 4, 'job' = 5
    ),
    `by` LowCardinality(String),
    `time` DateTime,
    `text` String,
    `dead` UInt8,
    `parent` UInt32,
    `poll` UInt32,
    `kids` Array(UInt32),
    `url` String,
    `score` Int32,
    `title` String,
    `parts` Array(UInt32),
    `descendants` Int32
)
ENGINE = ReplacingMergeTree(update_time)
ORDER BY id
SETTINGS 
  refresh_parts_interval = 60, 
  disk = disk(
    readonly = true, 
    type = 's3_plain_rewritable', 
    endpoint = 'https://clicklake-test-2.s3.eu-central-1.amazonaws.com/', 
    use_environment_credentials = false
  );

We can write a query against it just like any other table:

SELECT type, count()
FROM hackernews_history
GROUP BY ALL
ORDER BY count() DESC;
β”Œβ”€type────┬──count()─┐
β”‚ comment β”‚ 38549467 β”‚
β”‚ story   β”‚  5777529 β”‚
β”‚ job     β”‚    17677 β”‚
β”‚ pollopt β”‚    15261 β”‚
β”‚ poll    β”‚     2247 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

CPU workload scheduler

Contributed by Sergei Trifonov

This release adds CPU slot scheduling for workloads, which lets you limit the number of concurrent threads for a specific workload.

This feature makes it possible to share ClickHouse clusters between different workloads and provide weighted fair allocation and priority-based allocation for CPU resources. This lets you, for example, run heavy ad-hoc queries without affecting high-priority real time reporting.

Let's have a look at how to configure it. We first need to define a CPU resource:

CREATE RESOURCE cpu (MASTER THREAD, WORKER THREAD);

Once we define a CPU resource, the setting max_concurrent_threads is enabled for controlling CPU allocation. Without a CPU resource declaration, ClickHouse will use the server-level concurrency control settings (concurrent_threads_soft_limit_num and related settings) instead.

A quick explainer on the thread types from the docs:

  • Master thread β€” the first thread that starts working on a query or background activity like a merge or a mutation.
  • Worker thread β€” the additional threads that master can spawn to work on CPU-intensive tasks.

To achieve better responsiveness, we might choose to use separate resources for master and worker threads:

CREATE RESOURCE worker_cpu (WORKER THREAD);
CREATE RESOURCE master_cpu (MASTER THREAD);

We can list the resources on our ClickHouse service by running the following query:

SELECT *
FROM system.resources
FORMAT Vertical;

We can then create workloads that use those resources.

CREATE WORKLOAD all;

CREATE WORKLOAD admin IN all 
SETTINGS max_concurrent_threads = 10;

CREATE WORKLOAD production IN all 
SETTINGS max_concurrent_threads = 100;

CREATE WORKLOAD analytics IN production
SETTINGS max_concurrent_threads = 60, weight = 9;

CREATE WORKLOAD ingestion IN production;

We can only have one top level workload i.e. one that doesn't include the IN <workload> clause.

We can list the workloads on our ClickHouse service by running the following query:

SELECT *
FROM system.workloads
FORMAT Vertical;

We can then set the appropriate workload when querying:

SET workload = 'analytics';

Correlated subqueries for EXISTS

Contributed by Dmitry Novik

Our next feature is a fun one - the EXISTS clause now supports correlated subqueries! Let’s see how this works with help from the UK property dataset.

Below is the schema for this dataset:

CREATE TABLE uk.uk_price_paid
(
    price UInt32,
    date Date,
    postcode1 LowCardinality(String),
    postcode2 LowCardinality(String),
    type Enum8('terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4, 'other' = 0),
    is_new UInt8,
    duration Enum8('freehold' = 1, 'leasehold' = 2, 'unknown' = 0),
    addr1 String,
    addr2 String,
    street LowCardinality(String),
    locality LowCardinality(String),
    town LowCardinality(String),
    district LowCardinality(String),
    county LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (postcode1, postcode2, addr1, addr2);

Let’s say we want to find districts/towns with the highest average property prices in 2009 where at least five properties were sold, but with one caveat: they must have sold at least one detached property for over Β£1 million in 2006!

We can now work this out with the following query:

SELECT district, town,
       round(AVG(price), 2) AS avgPrice,
       COUNT(*) AS totalSales
FROM uk.uk_price_paid p1
WHERE date BETWEEN '2009-01-01' AND '2009-12-31'
AND EXISTS (
  SELECT 1
  FROM uk.uk_price_paid p2
  WHERE p2.district = p1.district
  AND p2.town = p1.town
  AND p2.type = 'detached'
  AND p2.price > 1000000
  AND p2.date BETWEEN '2006-01-01' AND '2006-12-31'
)
GROUP BY ALL
HAVING totalSales > 5
ORDER BY avgPrice DESC
LIMIT 10
SETTINGS allow_experimental_correlated_subqueries = 1;
β”Œβ”€district───────────────┬─town─────────────┬───avgPrice─┬─totalSales─┐
β”‚ ELMBRIDGE              β”‚ LEATHERHEAD      β”‚  1118756.9 β”‚         58 β”‚
β”‚ KENSINGTON AND CHELSEA β”‚ LONDON           β”‚ 1060251.76 β”‚       1620 β”‚
β”‚ WOKING                 β”‚ GUILDFORD        β”‚     901000 β”‚          9 β”‚
β”‚ CHILTERN               β”‚ TRING            β”‚  893333.33 β”‚          6 β”‚
β”‚ ENFIELD                β”‚ BARNET           β”‚  891921.88 β”‚         48 β”‚
β”‚ ELMBRIDGE              β”‚ COBHAM           β”‚   875841.6 β”‚        202 β”‚
β”‚ WYCOMBE                β”‚ HENLEY-ON-THAMES β”‚     846300 β”‚         10 β”‚
β”‚ GUILDFORD              β”‚ GODALMING        β”‚  831977.67 β”‚          9 β”‚
β”‚ RUNNYMEDE              β”‚ VIRGINIA WATER   β”‚  802773.53 β”‚         85 β”‚
β”‚ THREE RIVERS           β”‚ NORTHWOOD        β”‚  754197.22 β”‚         36 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Notice that we must enable allow_experimental_correlated_subqueries as this is an experimental feature.

Lines 10 and 11 reference fields from the outer query (p1) within the subquery (p2). The condition p2.district = p1.district AND p2.town = p1.town creates a dynamic relationship between the two query levels, evaluating the subquery separately for each district/town combination.

Persistent databases in clickhouse-local

Contributed by Alexey Milovidov

The default database is now persistent when using clickhouse-local.

To see the difference that this makes, let's launch clickhouse-local in 25.3:

clickhouse -m --path data

We’ll create a table:

CREATE TABLE foo (a UInt8) ORDER BY a;

If we exit the CLI by typing exit; and relaunch it, the following query will return no rows:

SHOW TABLES
Ok.

0 rows in set. Elapsed: 0.008 sec.

Now let’s do the same with ClickHouse 25.4:

clickhouse -m --path data2

We’ll create a table:

CREATE TABLE foo (a UInt8) ORDER BY a;

And then if we exit before relaunching, we’ll see the following:

SHOW TABLES
β”Œβ”€name─┐
1. β”‚ foo  β”‚
   β””β”€β”€β”€β”€β”€β”€β”˜

1 row in set. Elapsed: 0.006 sec.

Apache Iceberg time travel

Contributed by Brett Hoerner, Dan Ivanik

Over the last few releases, we’ve been adding more support to ClickHouse for open table formats like Apache Iceberg/Delta Lake and catalogs like Unity/AWS Glue, and this release is no exception.

It’s now possible to run Apache Iceberg queries based on previous snapshots, aka time travel. We’ve also recorded a video showing how to use this functionality with the AWS Glue catalog.

Below is an example showing the query syntax for this functionality:

CREATE DATABASE test
ENGINE = DataLakeCatalog
SETTINGS 
  catalog_type = 'glue', 
  region = '', 
  aws_access_key_id = '', 
  aws_secret_access_key = '';
SELECT count()
FROM test.`iceberg_benchmark.time_travel`
SETTINGS iceberg_timestamp_ms = 1742497721135;

You can also see the AWS Glue Catalog developer guide for more examples.

Default compression codec for tables

Contributed by Gvoelfin

It’s now possible to set a default compression codec for every column in MergeTree tables. For example:

CREATE TABLE t (
    time DateTime CODEC(ZSTD(3)), -- codec on a column level
    user_id UInt64, -- uses the default codec
    ...
) ORDER BY time
SETTINGS default_compression_codec = 'ZSTD(1)' -- codec on a table level

As a reminder, ClickHouse applies LZ4 compression in the self-managed version and ZSTD in ClickHouse Cloud by default.

As well as setting the default codec at a table level, we can also set it globally for all tables via a config file:

config.d/compression.yaml

compression:
    case:
        min_part_size: 1000000000 # Optional condition
        method: 'zstd'

SSH Interface

George Gamezardashvili, Nikita Mikhailov

ClickHouse 25.3 saw the ClickHouse Server add support for the SSH protocol, which means any SSH client can connect to it directly

We've now added support for this to play.clickhouse.com. You can connect to that service by running the following:

ssh play@play.clickhouse.com

There's no password, so you can just press enter when prompted for one.

There are a range of datasets to play with and below is an example of a query against a table containing stock prices:

SELECT symbol, max(price), sum(volume)
FROM stock
GROUP BY ALL
ORDER BY max(price) DESC
LIMIT 10;
β”Œβ”€symbol─┬─max(price)─┬─sum(volume)─┐
 1. β”‚ RBAK   β”‚    9963.24 β”‚ 11569148200 β”‚
 2. β”‚ SEB    β”‚    1670.01 β”‚     2382900 β”‚
 3. β”‚ WPO    β”‚     996.74 β”‚    30127600 β”‚
 4. β”‚ NVR    β”‚        938 β”‚   178289600 β”‚
 5. β”‚ GIW    β”‚        767 β”‚     1306400 β”‚
 6. β”‚ WTM    β”‚      702.5 β”‚    21839600 β”‚
 7. β”‚ INFY   β”‚    670.062 β”‚   670302700 β”‚
 8. β”‚ QCOM   β”‚        659 β”‚ 28698244000 β”‚
 9. β”‚ MCHXP  β”‚        585 β”‚       58200 β”‚
10. β”‚ MTB    β”‚     575.25 β”‚   507431900 β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Get started with ClickHouse Cloud today and receive $300 in credits. At the end of your 30-day trial, continue with a pay-as-you-go plan, or contact us to learn more about our volume-based discounts. Visit our pricing page for details.


Share this post

  • Y Combinator icon
  • X icon
  • Bluesky icon
  • Facebook icon
  • LinkedIn icon

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!

Recent posts

Tsvetan Stoychev Β· Jun 26, 2026