Skip to main content
Skip to main content

Lazy materialization

This article describes how lazy materialization works and how it fits into ClickHouse’s broader I/O optimization stack. It presents a real-world example demonstrating how lazy materialization increases query performance.

Available from version 25.4

Lazy materialization was introduced in version 25.4 of ClickHouse, and is turned on by default.

Overview

Over the years, ClickHouse has introduced a series of layered optimizations to aggressively reduce I/O. These techniques form the foundation of its speed and efficiency:

OptimizationDescription
Columnar storageAllows skipping entire columns that aren't needed for a query and also enables high compression by grouping similar values together, minimizing I/O during data loading.
Sparse primary indexes | secondary data-skipping indexes | projectionsPrune irrelevant data by identifying which granules (row blocks) might match filters on indexed columns. These techniques operate at the granule level and can be used individually or in combination.
PREWHEREChecks matches also for filters on non-indexed columns to skip data early that would otherwise be loaded and discarded. It can work independently or refine the granules selected by indexes, complementing granule pruning by skipping rows that don't match all column filters.
Query condition cacheSpeeds up repeated queries by remembering which granules matched all filters last time. ClickHouse can then skip reading and filtering granules that didn't match, even if the query shape changes.

While the aforementioned I/O optimizations can significantly reduce data read, they still assume that all columns for rows passing the WHERE clause must be loaded before running operations like sorting, aggregation, or LIMIT. But what if some columns aren’t needed until later, or some data, despite passing the WHERE clause, is never needed at all? That’s where lazy materialization comes in. It is an orthogonal enhancement that completes the I/O optimization stack:

  • Indexing, together with PREWHERE, ensures that only rows matching column filters in the WHERE clause are processed.
  • Lazy materialization builds on this by deferring column reads until they’re actually required by the query execution plan. Even after filtering, only the columns needed for the next operation - such as sorting - are loaded immediately. Others are postponed and, due to LIMIT, are often read only partially, just enough to produce the final result. This makes lazy materialization especially powerful for Top N queries, where the final result may only require a handful of rows from certain, often large, columns.

A worked example

We highly recommend the blogpost "ClickHouse gets lazier (and faster): Introducing lazy materialization" for a deep dive on lazy materialization. The example below is taken from the aforementioned blogpost and reproduced here to demonstrate how a ClickHouse query can go from 219 seconds to just 139 milliseconds (a 1576× speedup) with lazy materialization.

To benefit from indexing and PREWHERE, a query needs filters, on primary key columns for indexing, and on any columns for PREWHERE. Lazy materialization layers cleanly on top, but unlike the other optimizations mentioned previously, it can also speed up queries with no column filters at all.

Consider the following example query which finds Amazon reviews with the highest number of helpful votes, regardless of date, product, rating, or verification status, and returns the top 3 along with their title, headline, and full text.

First running the query (with cold filesystem caches) with lazy materialization disabled (using query_plan_optimize_lazy_materialization):

SELECT
    helpful_votes,
    product_title,
    review_headline,
    review_body
FROM amazon.amazon_reviews
ORDER BY helpful_votes DESC
LIMIT 3
FORMAT Vertical
SETTINGS
    query_plan_optimize_lazy_materialization = false;
Row 1:
──────
helpful_votes:   47524
product_title:   Kindle: Amazon's Original Wireless Reading Device (1st generation)
review_headline: Why and how the Kindle changes everything
review_body:     This is less a \"pros and cons\" review than a hopefully use...

Row 2:
──────
helpful_votes:   41393
product_title:   BIC Cristal For Her Ball Pen, 1.0mm, Black, 16ct (MSLP16-Blk)
review_headline: FINALLY!
review_body:     Someone has answered my gentle prayers and FINALLY designed ...

Row 3:
──────
helpful_votes:   41278
product_title:   The Mountain Kids 100% Cotton Three Wolf Moon T-Shirt
review_headline: Dual Function Design
review_body:     This item has wolves on it which makes it intrinsically swee...

# highlight-start
0 rows in set. Elapsed: 219.071 sec. Processed 150.96 million rows, 71.38 GB (689.08 thousand rows/s., 325.81 MB/s.)
Peak memory usage: 1.11 GiB.
# highlight-end

Next the query is rerun (again with a cold filesystem cache), but this time with lazy materialization enabled:

SELECT
    helpful_votes,
    product_title,
    review_headline,
    review_body
FROM amazon.amazon_reviews
ORDER BY helpful_votes DESC
LIMIT 3
FORMAT Vertical
SETTINGS
-- highlight-next-line
query_plan_optimize_lazy_materialization = true;
Tip

Ordinarily you don't need to explicitly set query_plan_optimize_lazy_materialization = true to gain the benefit of lazy materialization. It is enabled by default.

Row 1:
──────
helpful_votes:   47524
product_title:   Kindle: Amazon's Original Wireless Reading Device (1st generation)
review_headline: Why and how the Kindle changes everything
review_body:     This is less a \"pros and cons\" review than a hopefully use...

Row 2:
──────
helpful_votes:   41393
product_title:   BIC Cristal For Her Ball Pen, 1.0mm, Black, 16ct (MSLP16-Blk)
review_headline: FINALLY!
review_body:     Someone has answered my gentle prayers and FINALLY designed ...

Row 3:
──────
helpful_votes:   41278
product_title:   The Mountain Kids 100% Cotton Three Wolf Moon T-Shirt
review_headline: Dual Function Design
review_body:     This item has wolves on it which makes it intrinsically swee...

# highlight-start
0 rows in set. Elapsed: 0.139 sec. Processed 150.96 million rows, 1.81 GB (1.09 billion rows/s., 13.06 GB/s.)
Peak memory usage: 3.80 MiB.
# highlight-end

Consider the difference in performance with lazy materialization turned off and turned on:

MetricLazy materialization offLazy materialization onImprovement
Elapsed time219.071 sec0.139 sec~1576× faster
Data read71.38 GB1.81 GB~40× less
Peak memory1.11 GiB3.80 MiB~300× less

How to confirm lazy materialization in the query execution plan

You can observe usage of lazy materialization for the previous query by inspecting the query’s logical execution plan using the EXPLAIN clause:

EXPLAIN actions = 1
SELECT
    helpful_votes,
    product_title,
    review_headline,
    review_body
FROM amazon.amazon_reviews
ORDER BY helpful_votes DESC
LIMIT 3
SETTINGS
    query_plan_optimize_lazy_materialization = true;
...
# highlight-next-line
Lazily read columns: review_headline, review_body, product_title
  Limit
    Sorting
      ReadFromMergeTree

You can read the operator plan from bottom to top and observe that ClickHouse defers reading the three large String columns until after sorting and limiting.