Videos

Late materialization of secondary indices

Name: Late materialization of secondary indices
Uploaded: 2025-12-11T16:31:30.229Z

Previous videoIntro to Full-Text Search in ClickHouse

Next videoScaling ClickHouse to petabytes of logs at OpenAI

Mark Needham

Ever wondered how ClickHouse handles the creation of secondary indexes during data ingestion? In ClickHouse 25.10, a new feature called late materialization of secondary indices is introduced, providing more control over when indexes are built.

This can be a game-changer when working with computationally expensive indexes, such as vector similarity indexes, where you may want to load your data first and build the index later, rather than incurring the performance cost upfront.

In this tutorial, we use Wikipedia embeddings and vector similarity indexes to demonstrate the performance difference between immediate and late materialization. You'll see how inserting 150,000 rows takes 26 seconds with immediate index creation versus just 2 seconds when deferring it, and how query performance compares before and after materialization.

Setting up vector similarity indexes with HNSW algorithm and L2 distance
Using exclude_materialize_skip_indexes_on_insert to defer index creation during data loading
Comparing query performance with and without materialized indexes (8x speedup demonstrated)
Controlling index materialization behavior during merges with configuration settings
Manually triggering index materialization with ALTER TABLE commands

Parquet files with embeddings - https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-1536-1M

Recent videos

Open House

Scaling ClickHouse to petabytes of logs at OpenAI

Open House

How ClickHouse helps Anthropic scale observability

Open House, User stories

How Capital One cut infrastructure costs by 50%

Engineering leaders at Capital One share how they cut infrastructure costs by 50% and reduced average dashboard load time from 5+ to under 500ms with ClickHouse Cloud.