Build a rollup with materialized views for fast time-series analytics
This tutorial shows you how to maintain pre-aggregated rollups from a high-volume events table using materialized views. You’ll create three objects: a raw table, a rollup table, and the materialized view that writes into the rollup automatically.
When to use this pattern
Use this pattern when:
- You have an append-only events stream (clicks, pageviews, IoT, logs).
- Most queries are aggregations over time ranges (per minute/hour/day).
- You want consistent sub-second reads without re-scanning all raw rows.
Create the raw events table
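A minimal sketch of the raw table, assuming illustrative columns (`user_id`, `country`, `event_type`, and a numeric `value`); adapt the columns and types to your own events:

```sql
CREATE TABLE events_raw
(
    event_time  DateTime,
    user_id     UInt64,
    country     LowCardinality(String),
    event_type  LowCardinality(String),
    value       Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_time, user_id)
-- Drop raw rows after 90 days (tune to your retention requirements)
TTL event_time + INTERVAL 90 DAY;
```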
Notes
- `PARTITION BY toYYYYMM(event_time)` keeps partitions small and easy to drop.
- `ORDER BY (event_time, user_id)` supports time-bounded queries plus a secondary filter on user.
- `LowCardinality(String)` saves memory for categorical dimensions.
- `TTL` cleans up raw data after 90 days (tune to your retention requirements).
Design the rollup (aggregated) table
We’ll pre-aggregate to hourly granularity. Choose your grain to match the most common analysis window.
We store aggregate states (e.g., `AggregateFunction(sum, ...)`), which compactly represent partial aggregates and can be merged or finalized later.
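A sketch of the hourly rollup, keyed by `(bucket_start, country, event_type)` as referenced later in this tutorial; the state columns `events`, `users`, and `value_sum` are illustrative names:

```sql
CREATE TABLE events_rollup_1h
(
    bucket_start  DateTime,                        -- start of the hour bucket
    country       LowCardinality(String),
    event_type    LowCardinality(String),
    events        AggregateFunction(count),        -- partial count state
    users         AggregateFunction(uniq, UInt64), -- partial unique-users state
    value_sum     AggregateFunction(sum, Float64)  -- partial sum state
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(bucket_start)
ORDER BY (bucket_start, country, event_type);
```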
Create a materialized view that populates the rollup
This materialized view fires automatically on inserts into `events_raw` and writes aggregate states into the rollup.
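A sketch of the view, assuming the column names above; the `-State` combinators produce aggregate states rather than finalized values:

```sql
CREATE MATERIALIZED VIEW events_rollup_1h_mv
TO events_rollup_1h
AS
SELECT
    toStartOfHour(event_time) AS bucket_start,
    country,
    event_type,
    countState()       AS events,
    uniqState(user_id) AS users,
    sumState(value)    AS value_sum
FROM events_raw
GROUP BY bucket_start, country, event_type;
```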
Querying the rollup
You can either merge states at read time, or finalize them:
- Merge at read time with the `-Merge` combinators (e.g., `sumMerge`).
- Finalize states with `finalizeAggregation`.
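For example, assuming the column names from the sketches above:

```sql
-- Merge states at read time with -Merge combinators
SELECT
    bucket_start,
    countMerge(events)  AS events,
    uniqMerge(users)    AS users,
    sumMerge(value_sum) AS value_sum
FROM events_rollup_1h
WHERE bucket_start >= now() - INTERVAL 24 HOUR
GROUP BY bucket_start
ORDER BY bucket_start;

-- Finalize a state per row; only reliable once background merges have
-- collapsed each key to a single row, so prefer -Merge with GROUP BY
SELECT bucket_start, finalizeAggregation(value_sum) AS value_sum
FROM events_rollup_1h
WHERE bucket_start >= now() - INTERVAL 24 HOUR;
```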
If you expect reads to always hit the rollup, you can create a second materialized view that writes finalized numbers to a “plain” `MergeTree` table at the same 1h grain. States give more flexibility, while finalized numbers make reads slightly simpler.
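A sketch of that variation, with hypothetical names; note that a materialized view aggregates each insert block independently, so a plain `MergeTree` target can hold several rows per hour bucket, and reads should still aggregate over the bucket (or use a `SummingMergeTree` target instead):

```sql
CREATE TABLE events_plain_1h
(
    bucket_start  DateTime,
    country       LowCardinality(String),
    event_type    LowCardinality(String),
    events        UInt64,
    value_sum     Float64
)
ENGINE = MergeTree
ORDER BY (bucket_start, country, event_type);

CREATE MATERIALIZED VIEW events_plain_1h_mv
TO events_plain_1h
AS
SELECT
    toStartOfHour(event_time) AS bucket_start,
    country,
    event_type,
    count()    AS events,
    sum(value) AS value_sum
FROM events_raw
GROUP BY bucket_start, country, event_type;
```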
Filter on fields in the primary key for best performance
You can use the `EXPLAIN` command to see how the index is used to prune data:
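For example, with `indexes = 1` the plan lists the indexes used to skip parts and granules (the filter values are illustrative):

```sql
EXPLAIN indexes = 1
SELECT countMerge(events) AS events
FROM events_rollup_1h
WHERE bucket_start >= toDateTime('2024-06-01 00:00:00')
  AND country = 'US';
```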
The query execution plan above shows three types of indexes being used: a MinMax index, a partition index, and a primary key index. Each index makes use of fields specified in our primary key, `(bucket_start, country, event_type)`.
For the best filtering performance, make sure your queries filter on primary key fields so that data can be pruned.
Common variations
- Different grains: add a daily rollup table, then a second materialized view at the 1d grain (see the sketch after this list).
- Compression: apply codecs to big columns (for example, `CODEC(ZSTD(3))`) on the raw table.
- Cost control: push heavy retention to the raw table and keep long-lived rollups.
- Backfilling: when loading historical data, insert into `events_raw` and let the materialized view build rollups automatically. For existing rows, use `POPULATE` on materialized view creation if suitable, or `INSERT ... SELECT` (see the backfill sketch after this list).
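A sketch of the daily grain, mirroring the hourly objects (names are illustrative):

```sql
CREATE TABLE events_rollup_1d
(
    bucket_start  Date,
    country       LowCardinality(String),
    event_type    LowCardinality(String),
    events        AggregateFunction(count),
    users         AggregateFunction(uniq, UInt64),
    value_sum     AggregateFunction(sum, Float64)
)
ENGINE = AggregatingMergeTree
ORDER BY (bucket_start, country, event_type);

CREATE MATERIALIZED VIEW events_rollup_1d_mv
TO events_rollup_1d
AS
SELECT
    toDate(event_time) AS bucket_start,
    country,
    event_type,
    countState()       AS events,
    uniqState(user_id) AS users,
    sumState(value)    AS value_sum
FROM events_raw
GROUP BY bucket_start, country, event_type;
```

And a backfill sketch for rows that were already in `events_raw` before the view existed (a materialized view only sees new inserts; the cutoff is illustrative):

```sql
INSERT INTO events_rollup_1h
SELECT
    toStartOfHour(event_time) AS bucket_start,
    country,
    event_type,
    countState()       AS events,
    uniqState(user_id) AS users,
    sumState(value)    AS value_sum
FROM events_raw
WHERE event_time < '2024-06-01 00:00:00'
GROUP BY bucket_start, country, event_type;
```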
Clean-up and retention
- Keep the raw TTL short (e.g., 30 or 90 days) but retain rollups for longer (e.g., 1 year).
- You can also use TTL to move old parts to cheaper storage if tiering is enabled.
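A sketch of tiered TTL on the raw table, assuming a storage policy with a volume named `cold` is already configured:

```sql
ALTER TABLE events_raw
    MODIFY TTL
        event_time + INTERVAL 30 DAY TO VOLUME 'cold',
        event_time + INTERVAL 90 DAY DELETE;
```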
Troubleshooting
- Materialized view not updating? Check that inserts go to `events_raw` (not the rollup table), and that the materialized view target is correct (`TO events_rollup_1h`).
- Slow queries? Confirm they hit the rollup (query the rollup table directly) and that the time filters align to the rollup grain.
- Backfill mismatches? Use `SYSTEM FLUSH LOGS` and check `system.query_log` / `system.parts` to confirm inserts and merges.
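For example (the `LIKE` pattern and limits are illustrative):

```sql
SYSTEM FLUSH LOGS;

-- Did recent inserts actually target the raw table?
SELECT event_time, query
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_kind = 'Insert'
  AND query LIKE '%events_raw%'
ORDER BY event_time DESC
LIMIT 10;

-- Are parts being created and merged in the rollup?
SELECT partition, name, rows, active
FROM system.parts
WHERE table = 'events_rollup_1h'
ORDER BY modification_time DESC
LIMIT 10;
```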