Tuning Managed ClickStack - refining your schema
If you've been running ClickStack for a while, you've probably noticed that the default schema handles most observability workloads without any changes. This page is for when that's no longer enough: query latency starts to climb, or your access patterns have drifted away from the defaults.
Four optimizations cover most of what helps in practice. They're listed roughly in order of effort. The first two are local ALTER TABLE changes you can roll out incrementally. The third pays off when the same aggregation runs over and over on a dashboard. The fourth needs a table migration, so it's the most involved.
The summaries below are short on purpose. For the reasoning behind each change, benchmarks, and the recipes for rolling it out to existing data, see Performance tuning.
Materialize frequently queried attributes
Filtering on LogAttributes['service.version'] asks ClickHouse to load and decode the whole LogAttributes Map for every row it examines. Promote the attribute to a MATERIALIZED column and the same filter becomes a column read, usually an order of magnitude faster. ClickStack rewrites the filter automatically once the column exists, so saved searches and dashboards keep working unchanged.
Pick the attributes you actually query often. Each materialized column costs storage and insert time, so this is a "promote what you use" exercise rather than a "promote everything" one.
Existing rows stay empty for the new column until you also ALTER TABLE otel_logs MATERIALIZE COLUMN ServiceVersion.
Read more: Materialize frequently queried attributes.
Add skip indexes
Skip indexes let ClickHouse rule out granules of data that can't match a filter, turning a scan into a small targeted read. Three types are worth knowing about:
- Text indexes (
text(tokenizer = ...)) on string columns and onmapKeys/*AttributeItemsarrays. The default logs schema already ships these. - Min-max indexes on numeric columns filtered by range. Trace
Durationis the classic case. - Bloom filters for high-cardinality equality lookups on ClickHouse versions that don't yet support text indexes.
A skip index only pays for its evaluation cost if it actually prunes granules. Confirm with EXPLAIN indexes = 1 on a representative query before assuming it helped.
Read more: Adding skip indexes.
Pre-compute repeated aggregations
When the same aggregation runs again and again on a dashboard (top services by error rate, p99 latency per endpoint, request counts per minute), a materialized view computes the result at insert time and writes it to a small rollup table. Dashboards then hit the rollup instead of the raw logs or traces, which is cheap by comparison.
This pays off when the dashboard is hot and the underlying table is large. The cost is some insert-time CPU and a second table to maintain.
Read more: Exploiting materialized views.
Choose a primary key for your access pattern
The primary key controls how rows are sorted on disk. Filters on the leading columns of that key let ClickHouse seek straight to the relevant region; filters that don't lead with one of those columns scan the whole partition.
The default logs key (toStartOfFiveMinutes(Timestamp), ServiceName, Timestamp) favors "what happened in the last N minutes for service X". If most of your queries lead with a different column (a tenant id, a customer id, a region), changing the primary key to lead with that column is the highest-impact change you can make.
ClickHouse doesn't allow editing the primary key in place, so this is a table migration rather than a simple ALTER. The performance tuning guide walks through creating the new table, redirecting ingestion, and using a Merge table so existing dashboards keep working across old and new data.
Read more: Modifying the primary key.
Further reading
- Performance tuning: full guide, including projections and row-lookup acceleration.
- Tables and schemas used by ClickStack: the canonical DDL the optimizations build on.
- Going to production: broader production recommendations.