Denormalization has been the standard approach to analytical data modeling for good reason. Moving joins, lookups, and business rules out of query time and into ingestion gives you the fastest possible reads for a known access pattern. For most of the past decade, it was often the practical default for latency-sensitive analytics. Earlier columnar engines and distributed query processors could execute joins, but many workloads paid for them through higher latency, higher compute cost, spill-to-disk, or distributed coordination overhead.
That constraint has loosened. Modern columnar databases with advanced join algorithms have reduced the cost of runtime joins enough that normalization is now a genuinely viable option for many analytical workloads. Denormalization still delivers faster reads, but normalization can bring operational benefits: simpler pipelines, flexible schemas, and cleaner governance. Engineers can now make the decision based on their actual workload characteristics, rather than being forced into one approach by engine limitations.
This guide is a decision framework for making that choice in ClickHouse. It starts with why denormalization became the default, explains what has changed in join performance, then compares the tradeoffs on both sides so you can decide where to denormalize, where to join, and where to use ClickHouse primitives that bridge the gap.
For a broader evaluation framework covering latency, concurrency, ingest throughput, SQL flexibility, and cost across real-time OLAP options, see our guide to choosing a database for real-time analytics in 2026. For a deeper comparison of how ClickHouse executes star schema joins against Druid, Pinot, and cloud DWHs, see our star schema and fast joins guide.
TL;DR
- Denormalization and normalization are both valid modeling strategies. The right choice depends on your workload.
- Denormalization's tradeoffs are primarily operational: pipeline complexity, write-path overhead, data freshness lag, backfill burden, and semantic drift.
- Modern real-time OLAP engines (ClickHouse most prominently) have made normalized joins performant enough for many analytical workloads, using parallel/grace hash joins, merge joins, join reordering, runtime bloom filters, and dictionary-based direct joins.
- Denormalization still wins on raw read performance for a known access pattern. Scanning one pre-joined table with efficient filters is almost always faster than scanning multiple tables and joining at runtime.
- The tradeoff: denormalization optimizes read cost at the expense of write-path complexity, schema flexibility, and governance. Normalization preserves those operational qualities but adds join overhead at query time, including higher per-query CPU and memory use, especially under concurrency.
- Use the decision framework below to evaluate which approach fits each part of your workload.
Why denormalization became the default, and what changed in join performance
Data engineering practice has long followed a strict split: normalize for transactional writes, denormalize for analytical reads. Engineers adopted denormalization because it made analytical read latency more predictable, especially when joins required large distributed shuffles, disk spill, or careful query tuning.
The constraints were real and came from multiple directions.
Memory limitations. Early columnar engines executed hash joins purely in memory. When the right-hand side of a join exceeded available RAM, the options were bad: out-of-memory errors, or spilling to disk with severe performance penalties that made queries unpredictably slow.
Distributed coordination overhead. The MPP and MapReduce architectures that dominated the 2010s could work around memory limits by going wide, distributing join work across many nodes. But this introduced network shuffles, coordination overhead, and multi-step job execution that made joins slow and expensive. Today, many traditional cloud data warehouses still follow that design and will complete a massive join, but they may spend significant time and credits doing it.
Primitive optimizers. Legacy query planners couldn't dynamically reorder join graphs based on cardinality estimates, and they couldn't push predicates down efficiently. Engineers couldn't trust the optimizer to find a good plan, so they did the optimization themselves at ingestion time.
Given these constraints, denormalization was the rational engineering choice: pay the compute cost once at ingestion to guarantee predictable read performance. That calculus made sense, and for many workloads it still does.
What's changed is the engine side. Modern real-time OLAP engines have substantially closed the join performance gap, with ClickHouse investing heavily in join execution. Standard hash joins remain the default for fast, memory-resident operations. When intermediate state exceeds memory, grace hash joins spill intermediate state to disk without requiring pre-sorted data, allowing the query to continue instead of failing purely because the hash table no longer fits in RAM. Parallel hash joins use multiple CPU cores to accelerate execution. If tables are already sorted, full and partial merge joins can reduce or avoid the hashing phase, requiring less memory. For ultra-low-latency dimension lookups, ClickHouse's direct dictionary joins function as key-value lookups, delivering up to 25x speedup over hash joins in published benchmarks. All standard SQL join types are supported (INNER, LEFT, RIGHT, FULL, CROSS), plus SEMI, ANTI, and ASOF joins for analytical patterns spanning time windows or selectivity-driven filtering.
Enhanced global join reordering allows cost-based optimizers to restructure complex join graphs using cardinality estimates. On a six-table TPC-H query (scale factor 100), naive ordering without statistics took 3,903 seconds and ~100 GiB of peak memory. Enabling global join reordering with column statistics brought the same query to 2.7 seconds with under 4 GiB of memory: a 1,450x speedup and 25x memory reduction on the same hardware, data, and SQL. Runtime bloom filters, where the build side of a join passes filter conditions to the probe side before the join executes, delivered an additional 2.1x speedup and 7x memory reduction in ClickHouse's published TPC-H example.
Append-only event stores like Druid and Pinot often favor wide event tables because their architectures are optimized around immutable segments, ingestion-time indexing, and lookup or broadcast-style joins. Cloud data warehouses like Snowflake and BigQuery can execute complex joins, but the latency and cost profile is different from a purpose-built real-time OLAP engine, especially for high-concurrency serving workloads.
The bottom line: joins are no longer a constraint that automatically forces your modeling decisions. They are a cost you can now evaluate against the tradeoffs of denormalization for your specific workload.
Why denormalization is still the right choice for many workloads
Before talking about costs, it's worth stating the positive case clearly: denormalization works. If a workload has a dominant query path, a stable schema, and tight latency requirements, flattening the data is often the most direct way to make reads fast and predictable.
A denormalized table eliminates join overhead from the serving path. The engine can filter, aggregate, and return results from one physical table without building hash tables, probing dictionaries, or managing intermediate join state. Under high concurrency, that simplicity matters. Hundreds or thousands of simultaneous queries against a well-designed wide table are easier to reason about than the same traffic pattern repeatedly executing joins.
Denormalization also improves ergonomics for consumers. BI tools, embedded analytics, and application queries often work better against a table where the relevant attributes are already present. Fewer joins means fewer opportunities for analysts to pick the wrong key, apply the wrong join type, or accidentally change metric semantics.
This is why the right framing is not "normalize instead of denormalize." It is: denormalize when the read path is stable, latency-sensitive, and valuable enough to justify the extra work on the write path. Use joins when flexibility, freshness, and semantic clarity matter more than shaving every millisecond from a known query pattern.
Tradeoffs of denormalization
Denormalization optimizes read performance for known access patterns. That optimization has real tradeoffs on the write side and operational side. These tradeoffs don't make denormalization wrong, but they should be weighed explicitly against the read-time benefits.
Pipeline complexity and write-path overhead
Denormalization pushes join logic into the ingestion path. That extra work can live outside the database or inside it. Outside the database, joining streams before ingestion means managing stateful stream processors like Flink, with their checkpoint state management, recovery delays, and late-arriving data handling. This operational surface area grows with the complexity of your denormalization logic.
Inside the database, materialized views that maintain precomputed results, including rollups or denormalized target tables, create write amplification. An incremental materialized view acts like an insert trigger on the source table. Each insert generates additional work for the target view, and high-frequency inserts can outpace the engine's background merge capacity, leading to throttled writes once partitions hit active-part thresholds. For denormalized joins, incremental materialized views only react to inserts on the source table and need additional handling when joined dimension tables change. ClickHouse Cloud can mitigate this with compute-compute separation: read-write services handle inserts and background merges while read-only services run user-facing queries against the same underlying data.
Dimension updates surface the tradeoff clearly. Updating a customer's country in a normalized model touches one row in the customer table. In ClickHouse, lightweight updates (Patch Parts), when appropriate for the update size and table design, write a compact patch containing only the changed columns and rows, with roughly 40 bytes of uncompressed overhead per updated row. The patch part is created immediately when the UPDATE returns; the physical merge into the underlying data happens asynchronously in background merges. Benchmarks show this running up to 1,000x faster than classic ClickHouse mutations and up to 4,000x faster than PostgreSQL on bulk cold updates.
The same update against a denormalized flat table involves more work. If the predicate column isn't part of the table's ordering key, the engine must scan parts to identify where affected rows are located, then write potentially many sparse patch parts, followed by additional merge work to consolidate them. This is manageable for infrequent updates, but becomes a consideration when dimension updates are frequent or contend with the same compute serving user-facing queries.
Data freshness lag
Pre-joining bounds your analytical freshness to your slowest updating dimension. If a transaction stream arrives in real-time but the customer enrichment batch job runs hourly, your flattened table is artificially delayed. Late-arriving events can land, but derived wide-table columns remain stale until the pipeline resolves the discrepancy and rewrites the affected records.
In a normalized model, the dimension table updates independently, and queries against the current state reflect the latest values at join time.
Storage and scan considerations
Columnar storage achieves strong compression by grouping values of the same type together, letting codecs like LZ4 and ZSTD exploit patterns in the data. On typical fact tables, ClickHouse delivers 10x to 20x compression using dictionary encoding, run-length encoding, and general-purpose codecs.
Denormalization's impact on storage depends on the cardinality of the dimensions being flattened. Dimensions are typically low-cardinality: a country column might have 200 distinct values, a subscription tier might have 5. Flattening these into a billion-row fact table duplicates those values, but ClickHouse's LowCardinality column type mitigates this by storing the unique values once in a dictionary and using small integer pointers for each row. The pointers still take space, and you need to remember to declare the type, but the storage overhead is manageable for genuinely low-cardinality dimensions.
Where storage can suffer is when dimension columns aren't part of the table's sort order. Columnar compression works best when adjacent values are similar. Dimension values that are randomly distributed relative to the sort key compress less effectively regardless of their cardinality.
Schema rigidity and backfill burden
Schema changes cascade differently in normalized and denormalized models.
A concrete case: security asks to hash or redact customer names under a new privacy policy. In a normalized model, that's one column transformation on a 100k-row customer table. Future writes only need to hash when new customer rows are created.
In a denormalized model, the same request requires backfilling the hash across billions of historical fact rows, and reconfiguring the denormalization pipeline to apply the hash on every future fact row (whether it's a new customer or not). In any schema, downstream consumers (dashboards, alerts, reverse-ETL jobs) need verification that the change didn't break filters or joins. But the backfill scope is larger in the denormalized case, which translates to more compute, longer execution windows, and more risk to ongoing ingestion.
Consistency and semantic drift
Duplicating data duplicates business meaning. Flattened tables force implicit decisions about slowly changing dimensions.
SCD Type 1 attributes (overwrite the current value) and Type 2 (preserve versioned history) need different handling. Denormalizing them forces a decision about whether historical fact rows reflect the "as-was" state (what was true when the event happened) or the "as-is" state (what is currently true).
If a user upgrades their subscription tier, separate the two reporting questions explicitly. For "as-is" reporting, keep the current tier in a dimension table and join to it at query time. For "as-was" reporting, either model the dimension as SCD Type 2 and join by the event timestamp and effective date range, or intentionally record the tier at the point of the transaction in the fact table. The important part is deciding which meaning each column represents before downstream teams build metrics on top of it.
In a denormalized model, maintaining both views requires either rewriting historical rows when the dimension changes or accepting that the flat table reflects only one perspective. Teams that skip the rewrite can end up with divergence between the flat table and the dimension table, where each reports different values for the same logical attribute.
Tradeoffs of normalization
Normalization has its own tradeoffs. These are often underweighted in discussions that focus on denormalization's downsides, so they're worth stating explicitly.
Query-time overhead and concurrency cost
Every query that joins tables at runtime does more work than scanning a single pre-joined table. Depending on the join algorithm, the engine may build hash tables, probe lookup structures, spill intermediate state, or merge sorted streams. Under high concurrency, this overhead compounds: each concurrent query executing joins consumes more CPU and memory than the equivalent scan against a wide table. For latency-critical serving workloads with hundreds or thousands of concurrent queries, this overhead can be the deciding factor.
Query complexity for consumers
Normalized models push join logic to query time, which means analysts and application developers need to understand the schema relationships and write (or generate) correct joins. A denormalized table with clear column names is easier to query correctly, especially for less technical consumers or BI tools that generate SQL automatically.
Optimizer dependency
Normalized models rely on the query optimizer to find efficient join plans. A bad plan, whether from stale statistics, a complex join graph, or an optimizer limitation, can cause large performance regressions. Denormalized models sidestep this risk for the access patterns they serve.
Aggregate query performance
For aggregation-heavy workloads, denormalized tables let the engine apply filters and group-bys in a single pass without join overhead. Normalized models may require joining before aggregating, which increases intermediate data volumes and processing time.
When to join vs. denormalize in an analytical database
The choice isn't binary, and it shouldn't be made as a blanket architectural decision. Different parts of your workload may warrant different approaches. A common layered pattern keeps raw events in a bronze tier, cleaned and conformed data in a silver tier, dimensional and semantic models for reusable definitions, and denormalized serving tables for specific hot dashboards. In that setup, denormalized tables serve known access patterns while dimensional and semantic models remain available for workloads that need flexibility.
dbt is a common orchestration tool for this layered model. The ClickHouse dbt adapter supports incremental materializations for append-only facts and full-refresh for dimensions, with all models version-controlled in git.
Evaluate the tradeoff for your workload
Before flattening a schema, run your workload through these questions:
- Is the path strictly latency-critical? Sub-second SLA requirements, like ad-tech routing or fraud detection, favor flattening because eliminating join overhead provides the most predictable latency.
- How volatile are the dimensions? Frequently updated dimensions increase the write-path cost of keeping a denormalized table current. Stable, append-only dimensions are cheap to flatten.
- How many access patterns does the data serve? A single dominant query pattern is the sweet spot for denormalization. Multiple diverse patterns mean the flat table is optimized for one path and suboptimal for the rest, while a normalized model can support more patterns without duplicating the same attributes into multiple serving tables.
- Is the table well-filtered by partition and ordering keys? Strong pruning makes runtime joins efficient by reducing the data volumes involved.
- Can schema changes be backfilled safely? If backfills are slow enough to interfere with ingestion, require careful operational windows, or risk consistency issues, the schema rigidity cost of denormalization is high.
- Is it a hierarchical relationship? Deeply nested JSON often warrants selective extraction or, in ClickHouse, using the native JSON type, which shreds JSON into dynamic sub-columns with column-level compression and no upfront schema.
Quick reference: when each approach fits
| Factor | Denormalization fits when... | Normalization fits when... |
|---|---|---|
| Query pattern | Single dominant access pattern with tight latency SLA | Multiple diverse query patterns |
| Dimension volatility | Dimensions are stable, rarely updated | Dimensions change frequently |
| Read performance | Lowest possible latency is non-negotiable | Interactive latency is acceptable |
| Write-path complexity | Ingestion pipeline complexity is manageable | Simpler ingestion pipelines are a priority |
| Schema evolution | Schema is stable, changes are rare | Schema evolves frequently, backfills must be cheap |
| Governance | Single team owns the data, meaning is unambiguous | Multiple teams consume the data, semantic consistency matters |
ClickHouse primitives that bridge the gap
ClickHouse provides several primitives that let you get closer to denormalized read performance while maintaining normalized source data. These aren't all forms of denormalization themselves; they're different mechanisms that reduce the need to choose.
Dictionary-based lookups (direct joins) for fast dimension enrichment
Dictionaries load dimensional data into an optimized key-value structure. The flat layout provides array-offset lookups, delivering access speeds up to 25x faster than hash joins and 15x faster than parallel hash joins in published benchmarks. You keep your dimensions in a separate table and get near-denormalized lookup speed at query time without physically duplicating dimension columns in your fact table. Dictionaries work best for one-to-one or many-to-one lookups where a key maps to a single authoritative value; they are not appropriate for one-to-many or many-to-many relationships that require preserving multiple matches.
CREATE DICTIONARY customer_tiers (
customer_id UInt64,
tier String
)
PRIMARY KEY customer_id
SOURCE(ClickHouse(TABLE 'customers'))
LAYOUT(FLAT())
LIFETIME(MIN 300 MAX 3600);Materialized views for pre-aggregation
Materialized views let the database maintain pre-computed aggregations as data arrives, without requiring external pipeline infrastructure. They process incoming data blocks automatically and store the results in a target table. This is aggregation, not denormalization: you're pre-computing rollups, not flattening relationships.
Materialized views aren't free. They create write amplification (each insert generates parts for both the source and target tables). But that cost is usually smaller than running a parallel Flink or Kafka Streams pipeline externally, both in compute and in operational surface area.
CREATE MATERIALIZED VIEW hourly_sales_mv
ENGINE = SummingMergeTree
ORDER BY (shop_id, hour)
AS SELECT
shop_id,
toStartOfHour(created_at) AS hour,
sum(amount) AS total_revenue
FROM raw_sales
GROUP BY shop_id, hour;Projections for alternate access patterns
Projections maintain alternate physical sort orders of your base table's data. They're not a form of denormalization; they're a way to optimize multiple query patterns against the same underlying data. The optimizer automatically routes queries to the most efficient projection.
Since ClickHouse 25.6, lightweight projections can store only their sorting key plus a _part_offset pointer back into the base table, rather than duplicating full rows. In the benchmark discussed in ClickHouse's projection post, this used roughly half the storage of traditional projections and reduced query time by 90%. That makes lightweight projections a practical middle ground when you need better query performance on non-primary access patterns without duplicating every projected column.
When you do denormalize: guardrails
For workloads where explicit denormalization is the right choice, apply these guardrails to keep costs contained.
Separate point-in-time facts from current-state dimensions
When flattening data, capture the dimension value at transaction time in the fact table for "as-was" reporting. For "as-is" reporting, keep the current state in a dimension table and join at query time. In ClickHouse, dictionaries can make this lookup fast when the current-state mapping is one-to-one or many-to-one:
SELECT
s.order_id,
s.historical_tier,
dictGet('customer_tiers', 'tier', s.customer_id) AS current_tier
FROM sales s
WHERE s.historical_tier != dictGet('customer_tiers', 'tier', s.customer_id);Backfill incrementally
Avoid one-shot population-style backfills when creating a materialized view on a live production table with active writes. Backfill by partition or time range to bound memory and merge pressure. This reduces contention with incoming real-time streams and helps the database engine manage part merges without throttling.
Conclusion
Denormalization and normalization are both valid engineering choices. Neither option is universally better. The choice must fit the specific requirements of each part of your workload.
Denormalization gives you the fastest possible reads for a known access pattern. Normalization preserves schema flexibility, simplifies writes, and keeps business meaning in one place.
The best analytical systems let you make the choice per workload. Use normalized or partially normalized models where operational flexibility and governance matter. Denormalize the specific serving paths where read latency is the binding constraint. Review the ClickHouse join documentation to see how the optimizer selects between algorithms in production.
The fastest test uses your own data and your own access patterns. Spin up a free ClickHouse Cloud trial, load a representative slice of your fact and dimension tables, and run the joins that matter to you. For a reproducible join benchmark you can run yourself, explore the coffeeshop benchmark. The only latency number that matters for your build-or-flatten decision is the one your queries produce on your data.
Frequently asked questions about denormalization in analytical databases
Is denormalization a bad practice in modern analytical databases?
No. Denormalization is a specialized optimization that excels for latency-critical, read-heavy serving layers with known access patterns. It's a valid choice when the read-time benefits outweigh the pipeline complexity, schema rigidity, and governance overhead it introduces.
Does columnar storage eliminate the need for denormalization?
Not entirely. Columnar compression, block pruning, and vectorized execution make normalized star schemas much faster than legacy row-stores, which raises the bar for when denormalization is actually required. But scanning a single pre-filtered wide table is still generally faster than joining multiple tables at runtime. Columnar storage shifts the breakeven point; it doesn't eliminate the tradeoff.
Are joins slow in modern columnar databases?
Not necessarily. Modern engines, such as ClickHouse, use join reordering, parallel/grace hash joins, merge joins, and runtime bloom filters to make normalized star-schema joins fast and predictable at scale. Joins still have overhead compared to scanning a single table, but that overhead has decreased enough to be acceptable for many analytical workloads.
When should I denormalize in an analytical database?
Denormalize when you have a single dominant query pattern with tight latency SLAs (ad-tech bidding, real-time personalization, fraud detection), the dimensions are stable, and the schema is unlikely to change frequently. The operational tradeoffs of denormalization are lowest in that scenario.
What are the biggest operational tradeoffs of denormalization?
Pipeline complexity (stateful stream processors, materialized view write or refresh overhead), data freshness lag (bounded by your slowest dimension update), backfill burden when schemas change, and semantic drift when duplicated business logic diverges from the dimension tables.
What's the best alternative to denormalizing for fast dimension lookups?
Dictionary-based lookups (direct joins) in ClickHouse. They load dimension data into an optimized key-value structure, delivering up to 25x the speed of hash joins in published benchmarks. You keep your dimensions normalized and get near-denormalized lookup performance at query time for one-to-one or many-to-one relationships.
Should I use materialized views instead of denormalizing upstream in ETL?
Materialized views can replace external pipeline work for pre-aggregation use cases, and refreshable materialized views can support some denormalized serving-table patterns. They reduce operational surface area by keeping transformation logic inside the database. They add write or refresh overhead, but that may still be simpler than running a separate streaming pipeline.
How do I handle slowly changing dimensions (SCD) if I denormalize?
Store point-in-time attribute values in the fact table only when you intentionally want that denormalized "as-was" view. Another valid option is an SCD Type 2 dimension joined by event time and effective range. For "as-is" values, keep the current state in a dimension table and join at query time. In ClickHouse, dictionaries can make this fast for one-to-one or many-to-one lookups.
How can I backfill safely after adding a new column to a wide table?
Backfill incrementally by partition or time range to bound memory and merge pressure. Avoid one-shot population-style backfills on live write-heavy tables to reduce consistency and throttling risks.