Open House 2025: ClickHouse and JOINs

TL;DR

  • ClickHouse now executes complex join operations 45x faster on average across workloads, giving data teams the flexibility to choose either highly-normalized star schemas or wide, flat tables without a performance penalty.
  • Three deep architectural breakthroughs drive this performance: multi-threaded parallel hash joins, an intelligent global join reordering optimizer, and advanced filter pushdown capabilities.
  • Querying storage-efficient, highly-normalized star schemas directly speeds up real-time ingestion by eliminating the need to pre-join or transform massive datasets in the write path.
  • A highly versatile toolkit of join algorithms (including parallel_hash, full_sorting_merge, direct, and grace_hash) optimizes efficiency for any workload scale or memory profile.
  • TPC-H benchmark tests demonstrate that complex queries that previously timed out or crashed because of memory limits now execute in just seconds.

Historically, real-time databases required strict denormalization to avoid join bottlenecks. For years, the conventional advice for ClickHouse users was rigid: denormalize everything. A long-standing myth held that ClickHouse struggled with complex joins.

Data engineers faced a three-way trade-off: denormalize data into wide tables fed by brittle ETL pipelines, rely on Materialized Views to pre-compute joins at ingestion, or maintain an efficient star schema that historically suffered from slow queries.

Wide, flat tables remain fast and a proven ClickHouse use case. Teams now have full flexibility to choose their preferred approach. By querying normalized schemas directly, teams can ingest data faster in real time because they no longer need to pre-join or transform massive datasets in the write path.

Star schemas separate data into a fact table and multiple dimension tables, simplifying data management. However, this normalized structure depends entirely on the database's ability to perform fast joins, and the perception that ClickHouse cannot deliver them is now outdated.

By parallelizing hash joins and reordering queries, ClickHouse removes the strict requirement for flat tables, allowing teams to utilize optimal data models without latency penalties.

Improving how ClickHouse executes joins eliminates the historical dilemma, enabling both sophisticated data models and low-latency query speed. These engine upgrades deliver an overall average 45x performance increase on industry-standard TPC-H tests, establishing ClickHouse as a highly performant engine for complex, join-heavy workloads.

What drives the 45x speedup? Inside the 3 engine breakthroughs

Recent performance gains in ClickHouse stem from parallelizing hash joins, automating query reordering, and expanding join algorithms. Together, they allow ClickHouse to execute complex, multi-table joins at speeds that historically drove the preference for denormalization.

How do parallel hash joins eliminate build-phase bottlenecks?

The classic hash join algorithm operates in two phases: a "build" phase that creates an in-memory hash table from the right-hand table, and a "probe" phase that reads the left-hand table to find matches. Historically, the build phase bottlenecked queries because it relied on single-threaded execution. Even with multiple cores available, queries hit a performance ceiling waiting for one thread to construct the hash table.

The new parallel_hash join, now the default in recent versions, overcomes this ceiling. Instead of creating one large hash table, ClickHouse partitions the right-hand table into buckets and builds multiple smaller hash tables concurrently across all available CPU cores. Processing buckets in parallel dramatically accelerates the time-consuming build step, removing a critical performance barrier for join-heavy queries on modern multi-core servers.
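A quick way to see this in practice is to pin the algorithm for a session and inspect the physical plan. The sketch below uses hypothetical table and column names; since parallel_hash is already the default on recent versions, the SET is only needed on older releases:

```sql
-- Pin the parallel variant for this session (default on recent versions)
SET join_algorithm = 'parallel_hash';

-- EXPLAIN PIPELINE shows the physical operators; on a multi-core server the
-- build side of the join is split across several concurrent processors
EXPLAIN PIPELINE
SELECT count()
FROM fact_sales AS s
JOIN dim_products AS p ON s.product_id = p.id;
```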

How does the smart optimizer reorder queries for peak efficiency?

The size of the in-memory hash table dictates hash join efficiency. To reduce memory consumption and latency, the smaller table should ideally sit on the right, or "build", side. Historically, developers had to order their joins manually to achieve this.

ClickHouse now automates table ordering. For two-table joins, the query planner uses the principle of commutativity (A JOIN B is equivalent to B JOIN A). Since version 24.12, it automatically swaps the tables to ensure the smaller one is placed on the right, reducing memory pressure and accelerating the hash join phase.

For queries involving three or more tables, the optimizer applies the principle of associativity ((A JOIN B) JOIN C is equivalent to A JOIN (B JOIN C)). This "Global Join Reordering" allows ClickHouse to dynamically rearrange the entire sequence of joins to create the most efficient execution path.
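As a sketch, consider a hypothetical star-schema query (table names invented for illustration). With automatic reordering, the written join order no longer needs to match the optimal build order, and EXPLAIN confirms what the planner actually chose:

```sql
-- The optimizer may swap or reorder these joins so the smallest table ends up
-- on the build side, regardless of how the query is written
EXPLAIN PLAN
SELECT d.year, count()
FROM fact_sales AS f
JOIN dim_customer AS c ON f.customer_id = c.id
JOIN dim_date AS d ON f.date_id = d.id
GROUP BY d.year;
```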

In TPC-H benchmark tests, enabling this feature transformed a specific, highly complex six-table join that took over an hour (3,903 seconds) into one that completed in just 2.7 seconds. This represents a more than 1000x speedup for that specific query, complementing the 45x overall average workload improvement.

Why is filter pushdown critical for join optimization?

Filter pushdown applies WHERE clause conditions to tables before they are joined, preventing the processing of millions of irrelevant rows. It is particularly effective for complex OR conditions and is further enhanced by runtime bloom filters. Less data means faster joins.

The optimizer also saves resources by rewriting queries. If a predicate in the WHERE clause negates the need for an OUTER JOIN, ClickHouse automatically converts it to a more performant INNER JOIN, reducing query latency and cutting memory requirements from gigabytes to megabytes.
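A minimal sketch of the OUTER-to-INNER rewrite (table and column names are hypothetical): the WHERE predicate below rejects the NULLs that a LEFT JOIN would produce for unmatched rows, so the engine can safely execute it as an INNER JOIN:

```sql
SELECT c.name, o.amount
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
WHERE o.amount > 100;  -- NULL amounts never satisfy this, so INNER JOIN is equivalent
```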

Where is the proof? Analyzing the TPC-H benchmark results

The TPC-H benchmark evaluates the performance of complex analytical queries. Its 22 queries simulate real-world decision-support workloads, making it the industry-standard test for join-heavy analytics.

Tests conducted on a self-managed 32-core AWS instance with 128 GiB of RAM ran the benchmark at a scale factor of 100, which includes a 600-million-row fact table. Before the optimizations, this workload exposed significant bottlenecks.

Complex queries with multiple joins were exceptionally slow, and out-of-memory (OOM) errors threatened systems with limited RAM. The documented TPC-H results are achievable out-of-the-box from version 24.12 onward, so users can replicate these performance leaps on their own hardware.

These query planner optimizations and parallel join capabilities are available out-of-the-box in ClickHouse Cloud, abstracting away the need for users to manually provision infrastructure or tune engine settings.

Recent improvements transformed query speeds. Parallelized hash joins and smarter join reordering bypass the previous bottlenecks. The final phase of ClickHouse's join-reordering logic, which automates multi-table join sequencing, yields dramatic efficiency gains across the entire benchmark workload.

These architectural changes translate directly to tangible performance leaps, removing the risk of memory failures and query timeouts.

| Benchmark metric | Pre-optimization ClickHouse | Modern ClickHouse | Net improvement |
| --- | --- | --- | --- |
| Average query time | Baseline execution | Vastly accelerated | 20.7x faster |
| Global join reordering | Multi-table joins took 1+ hours | Intelligent automated sequencing | 45x faster (overall workload) |
| Query 19 performance | Timed out (75+ minutes) | 1.5 seconds | Transformational |
| Memory management | 99+ GiB (high OOM risk) | Highly optimized / stable | Eliminated bottlenecks |

These results reframe the conversation around ClickHouse vs. Snowflake joins, as well as comparisons with Databricks or BigQuery. Historically, teams routed complex, multi-table joins to traditional cloud data warehouses, reserving ClickHouse primarily for flat-table workloads.

| Company | Core value proposition | Native engine architecture | Star schema performance | Flat-table performance |
| --- | --- | --- | --- | --- |
| ClickHouse | Fastest open-source real-time analytics database | Unified low-latency engine | Exceptionally high | Exceptionally high |
| Snowflake | Traditional cloud data warehouse | Proprietary cloud | Moderate to high | Moderate to high |
| Databricks | Traditional cloud data lakehouse | Proprietary cloud | Moderate to high | Moderate to high |
| BigQuery | Traditional cloud data warehouse | Proprietary cloud | Moderate to high | Moderate to high |

By achieving a 45x overall speedup on industry-standard benchmarks, ClickHouse demonstrates that it can compete directly in complex analytical query performance, providing a unified, low-latency engine for both denormalized data and traditional star schemas.

Beyond hash joins: which algorithm fits your specific workload?

While the parallel hash algorithm represents a major leap forward, it is only one component of a larger, flexible toolkit designed for diverse analytical scenarios. Engineers can choose from a variety of join algorithms, each optimized for different memory and performance requirements.

| Join algorithm | Best use case | Memory profile | Performance advantage |
| --- | --- | --- | --- |
| parallel_hash | Standard multi-core workloads | Moderate | Breaks single-thread bottlenecks by partitioning the build phase |
| full_sorting_merge | Pre-sorted data on disk | Exceptionally low | Matches or exceeds hash join speeds by skipping the sorting phase |
| direct | High-performance key-value lookups | In-memory limits | Bypasses hash table creation using ClickHouse Dictionaries |
| grace_hash | Massive, out-of-core datasets | Spills to disk | Safely executes queries that exceed available memory without OOM errors |

This versatility ensures developers have the right tools for any job, supporting standard SQL join types including INNER, LEFT/RIGHT/FULL OUTER, and ASOF joins.
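Switching algorithms is a one-line setting change. A sketch for a join whose build side exceeds available memory (table names are hypothetical, and the bucket count is an assumption to tune per workload):

```sql
-- grace_hash partitions both sides into buckets and spills to disk,
-- trading raw speed for bounded memory use
SET join_algorithm = 'grace_hash';
SET grace_hash_join_initial_buckets = 32;

SELECT count()
FROM huge_fact AS f
JOIN big_dim AS d ON f.key = d.key;
```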

How can you tune ClickHouse joins for your own queries?

Use the EXPLAIN clause to inspect the query plan and confirm which join algorithm the engine selects, ensuring the optimizer behaves as expected.

EXPLAIN PLAN 
SELECT region, sum(revenue) 
FROM sales 
JOIN customers ON sales.customer_id = customers.id 
WHERE sales.date > '2024-01-01'
GROUP BY region;

To ensure you are using the latest multi-threaded capabilities, you can explicitly configure the engine to use parallelized hash joins for a specific session or query:

-- Explicitly enable parallel hash joins

SET join_algorithm = 'parallel_hash';

Effective join performance relies on filtering data before it is joined to reduce processing volume. Because ClickHouse automatically pushes filter conditions down the query plan, verify this behavior with EXPLAIN. If filters are not applied early, you can force the optimization by filtering within a subquery directly in the JOIN clause:

-- Explicit filter pushdown using a subquery

SELECT c.name, s.total
FROM customers c
JOIN (
    SELECT customer_id, sum(amount) as total
    FROM sales
    WHERE transaction_date >= '2024-01-01' -- Filter applied before the join
    GROUP BY customer_id
) s ON c.id = s.customer_id;

By parallelizing hash joins and automating query optimization, ClickHouse elevates star schemas to first-class citizens alongside highly-performant flat tables, delivering both elegant architecture and low-latency query performance.

FAQs

Do you still need to denormalize data in ClickHouse to avoid joins?

No. ClickHouse can now run complex joins fast enough that teams can choose either star schemas or wide tables.

How much faster are joins in ClickHouse now?

The article reports join performance roughly 45x faster on average across the tested workloads. Some individual benchmark queries improved by more than 1000x.

What changed to make ClickHouse joins faster?

Three changes drive the improvement: parallel hash joins, global join reordering, and filter pushdown. Together they reduce CPU bottlenecks, memory use, and wasted work.

What is parallel_hash join in ClickHouse?

It splits the right-hand table into buckets and builds multiple hash tables at the same time across CPU cores. This removes the old single-threaded build bottleneck.

What is global join reordering?

It is an optimizer feature that automatically changes join order to build more efficient execution plans. It places smaller tables on the build side and improves multi-table join sequencing.

Why is filter pushdown important for join performance?

It applies filters before the join runs. That reduces the number of rows processed and lowers memory use and query latency.

Can ClickHouse query star schemas in real time?

Yes. The article says improved join performance makes normalized star schemas practical for real-time analytics without forcing pre-joins at ingest.

Which join algorithm should you use in ClickHouse?

Use parallel_hash for standard multi-core workloads, full_sorting_merge for pre-sorted data, direct for dictionary-style lookups, and grace_hash for datasets that spill beyond memory.