Lightweight Updates: ClickHouse Core Database Innovations for Real-Time Analytics @ Open House 2025
TL;DR
- As the fastest and most resource-efficient real-time data warehouse and open-source database, ClickHouse has transformed OLAP updates and deletes, making them up to 2,400x faster.
- A new "patch part" architecture bypasses massive file rewrites by storing only unique row addresses and updated values in compact files.
- Deletions utilize a lightweight `_row_exists` logical mask, immediately filtering data from queries without rewriting the entire data part.
- Developers can now manage data changes using standard, ANSI-like `UPDATE` and `DELETE` SQL syntax.
- Benchmarks demonstrate bulk updates are up to 1,700x faster with only a ~12% median impact on read queries before background merges complete.
Why do analytical (OLAP) databases struggle with updates?
Engineers migrating to analytical databases expect data updates to be low-latency operations. Analytical databases, however, prioritize large-scale data scans over high-frequency changes. This columnar design historically made frequent modifications costly and impractical.
ClickHouse has re-engineered its mutation mechanics to address this fundamental OLAP constraint. By replacing traditional heavyweight mutations with a new patch part architecture, ClickHouse handles high-frequency update patterns typically associated with OLTP without sacrificing high-performance OLAP query speeds. This architecture enables a wide range of use cases within a high-performance analytical database:
- Real-time data enrichment and streamlined GDPR compliance (Right to be forgotten)
- Late-arriving data corrections: Fixing upstream data pipeline errors without wiping whole partitions
- Slowly changing dimensions (SCDs): Updating dimensional tables (like user status or inventory levels) on the fly
- E-commerce and FinServ adjustments: Correcting transaction statuses or updating order quantities in near real-time
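As a sketch of what the first two use cases now look like, assuming hypothetical `events` and `users` tables with the column names shown:

```sql
-- GDPR "right to be forgotten": a lightweight delete by user id
-- (table and column names are illustrative)
DELETE FROM events WHERE user_id = 'u-12345';

-- Slowly changing dimension: flip a user's status in place
-- (lightweight UPDATE may require the experimental setting shown
--  in the syntax example later in this article)
UPDATE users SET status = 'inactive' WHERE user_id = 'u-12345';
```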
The OLTP vs. OLAP architecture tradeoff
Row-based OLTP databases prioritize write flexibility by storing records contiguously, enabling simple in-place updates. This model bottlenecks analytics, forcing systems to load entire rows into memory just to read specific columns.
ClickHouse reverses this tradeoff to achieve high-throughput ingest speeds and analytical query performance. It stores data in immutable data parts, where columns are physically sorted and isolated into separate, compressed files.
Immutability is a deliberate architectural advantage for read performance, but it creates inherent update friction. To manage this immutability, ClickHouse continuously writes new parts for incoming data. In the background, its compaction engine merges these smaller parts into larger ones using efficient linear scans. This is the MergeTree architecture's core operation.
The old way: the high cost of heavyweight mutations
Before ClickHouse introduced lightweight updates, modifying data required a heavyweight, asynchronous process known as a mutation. Executed with an ALTER TABLE ... UPDATE or DELETE command, these operations carried significant performance costs rooted in the immutable, columnar architecture. Because data parts cannot be changed in place, a mutation had to rewrite entire files to alter even a single row.
Heavyweight mutations created substantial I/O overhead. To execute an UPDATE, ClickHouse rewrote the modified columns into a new data part version. Deletes were even more resource-intensive: a mutation forced every data part containing target rows to be rewritten, with those rows excluded when forming the new parts. This batch-oriented operation was never designed for frequent, low-latency updates.
This architecture meant mutations were both CPU- and I/O-intensive, and changes appeared only after the background mutation completed.
Mutations entered a queue and could not run until all prior merge and mutation tasks finished, leading to unpredictable delays. This high resource consumption meant database administrators often monitored the process to prevent cluster strain.
-- Legacy syntax for a heavyweight mutation
ALTER TABLE wikistat DELETE WHERE project = 'SlowQueries';
The new architecture
The patch part architecture eliminates the SELECT query latency caused by previous read-time update workarounds.
| ClickHouse architecture | Core mechanism | SQL syntax | Write latency | Data visibility | Resource overhead |
|---|---|---|---|---|---|
| Legacy heavyweight mutations | Rewrites entire data files | ALTER TABLE ... UPDATE | High | Delayed | High |
| New lightweight updates | Compact patch parts / Logical masks | Standard UPDATE / DELETE | Low | Immediate | Low |
Updating with "patch parts"
The patch part architecture creates a new, compact patch part containing only the unique address of changed rows and the new column values. The original data remains untouched.
This approach avoids rewriting whole files, substantially reducing storage footprint and disk wear compared to legacy heavyweight mutations.
Targeting specific rows relies on system columns that uniquely identify every row across the entire table:
- `_block_number`: the block number from the time of the original insert
- `_block_offset`: the row's offset within that block
- `_part_offset`: the row's ordinal position within its data part
These identifiers act as a permanent address, allowing ClickHouse to pinpoint the exact location of a row for an update.
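As an illustration, these row addresses can be inspected directly in a query. A minimal sketch against the `wikistat` table used in this article's examples; whether each virtual column is exposed to `SELECT` may depend on the server version and table settings:

```sql
-- Inspect the per-row addresses that patch parts use as targets
SELECT _part, _part_offset, _block_number, _block_offset, path
FROM wikistat
WHERE path = 'Google'
LIMIT 5;
```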
Streamlining deletes with logical masks (_row_exists)
ClickHouse extends this efficiency to DELETE operations using the _row_exists system column. A lightweight delete updates the hidden _row_exists column, functioning as a logical mask. ClickHouse physically removes the data later in the background.
When a row is deleted, its _row_exists value toggles from 1 to 0. During subsequent SELECT queries, ClickHouse automatically filters out rows where _row_exists = 0, making the deletion effective on the next read.
Patch parts and logical masks integrate into the database's existing background merge process. As ClickHouse periodically merges smaller data parts into larger ones to maintain performance, it applies these patches and exclusions with minimal overhead.
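Merges normally run automatically, but when experimenting it can be useful to force one so that pending patches and deletions are materialized immediately. A sketch using the standard `OPTIMIZE` statement (this rewrites parts and is resource-intensive on large tables, so it is not a routine operation):

```sql
-- Force a merge of all parts, applying pending patch parts
-- and physically dropping rows where _row_exists = 0
OPTIMIZE TABLE wikistat FINAL;
```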
By default, a DELETE statement returns only after the rows have been marked as deleted. The lightweight_deletes_sync setting controls this behavior: when set to run asynchronously, the DELETE statement returns immediately, but data may still be visible to queries until the background operation finishes.
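The two behaviors can be sketched as follows, assuming the `wikistat` table from the other examples:

```sql
-- Default: return once the rows are marked as deleted
DELETE FROM wikistat WHERE project = 'SlowQueries';

-- Asynchronous: return immediately; rows may remain visible
-- until the background operation marks them
DELETE FROM wikistat WHERE project = 'SlowQueries'
SETTINGS lightweight_deletes_sync = 0;
```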
Adopting standard ANSI SQL syntax
This new architecture supports the ANSI SQL standard for UPDATE and DELETE syntax, significantly reducing developer friction. Engineers no longer need to learn proprietary syntax, which accelerates migrations from PostgreSQL or MySQL.
-- Standard SQL syntax for a lightweight update
UPDATE wikistat SET hits = hits + 1 WHERE path = 'Google'
SETTINGS allow_experimental_lightweight_update = 1, apply_patch_parts = 1, update_parallel_mode = 'auto'
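Rather than attaching settings to every statement, they can also be enabled for the session. A sketch of the same update in that style (setting names follow the example above and may change as the feature graduates from experimental):

```sql
-- Enable lightweight updates once for the session
SET allow_experimental_lightweight_update = 1;

UPDATE wikistat SET hits = hits + 1 WHERE path = 'Google';
```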
Benchmark results: verifying the 1,700x speedup
Based on tests run against the lineitem table, the patch-part architecture delivers substantial speedups, making mutating data as fast as a standard INSERT. Bulk UPDATE operations are up to 1,700x faster than classic mutations, while single-row updates are up to 2,400x faster. (Note: Based on internal ClickHouse benchmarking against the lineitem dataset. Actual performance improvements will vary based on hardware, schema, and specific workloads.)
This write performance introduces a temporary tradeoff on read queries. In a worst-case scenario (querying data before patches are merged), SELECT queries are only 7-21% slower on average, with a ~12% median impact.
Once the background merge completes, read performance returns to baseline (0% impact). This validates the patch-part architecture over naive read-time workarounds (like ReplacingMergeTree and FINAL), which permanently slow queries by an average of 280%.
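For comparison, the read-time workaround deduplicates on every query, which is what imposes the permanent overhead. A sketch assuming a hypothetical `wikistat_replacing` table built on the ReplacingMergeTree engine:

```sql
-- Every read pays the FINAL deduplication cost, forever
SELECT path, hits
FROM wikistat_replacing FINAL
WHERE path = 'Google';
```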
A temporary ~12% read penalty for a 1,700x write-speed improvement represents a favorable tradeoff for most workloads. Lightweight updates enable high-frequency update patterns previously reserved for row-stores, supporting real-time data enrichment and efficient operations without sacrificing analytical performance.
| Update methodology | Bulk UPDATE speed | Single-row UPDATE speed | Median read query impact |
|---|---|---|---|
| Legacy mutations | Baseline | Baseline | Baseline (0%) |
| Read-time workaround (ReplacingMergeTree + FINAL) | N/A | N/A | 280% slower |
| New patch part architecture | Up to 1,700x faster | Up to 2,400x faster | ~12% slower (temporary) |
Stop fearing updates: test the performance on your own workload
Updates in analytical databases no longer require complex workarounds. By transforming high-frequency updates into first-class, lightweight operations, ClickHouse combines OLTP write flexibility with OLAP read speeds.
ClickHouse Cloud fully manages background operations like merges and patch applications for its shared-storage architecture. You get the speed of lightweight updates with minimal operational overhead.
Benchmark your own workload using the step-by-step lightweight updates tutorial on GitHub, or test the performance difference firsthand with a free ClickHouse Cloud trial and pre-loaded datasets.
FAQs
Why do OLAP databases struggle with updates?
OLAP databases are built for large scans, not frequent row changes. In columnar systems, even small updates can trigger expensive file rewrites.
What are ClickHouse lightweight updates?
Lightweight updates are a new way to update data without rewriting full data parts. ClickHouse stores only the changed row addresses and new values in compact patch parts.
How do lightweight updates work in ClickHouse?
ClickHouse creates a patch part with unique row identifiers and updated values. The original data stays untouched until background merges apply the changes.
How do lightweight deletes work in ClickHouse?
Deletes use a hidden _row_exists mask. Rows marked as 0 are filtered out on the next read, then removed later during background merges.
What SQL syntax does ClickHouse use for lightweight updates and deletes?
ClickHouse supports standard UPDATE and DELETE syntax for lightweight changes. This makes migrations easier for teams coming from PostgreSQL or MySQL.
How much faster are ClickHouse lightweight updates?
Bulk updates are up to 1,700x faster than classic mutations. Single-row updates are up to 2,400x faster in the benchmark cited in the article.
Do lightweight updates slow down read queries?
Yes, temporarily. Before background merges finish, read queries were about 12% slower at the median in the benchmark.
What use cases do lightweight updates support?
They support real-time enrichment, GDPR deletes, late-arriving corrections, slowly changing dimensions, and near real-time transaction updates.
