The Lightweight UPDATE Statement
The lightweight UPDATE
statement updates rows in a table [db.]table
that match the expression filter_expr
.
It is called "lightweight update" to contrast it to the ALTER TABLE ... UPDATE
query, which is a heavyweight process that rewrites entire columns in data parts.
It is only available for the MergeTree
table engine family.
The filter_expr
must be of type UInt8
. This query updates values of the specified columns to the values of the corresponding expressions in rows for which the filter_expr
takes a non-zero value.
Values are cast to the column type using the CAST
operator. Updating columns used in the calculation of the primary or partition keys is not supported.
Examples
Lightweight updates do not update data immediately
Lightweight UPDATE
is implemented using patch parts - a special kind of data part that contains only the updated columns and rows.
A lightweight UPDATE
creates patch parts but does not immediately modify the original data physically in storage.
The process of updating is similar to a INSERT ... SELECT ...
query but the UPDATE
query waits until the patch part creation is completed before returning.
The updated values are:
- Immediately visible in
SELECT
queries through patches application - Physically materialized only during subsequent merges and mutations
- Automatically cleaned up once all active parts have the patches materialized
Lightweight updates requirements
Lightweight updates are supported for MergeTree
, ReplacingMergeTree
, CollapsingMergeTree
engines and their Replicated
and Shared
versions.
To use lightweight updates, materialization of _block_number
and _block_offset
columns must be enabled using table settings enable_block_number_column
and enable_block_offset_column
.
Lightweight deletes
A lightweight DELETE
query can be run as a lightweight UPDATE
instead of a ALTER UPDATE
mutation. The implementation of lightweight DELETE
is controlled by setting lightweight_delete_mode
.
Performance considerations
Advantages of lightweight updates:
- The latency of the update is comparable to the latency of the
INSERT ... SELECT ...
query - Only updated columns and values are written, not entire columns in data parts
- No need to wait for currently running merges/mutations to complete, therefore the latency of an update is predictable
- Parallel execution of lightweight updates is possible
Potential performance impacts:
- Adds an overhead to
SELECT
queries that need to apply patches - Skipping indexes and projections are not used for data parts that have patches to be applied
- Small updates which are too frequent may lead to a "too many parts" error. It is recommended to batch several updates into a single query, for example by putting ids for updates in a single
IN
clause in theWHERE
clause - Lightweight updates are designed to update small amounts of rows (up to about 10% of the table). If you need to update a larger amount, it is recommended to use the
ALTER TABLE ... UPDATE
mutation
Concurrent operations
Lightweight updates don't wait for currently running merges/mutations to complete unlike heavy mutations.
The consistency of concurrent lightweight updates is controlled by settings update_sequential_consistency
and update_parallel_mode
.
Update permissions
UPDATE
requires the ALTER UPDATE
privilege. To enable UPDATE
statements on a specific table for a given user, run:
Details of the implementation
Patch parts are the same as the regular parts, but contain only updated columns and several system columns:
_part
- the name of the original part_part_offset
- the row number in the original part_block_number
- the block number of the row in the original part_block_offset
- the block offset of the row in the original part_data_version
- the data version of the updated data (block number allocated for theUPDATE
query)
On average it gives about 40 bytes (uncompressed data) of overhead per updated row in the patch parts.
System columns help to find rows in the original part which should be updated.
System columns are related to the virtual columns in the original part, which are added for reading if patch parts should be applied.
Patch parts are sorted by _part
and _part_offset
.
Patch parts belong to different partitions than the original part.
The partition id of the patch part is patch-<hash of column names in patch part>-<original_partition_id>
.
Therefore patch parts with different columns are stored in different partitions.
For example three updates SET x = 1 WHERE <cond>
, SET y = 1 WHERE <cond>
and SET x = 1, y = 1 WHERE <cond>
will create three patch parts in three different partitions.
Patch parts can be merged among themselves to reduce the amount of applied patches on SELECT
queries and reduce the overhead. Merging of patch parts uses the replacing merge algorithm with _data_version
as a version column.
Therefore patch parts always store the latest version for each updated row in the part.
Lightweight updates don't wait for currently running merges and mutations to finish and always use a current snapshot of data parts to execute an update and produce a patch part. Because of that there can be two cases of applying patch parts.
For example if we read part A
, we need to apply patch part X
:
- if
X
contains partA
itself. It happens ifA
was not participating in merge whenUPDATE
was executed. - if
X
contains partB
andC
, which are covered by partA
. It happens if there was a merge (B
,C
) ->A
running whenUPDATE
was executed.
For these two cases there are two ways to apply patch parts respectively:
- Using merge by sorted columns
_part
,_part_offset
. - Using join by
_block_number
,_block_offset
columns.
The join mode is slower and requires more memory than the merge mode, but it is used less often.
Related Content
ALTER UPDATE
- HeavyUPDATE
operations- Lightweight
DELETE
- LightweightDELETE
operations