DELETE statement removes rows from the table
[db.]table that match the expression
expr. It is only available for the *MergeTree table engine family.
DELETE FROM [db.]table [ON CLUSTER cluster] WHERE expr;
It is called "lightweight
DELETE" to contrast it to the ALTER table DELETE command, which is a heavyweight process.
-- Deletes all rows from the `hits` table where the `Title` column contains the text `hello`
DELETE FROM hits WHERE Title LIKE '%hello%';
DELETE does not delete data from storage immediately
DELETE, deleted rows are internally marked as deleted immediately and will be automatically filtered out of all subsequent queries. However, cleanup of data happens during the next merge. As a result, it is possible that for an unspecified period, data is not actually deleted from storage and is only marked as deleted.
If you need to guarantee that your data is deleted from storage in a predictable time, consider using the ALTER table DELETE command. Note that deleting data using
ALTER table DELETE may consume significant resources as it recreates all affected parts.
Deleting large amounts of data
Large deletes can negatively affect ClickHouse performance. If you are attempting to delete all rows from a table, consider using the
TRUNCATE TABLE command.
Limitations of lightweight
DELETEs do not work with projections
DELETE does not work for tables with projections. This is because rows in a projection may be affected by a
DELETE operation and may require the projection to be rebuilt, negatively affecting
Performance considerations when using lightweight
Deleting large volumes of data with the lightweight
DELETE statement can negatively affect SELECT query performance.
The following can also negatively impact lightweight
- A heavy
WHEREcondition in a
- If the mutations queue is filled with many other mutations, this can possibly lead to performance issues as all mutations on a table are executed sequentially.
- The affected table having a very large number of data parts.
- Having a lot of data in compact parts. In a Compact part, all columns are stored in one file.
DELETE requires the
ALTER DELETE privilege. To enable
DELETE statements on a specific table for a given user, run the following command:
GRANT ALTER DELETE ON db.table to username;
How lightweight DELETEs work internally in ClickHouse
- A "mask" is applied to affected rows
DELETE FROM table ... query is executed, ClickHouse saves a mask where each row is marked as either “existing” or as “deleted”. Those “deleted” rows are omitted for subsequent queries. However, rows are actually only removed later by subsequent merges. Writing this mask is much more lightweight than what is done by an
ALTER table DELETE query.
The mask is implemented as a hidden
_row_exists system column that stores
True for all visible rows and
False for deleted ones. This column is only present in a part if some rows in the part were deleted. This column does not exist when a part has all values equal to
SELECTqueries are transformed to include the mask
When a masked column is used in a query, the
SELECT ... FROM table WHERE condition query internally is extended by the predicate on
_row_exists and is transformed to:
SELECT ... FROM table PREWHERE _row_exists WHERE condition
At execution time, the column
_row_exists is read to determine which rows should not be returned. If there are many deleted rows, ClickHouse can determine which granules can be fully skipped when reading the rest of the columns.
DELETEqueries are transformed to
ALTER table UPDATEqueries
DELETE FROM table WHERE condition is translated into an
ALTER table UPDATE _row_exists = 0 WHERE condition mutation.
Internally, this mutation is executed in two steps:
SELECT count() FROM table WHERE conditioncommand is executed for each individual part to determine if the part is affected.
Based on the commands above, affected parts are then mutated, and hardlinks are created for unaffected parts. In the case of wide parts, the
_row_existscolumn for each row is updated and all other columns' files are hardlinked. For compact parts, all columns are re-written because they are all stored together in one file.
From the steps above, we can see that lightweight deletes using the masking technique improves performance over traditional
ALTER table DELETE commands because
ALTER table DELETE reads and re-writes all the columns' files for affected parts.