Delete Overview
There are several ways to delete data in ClickHouse, each with its own advantages and performance characteristics. You should select the appropriate method based on your data model and the amount of data you intend to delete.
Method | Syntax | When to use |
---|---|---|
Lightweight delete | DELETE FROM [table] | Use when deleting small amounts of data. Rows are immediately filtered out of all subsequent SELECT queries but are initially only internally marked as deleted, not removed from disk. |
Delete mutation | ALTER TABLE [table] DELETE | Use when data must be deleted from disk immediately (e.g. for compliance). Negatively affects SELECT performance. |
Truncate table | TRUNCATE TABLE [db.table] | Efficiently removes all data from a table. |
Drop partition | DROP PARTITION | Efficiently removes all data from a partition. |
Here is a summary of the different ways to delete data in ClickHouse:
Lightweight deletes
Lightweight deletes cause rows to be immediately marked as deleted such that they can be automatically filtered out of all subsequent SELECT
queries. Subsequent removal of these deleted rows occurs during natural merge cycles and thus incurs less I/O. As a result, it is possible that for an unspecified period, data is not actually deleted from storage and is only marked as deleted. If you need to guarantee that data is deleted, consider the above mutationA mutation in ClickHouse refers to an operation that modifies or deletes existing data in a table, typically using commands like ALTER TABLE ... UPDATE or ALTER TABLE ... DELETE. Mutations are implemented as asynchronous background processes that rewrite entire data parts affected by the change, rather than modifying rows in place. command.
Deleting large volumes of data with the lightweight DELETE
statement can also negatively affect SELECT
query performance. The command is also not compatible with tables with projections.
Note that a mutationA mutation in ClickHouse refers to an operation that modifies or deletes existing data in a table, typically using commands like ALTER TABLE ... UPDATE or ALTER TABLE ... DELETE. Mutations are implemented as asynchronous background processes that rewrite entire data parts affected by the change, rather than modifying rows in place. is used in the operation to mark the deleted rows (adding a _row_exists
column), thus incurring some I/O.
In general, lightweight deletes should be preferred over mutations if the existence of the deleted data on disk can be tolerated (e.g. in non-compliance cases). This approach should still be avoided if all data needs to be deleted.
Read more about lightweight deletes.
Delete mutations
Delete mutations can be issued through a ALTER TABLE ... DELETE
command e.g.
These can be executed either synchronously (by default if non-replicated) or asynchronously (determined by the mutations_sync setting). These are extremely IO-heavy, rewriting all the partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. that match the WHERE
expression. There is no atomicityAtomicity ensures that a transaction (a series of database operations) is treated as a single, indivisible unit. This means that either all operations within the transaction occur, or none do. An example of an atomic transaction is transferring money from one bank account to another. If either step of the transfer fails, the transaction fails, and the money stays in the first account. Atomicity ensures no money is lost or created. to this process - partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. are substituted for mutated partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. as soon as they are ready, and a SELECT
query that starts executing during a mutationA mutation in ClickHouse refers to an operation that modifies or deletes existing data in a table, typically using commands like ALTER TABLE ... UPDATE or ALTER TABLE ... DELETE. Mutations are implemented as asynchronous background processes that rewrite entire data parts affected by the change, rather than modifying rows in place. will see data from partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. that have already been mutated along with data from partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. that have not been mutated yet. Users can track the state of the progress via the systems.mutations table. These are I/O intense operations and should be used sparingly as they can impact clusterA collection of nodes (servers) that work together to store and process data. SELECT
performance.
Read more about delete mutations.
Truncate table
If all data in a table needs to be deleted, use the TRUNCATE TABLE
command shown below. This is a lightweight operation.
Read more about TRUNCATE TABLE.
Drop partition
If you have specified a custom partitioning keyA partitioning key in ClickHouse is a SQL expression defined in the PARTITION BY clause when creating a table. It determines how data is logically grouped into partitions on disk. Each unique value of the partitioning key forms its own physical partition, allowing for efficient data management operations such as dropping, moving, or archiving entire partitions. for your data, partitions can be efficiently dropped. Avoid high cardinality partitioning.
Read more about DROP PARTITION.