Skip to main content
Edit this page

Table parts

What are table parts in ClickHouse?


The data from each table in the ClickHouse MergeTree engine family is organized on disk as a collection of immutable data parts.

To illustrate this, we use this table (adapted from the UK property prices dataset) tracking the date, town, street, and price for sold properties in the United Kingdom:

CREATE TABLE uk_price_paid
(
date Date,
town LowCardinality(String),
street LowCardinality(String),
price UInt32
)
ENGINE = MergeTree
ORDER BY (town, street);

A data part is created whenever a set of rows is inserted into the table. The following diagram sketches this:

INSERT PROCESSING

When a ClickHouse server processes the example insert with 4 rows (e.g., via an INSERT INTO statement) sketched in the diagram above, it performs several steps:

Sorting: The rows are sorted by the table’s sorting key (town, street), and a sparse primary index is generated for the sorted rows.

Splitting: The sorted data is split into columns.

Compression: Each column is compressed.

Writing to Disk: The compressed columns are saved as binary column files within a new directory representing the insert’s data part. The sparse primary index is also compressed and stored in the same directory.

Depending on the table’s specific engine, additional transformations may take place alongside sorting.

Data parts are self-contained, including all metadata needed to interpret their contents without requiring a central catalog. Beyond the sparse primary index, parts contain additional metadata, such as secondary data skipping indexes, column statistics, checksums, min-max indexes (if partitioning is used), and more.

To manage the number of parts per table, a background merge job periodically combines smaller parts into larger ones until they reach a configurable compressed size (typically ~150 GB). Merged parts are marked as inactive and deleted after a configurable time interval. Over time, this process creates a hierarchical structure of merged parts, which is why it’s called a MergeTree table:

PART MERGES

To minimize the number of initial parts and the overhead of merges, database clients are encouraged to either insert tuples in bulk, e.g. 20,000 rows at once, or to use the asynchronous insert mode, in which ClickHouse buffers rows from multiple incoming INSERTs into the same table and creates a new part only after the buffer size exceeds a configurable threshold, or a timeout expires.