Iceberg table engine

Caution

We recommend using the Iceberg Table Function for working with Iceberg data in ClickHouse. The Iceberg Table Function currently provides sufficient functionality, offering a partial read-only interface for Iceberg tables.

The Iceberg Table Engine is available but may have limitations. ClickHouse wasn't originally designed to support tables with externally changing schemas, which can affect the functionality of the Iceberg Table Engine. As a result, some features that work with regular tables may be unavailable or may not function correctly, especially when using the old analyzer.

For optimal compatibility, we suggest using the Iceberg Table Function while we continue to improve support for the Iceberg Table Engine.

This engine provides a read-only integration with existing Apache Iceberg tables in Amazon S3, Azure, HDFS and locally stored tables.

Create table

Note that the Iceberg table must already exist in the storage, this command does not take DDL parameters to create a new table.

Engine arguments

Description of the arguments coincides with description of arguments in engines S3, AzureBlobStorage, HDFS and File correspondingly. format stands for the format of data files in the Iceberg table.

Engine parameters can be specified using Named Collections

Example

Using named collections:

Aliases

Table engine Iceberg is an alias to IcebergS3 now.

Schema evolution

At the moment, with the help of CH, you can read iceberg tables, the schema of which has changed over time. We currently support reading tables where columns have been added and removed, and their order has changed. You can also change a column where a value is required to one where NULL is allowed. Additionally, we support permitted type casting for simple types, namely:

int -> long
float -> double
decimal(P, S) -> decimal(P', S) where P' > P.

Currently, it is not possible to change nested structures or the types of elements within arrays and maps.

To read a table where the schema has changed after its creation with dynamic schema inference, set allow_dynamic_metadata_for_data_lakes = true when creating the table.

Partition pruning

ClickHouse supports partition pruning during SELECT queries for Iceberg tables, which helps optimize query performance by skipping irrelevant data files. To enable partition pruning, set use_iceberg_partition_pruning = 1. For more information about iceberg partition pruning address https://iceberg.apache.org/spec/#partitioning

Time travel

ClickHouse supports time travel for Iceberg tables, allowing you to query historical data with a specific timestamp or snapshot ID.

Basic usage

Note: You cannot specify both iceberg_timestamp_ms and iceberg_snapshot_id parameters in the same query.

Important considerations

Snapshots are typically created when:
- New data is written to the table
- Some kind of data compaction is performed
Schema changes typically don't create snapshots - This leads to important behaviors when using time travel with tables that have undergone schema evolution.

Example scenarios

All scenarios are written in Spark because CH doesn't support writing to Iceberg tables yet.

Scenario 1: Schema changes without new snapshots

Consider this sequence of operations:

Query results at different timestamps:

At ts1 & ts2: Only the original two columns appear
At ts3: All three columns appear, with NULL for the price of the first row

Scenario 2: Historical vs. current schema differences

A time travel query at a current moment might show a different schema than the current table:

This happens because ALTER TABLE doesn't create a new snapshot but for the current table Spark takes value of schema_id from the latest metadata file, not a snapshot.

Scenario 3: Historical vs. current schema differences

The second one is that while doing time travel you can't get state of table before any data was written to it:

In Clickhouse the behavior is consistent with Spark. You can mentally replace Spark Select queries with Clickhouse Select queries and it will work the same way.

Metadata file resolution

When using the Iceberg table engine in ClickHouse, the system needs to locate the correct metadata.json file that describes the Iceberg table structure. Here's how this resolution process works:

Candidates search

Direct Path Specification:

If you set iceberg_metadata_file_path, the system will use this exact path by combining it with the Iceberg table directory path.
When this setting is provided, all other resolution settings are ignored.

Table UUID Matching:

If iceberg_metadata_table_uuid is specified, the system will:
- Look only at .metadata.json files in the metadata directory
- Filter for files containing a table-uuid field matching your specified UUID (case-insensitive)

Default Search:

If neither of the above settings are provided, all .metadata.json files in the metadata directory become candidates

Selecting the most recent file

After identifying candidate files using the above rules, the system determines which one is the most recent:

If iceberg_recent_metadata_file_by_last_updated_ms_field is enabled:
- The file with the largest last-updated-ms value is selected
Otherwise:
- The file with the highest version number is selected
- (Version appears as V in filenames formatted as V.metadata.json or V-uuid.metadata.json)

Note: All mentioned settings are engine-level settings and must be specified during table creation as shown below:

Note: While Iceberg Catalogs typically handle metadata resolution, the Iceberg table engine in ClickHouse directly interprets files stored in S3 as Iceberg tables, which is why understanding these resolution rules is important.

Data cache

Iceberg table engine and table function support data caching same as S3, AzureBlobStorage, HDFS storages. See here.

Metadata cache

Iceberg table engine and table function support metadata cache storing the information of manifest files, manifest list and metadata json. The cache is stored in memory. This feature is controlled by setting use_iceberg_metadata_files_cache, which is enabled by default.

Create table​

Engine arguments​

Example​

Aliases​

Schema evolution​

Partition pruning​

Time travel​

Basic usage​

Important considerations​

Example scenarios​

Scenario 1: Schema changes without new snapshots​

Scenario 2: Historical vs. current schema differences​

Scenario 3: Historical vs. current schema differences​

Metadata file resolution​

Candidates search​

Selecting the most recent file​

Data cache​

Metadata cache​

See also​