Getting started with lakehouse table formats

This guide provides a hands-on walkthrough of the core capabilities ClickHouse offers for working with lakehouse table formats.

Querying data in place

ClickHouse can act as a query engine over open table formats stored in object storage. Without duplicating data, users can point ClickHouse at existing Iceberg, Delta Lake, Hudi, or Paimon tables and begin querying immediately, whether to power a production workload or to explore data interactively. This can be done through direct reads using table functions and table engines, or by connecting to a data catalog.

Querying open table formats directly - Use ClickHouse table functions to read Iceberg, Delta Lake, Hudi, and Paimon tables in object storage without any prior setup.
Connecting to a data catalog - Expose a catalog as a ClickHouse database and query its tables using standard SQL. Recommended when you need to access multiple tables in the catalog.
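
As a rough illustration of the two approaches, the sketch below first reads an Iceberg table directly with the icebergS3 table function and then exposes a REST catalog as a database through the DataLakeCatalog engine. The bucket URL, credentials, catalog endpoint, warehouse name, and table names are all hypothetical placeholders. Depending on your ClickHouse version, the catalog integration may also require enabling an experimental setting first.

```sql
-- Direct read: point the table function at the table's root path.
-- The URL and credentials are placeholders, not real values.
SELECT count()
FROM icebergS3(
    'https://example-bucket.s3.amazonaws.com/warehouse/events/',
    'ACCESS_KEY_ID',
    'SECRET_ACCESS_KEY'
);

-- Catalog access: expose a REST catalog as a ClickHouse database.
-- The endpoint and warehouse name are assumptions for illustration.
CREATE DATABASE lake
ENGINE = DataLakeCatalog('http://localhost:8181/v1')
SETTINGS catalog_type = 'rest', warehouse = 'demo';

-- Tables appear under their namespace-qualified names.
SHOW TABLES FROM lake;
SELECT * FROM lake.`default.events` LIMIT 10;
```

The table function suits one-off exploration of a single table, while the catalog database avoids repeating paths and credentials for every table you query.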

Accelerating analytics

For workloads that demand low-latency responses and high concurrency, loading data from open table formats into ClickHouse's MergeTree engine provides dramatically better performance. MergeTree's sparse primary index, skip indices, and columnar storage allow queries that take seconds over Parquet files to complete in milliseconds.

Accelerating analytics with MergeTree - Load data from a catalog into a MergeTree table and achieve ~40x query speedups.
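
A minimal sketch of the loading step might look like the following. The table name, columns, sort key, and source path are hypothetical, and the source here is a direct icebergS3 read rather than a catalog table, purely to keep the example self-contained. In practice the ORDER BY key should match the columns your queries filter on most often, since it determines the sparse primary index.

```sql
-- Local MergeTree table; schema and sort key are illustrative only.
CREATE TABLE events_local
(
    event_time DateTime,
    user_id    UInt64,
    event_type LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (event_type, event_time);

-- Copy the data out of the lakehouse table (placeholder URL and credentials).
INSERT INTO events_local
SELECT event_time, user_id, event_type
FROM icebergS3(
    'https://example-bucket.s3.amazonaws.com/warehouse/events/',
    'ACCESS_KEY_ID',
    'SECRET_ACCESS_KEY'
);
```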

Writing data back

Data can also flow from ClickHouse back into open table formats. Whether offloading aged data to long-term storage or publishing the results of transformations for downstream consumption, ClickHouse can write to Iceberg and Delta Lake tables in object storage.

Writing data to open table formats - Write raw data and aggregated results from ClickHouse into Iceberg tables using INSERT INTO SELECT.
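
The write path could be sketched roughly as below, assuming a ClickHouse version with Iceberg write support, which at the time of writing is gated behind an experimental setting. The setting name, table names, path, and credentials shown here are assumptions to verify against the documentation for your version.

```sql
-- Assumed experimental flag for Iceberg writes; confirm for your version.
SET allow_experimental_insert_into_iceberg = 1;

-- Iceberg-backed table in object storage (placeholder URL and credentials).
CREATE TABLE events_archive
(
    event_date  Date,
    event_type  String,
    event_count UInt64
)
ENGINE = IcebergS3(
    'https://example-bucket.s3.amazonaws.com/warehouse/events_archive/',
    'ACCESS_KEY_ID',
    'SECRET_ACCESS_KEY'
);

-- Publish an aggregate computed from a hypothetical local MergeTree table.
INSERT INTO events_archive
SELECT toDate(event_time) AS event_date, event_type, count() AS event_count
FROM events_local
GROUP BY event_date, event_type;
```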
