Skip to main content
Skip to main content

Lakehouse runtime catalog (BigLake Metastore)

Beta feature. Learn more.

ClickHouse supports integration with multiple catalogs (Unity, Glue, Polaris, etc.). This guide will walk you through the steps to query your Iceberg tables in Lakehouse runtime catalog aka BigLake Metastore via ClickHouse.

Note

As this feature is beta, you will need to enable it using: SET allow_database_iceberg = 1;

Prerequisites

Before creating a connection from ClickHouse to Lakehouse runtime catalog (BigLake Metastore), ensure you have:

  • A Google Cloud project with Lakehouse runtime catalog enabled
  • Application Default credentials (Oauth client ID and client secret) for an application, created via Google Cloud Console
  • A refresh token obtained by completing the OAuth flow with the appropriate scopes (e.g. https://www.googleapis.com/auth/bigquery and storage scope for GCS)
  • A warehouse path: a GCS bucket (and optional prefix) where your tables are stored, e.g. gs://your-bucket or gs://your-bucket/prefix

Creating a connection between Lakehouse runtime catalog and ClickHouse

With the OAuth credentials in place, create a database in ClickHouse that uses the DataLakeCatalog database engine:

SET allow_database_iceberg = 1;

CREATE DATABASE lakehouse_runtime_catalog
ENGINE = DataLakeCatalog('https://biglake.googleapis.com/iceberg/v1/restcatalog')
SETTINGS
    catalog_type = 'biglake',
    google_adc_client_id = '<client-id>',
    google_adc_client_secret = '<client-secret>',
    google_adc_refresh_token = '<refresh-token>',
    google_adc_quota_project_id = '<gcp-project-id>',
    warehouse = 'gs://<bucket_name>/<optional-prefix>';

Querying Lakehouse runtime catalog tables using ClickHouse

Once the connection is created, you can query tables registered in the Lakehouse runtime catalog.

USE LAKEHOUSE_RUNTIME_CATALOG;

SHOW TABLES;

Example output:

┌─name──────────────────────────────────────┐
│lakehouse_runtime_catalog.my_iceberg_table │   
└───────────────────────────────────────────┘
SELECT count(*) FROM `lakehouse_runtime_catalog.my_iceberg_table`;
Backticks required

Backticks are required because ClickHouse doesn't support more than one namespace.

To inspect the table definition:

SHOW CREATE TABLE `lakehouse_runtime_catalog.my_iceberg_table`;

Loading data from Lakehouse into ClickHouse

To load data from a Lakehouse runtime catalog table into a local ClickHouse table for faster repeated queries, create a MergeTree table and insert from the catalog:

CREATE TABLE clickhouse_table
(
    `id` Int64,
    `event_time` DateTime64(3),
    `user_id` String,
    `payload` String
)
ENGINE = MergeTree
ORDER BY (event_time, id);

INSERT INTO local_events
SELECT * FROM lakehouse_runtime_catalog.`icebench.my_iceberg_table`;

After the initial load, query clickhouse_table for lower latency. Re-run the INSERT INTO ... SELECT to refresh data from BigLake when needed.