Skip to main content
Skip to main content

Connecting to a data catalog

In the previous section, you queried open table formats by passing storage paths directly. In practice, most organizations manage table metadata through a data catalog - a central registry that tracks table locations, schemas, and partitions. When you connect ClickHouse to a catalog using the DataLakeCatalog database engine, the entire catalog is exposed as a ClickHouse database. Every table in the catalog appears automatically and can be queried with full ClickHouse SQL - no need to know individual table paths or manage credentials per table.

This guide walks through connecting to Databricks Unity Catalog. ClickHouse also supports the following catalogs - refer to each reference guide for full setup instructions:

CatalogReference guide
AWS GlueAWS Glue catalog
Iceberg REST CatalogREST catalog
LakekeeperLakekeeper catalog
Project NessieNessie catalog
Microsoft OneLakeFabric OneLake

Connecting to a Unity Catalog

Beta feature. Learn more.

For example purposes, we'll use the Unity catalog.

Databricks Unity Catalog provides centralized governance for Databricks lakehouse data.

Databricks supports multiple data formats for their lakehouse. With ClickHouse, you can query Unity Catalog tables as both Delta and Iceberg.

Note

Integration with the Unity Catalog works for managed and external tables. This integration is currently only supported on AWS.

Configuring Unity in Databricks

To allow ClickHouse to interact with Unity catalog, you need to make sure your Unity Catalog is configured to allow interaction with an external reader. This can be achieved by following the "Enable external data access to Unity Catalog" guide.

In addition to enabling external access, ensure the principal configuring the integration has the EXTERNAL USE SCHEMA privilege on the schema containing the tables.

Once your catalog is configured, you must generate credentials for ClickHouse. Two different methods can be used, depending on your interaction mode with Unity:

  • For Iceberg clients, authenticate with a service principal.

  • For Delta clients, use a Personal Access Token (PAT).

Connect to the catalog

With the credentials, you can connect to the relevant endpoint to query the Iceberg or Delta tables.

The Unity catalog should be used for accessing data in the Delta format.

SET allow_experimental_database_unity_catalog = 1;

CREATE DATABASE unity
ENGINE = DataLakeCatalog('https://<workspace-id>.cloud.databricks.com/api/2.1/unity-catalog')
SETTINGS warehouse = 'CATALOG_NAME', catalog_credential = '<PAT>', catalog_type = 'unity';

List tables

Once the connection has been established to your catalog, you can list the tables.

SHOW TABLES FROM unity

┌─name───────────────────────────────────────────────┐
│ unity.logs                                         │
│ unity.single_day_log                               │
└────────────────────────────────────────────────────┘

31 rows in set.

Exploring table schemas

We can use the standard SHOW CREATE TABLE command to see how the tables were created.

Backticks required

Note the need to specify the namespace and the table name, surrounded with backticks - ClickHouse doesn't support more than one namespace.

The following assumes querying the REST iceberg catalog:

SHOW CREATE TABLE unity.`icebench.single_day_log`

CREATE TABLE unity.`icebench.single_day_log`
(
    `pull_request_number` Nullable(Int64),
    `commit_sha` Nullable(String),
    `check_start_time` Nullable(DateTime64(6, 'UTC')),
    `check_name` Nullable(String),
    `instance_type` Nullable(String),
    `instance_id` Nullable(String),
    `event_date` Nullable(Date32),
    `event_time` Nullable(DateTime64(6, 'UTC')),
    `event_time_microseconds` Nullable(DateTime64(6, 'UTC')),
    `thread_name` Nullable(String),
    `thread_id` Nullable(Decimal(20, 0)),
    `level` Nullable(String),
    `query_id` Nullable(String),
    `logger_name` Nullable(String),
    `message` Nullable(String),
    `revision` Nullable(Int64),
    `source_file` Nullable(String),
    `source_line` Nullable(Decimal(20, 0)),
    `message_format_string` Nullable(String)
)
ENGINE = Iceberg('s3://...')

Querying a table

All ClickHouse functions are supported. Again, the namespace and table name should be delimited with backticks.


SELECT count()
FROM unity.`icebench.single_day_log`

┌───count()─┐
│ 282634391 │ -- 282.63 million
└───────────┘

1 row in set. Elapsed: 1.265 sec.

For full setup instructions, see the Unity catalog reference guide.