DataLakeCatalog

The DataLakeCatalog database engine connects ClickHouse to external data catalogs so that you can query data in open table formats in place, without duplicating it. ClickHouse then acts as a query engine on top of your existing data lake infrastructure.

Supported Catalogs

The DataLakeCatalog engine supports the following data catalogs:

  • AWS Glue Catalog - For Iceberg tables in AWS environments
  • Databricks Unity Catalog - For Delta Lake and Iceberg tables
  • Hive Metastore - Traditional Hadoop ecosystem catalog
  • REST Catalogs - Any catalog supporting the Iceberg REST specification

Creating a Database

You will need to enable the relevant settings below to use the DataLakeCatalog engine:
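
A minimal sketch of enabling these settings for the current session, assuming the setting names used by recent ClickHouse releases (names and experimental status vary between versions, so verify them against your server's settings reference):

```sql
-- Assumed setting names; enable only the one that matches your catalog type.
SET allow_experimental_database_iceberg = 1;        -- REST (Iceberg) catalogs
SET allow_experimental_database_unity_catalog = 1;  -- Databricks Unity Catalog
SET allow_experimental_database_glue_catalog = 1;   -- AWS Glue Catalog
SET allow_experimental_database_hms_catalog = 1;    -- Hive Metastore
```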

Databases with the DataLakeCatalog engine can be created using the following syntax:
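
A sketch of the general form, assuming the engine takes the catalog endpoint plus an optional user and password as arguments, with all other options passed through SETTINGS. This is a syntax template rather than a runnable statement: `database_name` and `catalog_endpoint` are placeholders, and square brackets mark optional parts.

```sql
CREATE DATABASE database_name
ENGINE = DataLakeCatalog(catalog_endpoint[, user, password])
SETTINGS
    catalog_type = '...',
    [...]
```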

The following settings are supported:

Setting                  Description
catalog_type             Type of catalog: glue, unity (Delta), rest (Iceberg), hive
warehouse                The warehouse/database name to use in the catalog
catalog_credential       Authentication credential for the catalog (e.g., API key or token)
auth_header              Custom HTTP header for authentication with the catalog service
auth_scope               OAuth2 scope for authentication (if using OAuth)
storage_endpoint         Endpoint URL for the underlying storage
oauth_server_uri         URI of the OAuth2 authorization server for authentication
vended_credentials       Boolean indicating whether to use vended credentials (AWS-specific)
aws_access_key_id        AWS access key ID for S3/Glue access (if not using vended credentials)
aws_secret_access_key    AWS secret access key for S3/Glue access (if not using vended credentials)
region                   AWS region for the service (e.g., us-east-1)
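
To illustrate how these settings combine, the sketch below creates one database against an Iceberg REST catalog and one against AWS Glue, then queries a table. Every endpoint, database name, credential, and the `default.events` table is a hypothetical placeholder. The sketch also assumes that a Glue database needs no endpoint argument and that catalog tables are exposed under `namespace.table` names; check both against your ClickHouse version.

```sql
-- Iceberg REST catalog backed by S3-compatible storage (placeholder values).
CREATE DATABASE rest_lake
ENGINE = DataLakeCatalog('http://rest-catalog.example.com:8181/v1')
SETTINGS
    catalog_type = 'rest',
    warehouse = 'demo',
    storage_endpoint = 'http://storage.example.com:9000/lakehouse';

-- AWS Glue with static credentials instead of vended credentials.
CREATE DATABASE glue_lake
ENGINE = DataLakeCatalog
SETTINGS
    catalog_type = 'glue',
    region = 'us-east-1',
    aws_access_key_id = '<access-key-id>',
    aws_secret_access_key = '<secret-access-key>';

-- List the tables the catalog exposes, then query one.
-- The table name contains a dot, so it is quoted with backticks.
SHOW TABLES FROM rest_lake;
SELECT count(*) FROM rest_lake.`default.events`;
```

Keeping credentials out of the statement text, for example by using vended credentials where the catalog supports them, avoids storing secrets in the database definition.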

Examples

See the following pages for examples of using the DataLakeCatalog engine: