Skip to main content
Skip to main content

Fabric OneLake

Beta feature. Learn more.

ClickHouse supports integration with multiple catalogs (OneLake, Unity, Glue, Polaris, etc.). This guide will walk you through the steps to query your data stored in Microsoft OneLake using ClickHouse and OneLake.

Microsoft OneLake supports multiple table formats for their lakehouse. With ClickHouse, you can query Iceberg tables.

Note

As this feature is beta, you will need to enable it using: SET allow_database_iceberg = 1;

Gathering Requirements OneLake

Before querying your table in Microsoft Fabric, you'll need to collect the following information:

  • A OneLake tenant ID (Your Entra ID)
  • A client ID
  • A client secret
  • A warehouse ID and a data item ID

See Microsoft OneLake's documentation for help finding these values.

Creating a connection between OneLake and ClickHouse

With the required info above you can now create a connection between Microsoft OneLake and ClickHouse, but before that you need to enable catalogs:

SET allow_database_iceberg=1

Connect to OneLake

CREATE DATABASE onelake_catalog
ENGINE = DataLakeCatalog('https://onelake.table.fabric.microsoft.com/iceberg')
SETTINGS
catalog_type = 'onelake',
warehouse = 'warehouse_id/data_item_id',
onelake_tenant_id = '<tenant_id>',
oauth_server_uri = 'https://login.microsoftonline.com/<tenant_id>/oauth2/v2.0/token',
auth_scope = 'https://storage.azure.com/.default',
onelake_client_id = '<client_id>',
onelake_client_secret = '<client_secret>'

Querying OneLake using ClickHouse

Now that the connection is in place, you can start querying OneLake:

SHOW TABLES FROM onelake_catalog

Query id: 8f6124c4-45c2-4351-b49a-89dc13e548a7

   ┌─name──────────────────────────┐
1. │ year_2017.green_tripdata_2017 │
2. │ year_2018.green_tripdata_2018 │
3. │ year_2019.green_tripdata_2019 │
4. │ year_2020.green_tripdata_2020 │
5. │ year_2022.green_tripdata_2022 │
   └───────────────────────────────┘

If you're using the Iceberg client, only the Delta tables with Uniform-enabled will be shown:

To query a table:

SELECT *
FROM onelake_catalog.`year_2017.green_tripdata_2017`
LIMIT 1

Query id: db6b4bda-cc58-4ca1-8891-e0d14f02c890

Row 1:
──────
VendorID:              2
lpep_pickup_datetime:  2017-05-18 16:55:43.000000
lpep_dropoff_datetime: 2017-05-18 18:04:11.000000
store_and_fwd_flag:    N
RatecodeID:            2
PULocationID:          130
DOLocationID:          48
passenger_count:       2
trip_distance:         12.43
fare_amount:           52
extra:                 4.5
mta_tax:               0.5
tip_amount:            0
tolls_amount:          33
ehail_fee:             ᴺᵁᴸᴸ
improvement_surcharge: 0.3
total_amount:          90.3
payment_type:          2
trip_type:             1
congestion_surcharge:  ᴺᵁᴸᴸ
source_file:           green_tripdata_2017-05.parquet
Backticks required

Backticks are required because ClickHouse doesn't support more than one namespace.

To inspect the table DDL:

SHOW CREATE TABLE onelake_catalog.`year_2017.green_tripdata_2017`

Query id: 8bd5bd8e-83be-453e-9a88-32de12ba7f24

   ┌─statement───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
1. │ CREATE TABLE onelake_catalog.`year_2017.green_tripdata_2017`                                                                                                               ↴│
   │↳(                                                                                                                                                                          ↴│
   │↳    `VendorID` Nullable(Int64),                                                                                                                                            ↴│
   │↳    `lpep_pickup_datetime` Nullable(DateTime64(6, 'UTC')),                                                                                                                 ↴│
   │↳    `lpep_dropoff_datetime` Nullable(DateTime64(6, 'UTC')),                                                                                                                ↴│
   │↳    `store_and_fwd_flag` Nullable(String),                                                                                                                                 ↴│
   │↳    `RatecodeID` Nullable(Int64),                                                                                                                                          ↴│
   │↳    `PULocationID` Nullable(Int64),                                                                                                                                        ↴│
   │↳    `DOLocationID` Nullable(Int64),                                                                                                                                        ↴│
   │↳    `passenger_count` Nullable(Int64),                                                                                                                                     ↴│
   │↳    `trip_distance` Nullable(Float64),                                                                                                                                     ↴│
   │↳    `fare_amount` Nullable(Float64),                                                                                                                                       ↴│
   │↳    `extra` Nullable(Float64),                                                                                                                                             ↴│
   │↳    `mta_tax` Nullable(Float64),                                                                                                                                           ↴│
   │↳    `tip_amount` Nullable(Float64),                                                                                                                                        ↴│
   │↳    `tolls_amount` Nullable(Float64),                                                                                                                                      ↴│
   │↳    `ehail_fee` Nullable(Float64),                                                                                                                                         ↴│
   │↳    `improvement_surcharge` Nullable(Float64),                                                                                                                             ↴│
   │↳    `total_amount` Nullable(Float64),                                                                                                                                      ↴│
   │↳    `payment_type` Nullable(Int64),                                                                                                                                        ↴│
   │↳    `trip_type` Nullable(Int64),                                                                                                                                           ↴│
   │↳    `congestion_surcharge` Nullable(Float64),                                                                                                                              ↴│
   │↳    `source_file` Nullable(String)                                                                                                                                         ↴│
   │↳)                                                                                                                                                                          ↴│
   │↳ENGINE = Iceberg('abfss://<warehouse_id>@onelake.dfs.fabric.microsoft.com/<data_item_id>/Tables/year_2017/green_tripdata_2017') │
   └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Loading data from your Data Lake into ClickHouse

If you need to load data from OneLake into ClickHouse:

CREATE TABLE trips
ENGINE = MergeTree
ORDER BY coalesce(VendorID, 0)
AS SELECT *
FROM onelake_catalog.`year_2017.green_tripdata_2017`

Query id: d15983a6-ef6a-40fe-80d5-19274b9fe328

Ok.

0 rows in set. Elapsed: 32.570 sec. Processed 11.74 million rows, 275.37 MB (360.36 thousand rows/s., 8.45 MB/s.)
Peak memory usage: 1.31 GiB.