Using ClickHouse's azureBlobStorage table function
This is one of the most efficient and straightforward ways to copy data from Azure Blob Storage or Azure Data Lake Storage into ClickHouse. With this table function, you can instruct ClickHouse to connect directly to Azure storage and read data on demand.
It provides a table-like interface that allows you to select, insert, and filter data directly from the source. The function is highly optimized and supports many widely used file formats, including CSV, JSON, Parquet, Arrow, TSV, ORC, Avro, and more. For the full list, see "Data formats".
In this section, we'll walk through a simple startup guide for transferring data from Azure Blob Storage to ClickHouse, along with important considerations for using this function effectively. For more details and advanced options, refer to the official azureBlobStorage Table Function documentation page.
Acquiring Azure Blob Storage Access Keys
To allow ClickHouse to access your Azure Blob Storage, you'll need a connection string with an access key.
- In the Azure portal, navigate to your Storage Account.
- In the left-hand menu, select Access keys under the Security + networking section.
- Choose either key1 or key2, and click the Show button next to the Connection string field.
- Copy the connection string — you'll use this as a parameter in the azureBlobStorage table function.
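For reference, an Azure storage connection string generally has the following shape; the account name and key shown here are placeholders:

```
DefaultEndpointsProtocol=https;AccountName=<account-name>;AccountKey=<account-key>;EndpointSuffix=core.windows.net
```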
Querying the data from Azure Blob Storage
Open your preferred ClickHouse query console — this can be the ClickHouse Cloud web interface, the ClickHouse CLI client, or any other tool you use to run queries. Once you have both the connection string and your ClickHouse query console ready, you can start querying data directly from Azure Blob Storage.
In the following example, we query all the data stored in JSON files located in a container named data-container:
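A minimal sketch is shown below; the connection string is a placeholder, and the JSONEachRow format is an assumption, so adjust it to match how your JSON files are structured:

```sql
-- Query every .json blob in the container directly.
-- <connection-string> is a placeholder; JSONEachRow is an assumed format.
SELECT *
FROM azureBlobStorage(
    '<connection-string>',
    'data-container',
    '*.json',
    'JSONEachRow');
```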
If you'd like to copy that data into a local ClickHouse table (e.g., my_table), you can use an INSERT INTO ... SELECT statement:
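A sketch, assuming my_table already exists with a schema compatible with the source data (the connection string and format are placeholders, as above):

```sql
-- Copy the queried rows straight into a local table.
INSERT INTO my_table
SELECT *
FROM azureBlobStorage(
    '<connection-string>',
    'data-container',
    '*.json',
    'JSONEachRow');
```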
This allows you to efficiently pull external data into ClickHouse without needing intermediate ETL steps.
A simple example using the Environmental Sensors Dataset
As an example, we will download a single file from the Environmental Sensors Dataset.
- Download a sample file from the Environmental Sensors Dataset.
- In the Azure Portal, create a new storage account if you don't already have one. Make sure that Allow storage account key access is enabled for your storage account; otherwise you will not be able to use the account keys to access the data.
- Create a new container in your storage account. In this example, we name it sensors. You can skip this step if you're using an existing container.
- Upload the previously downloaded 2019-06_bmp180.csv.zst file to the container.
- Follow the steps described earlier to obtain the Azure Blob Storage connection string.
Now that everything is set up, you can query the data directly from Azure Blob Storage:
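For example, a quick row count against the uploaded file; the connection string is a placeholder, and CSVWithNames is an assumption based on the file having a header row (ClickHouse detects the zstd compression from the .zst extension):

```sql
SELECT count()
FROM azureBlobStorage(
    '<connection-string>',
    'sensors',
    '2019-06_bmp180.csv.zst',
    'CSVWithNames');
```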
- To load the data into a table, create a simplified version of the schema used in the original dataset:
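A hypothetical simplified schema might look like the following; the column names and types are assumptions drawn from the fields in the sensor file, so keep only the columns you actually need:

```sql
-- Assumed simplified schema; adjust columns/types to your needs.
CREATE TABLE sensors
(
    sensor_id UInt16,
    lat Float32,
    lon Float32,
    timestamp DateTime,
    temperature Float32
)
ENGINE = MergeTree
ORDER BY (timestamp, sensor_id);
```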
For more information on configuration options and schema inference when querying external sources like Azure Blob Storage, see Automatic schema inference from input data.
- Now insert the data from Azure Blob Storage into the sensors table:
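A sketch, assuming the simplified sensors schema above; the connection string is a placeholder, and the selected columns must match your table definition:

```sql
INSERT INTO sensors
SELECT sensor_id, lat, lon, timestamp, temperature
FROM azureBlobStorage(
    '<connection-string>',
    'sensors',
    '2019-06_bmp180.csv.zst',
    'CSVWithNames');
```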
Your sensors table is now populated with data from the 2019-06_bmp180.csv.zst file stored in Azure Blob Storage.
Additional Resources
This is just a basic introduction to using the azureBlobStorage function. For more advanced options and configuration details, please refer to the official azureBlobStorage Table Function documentation.