Using ClickHouse's azureBlobStorage table function
This is one of the most efficient and straightforward ways to copy data from Azure Blob Storage or Azure Data Lake Storage into ClickHouse. With this table function, you can instruct ClickHouse to connect directly to Azure storage and read data on demand.
It provides a table-like interface that allows you to select, insert, and filter data directly from the source. The function is highly optimized and supports many widely used file formats, including CSV, JSON, Parquet, Arrow, TSV, ORC, Avro, and more. For the full list, see "Data formats".
In this section, we'll walk through a simple startup guide for transferring data from Azure Blob Storage to ClickHouse, along with important considerations for using this function effectively. For more details and advanced options, refer to the official azureBlobStorage Table Function documentation page.
Acquiring Azure Blob Storage Access Keys
To allow ClickHouse to access your Azure Blob Storage, you'll need a connection string with an access key.
- In the Azure portal, navigate to your Storage Account.
- In the left-hand menu, select Access keys under the Security + networking section.
- Choose either key1 or key2, and click the Show button next to the Connection string field.
- Copy the connection string — you'll use this as a parameter in the azureBlobStorage table function.
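For reference, an Azure storage connection string generally has the following shape; the account name and key shown here are placeholders:

```
DefaultEndpointsProtocol=https;AccountName=<account-name>;AccountKey=<account-key>;EndpointSuffix=core.windows.net
```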
Querying the data from Azure Blob Storage
Open your preferred ClickHouse query console — this can be the ClickHouse Cloud web interface, the ClickHouse CLI client, or any other tool you use to run queries. Once you have both the connection string and your ClickHouse query console ready, you can start querying data directly from Azure Blob Storage.
In the following example, we query all the data stored in JSON files located in a container named data-container:
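A minimal sketch is shown below; the connection string is a placeholder, and the JSONEachRow format is an assumption, so adjust it to match how your JSON files are structured:

```sql
-- Query every .json blob in the container directly.
-- <connection-string> is a placeholder; JSONEachRow is an assumed format.
SELECT *
FROM azureBlobStorage(
    '<connection-string>',
    'data-container',
    '*.json',
    'JSONEachRow');
```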
If you'd like to copy that data into a local ClickHouse table (e.g., my_table), you can use an INSERT INTO ... SELECT statement:
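A sketch, assuming my_table already exists with a schema compatible with the source data (the connection string and format are placeholders, as above):

```sql
-- Copy the queried rows straight into a local table.
INSERT INTO my_table
SELECT *
FROM azureBlobStorage(
    '<connection-string>',
    'data-container',
    '*.json',
    'JSONEachRow');
```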
This allows you to efficiently pull external data into ClickHouse without needing intermediate ETL steps.
A simple example using the Environmental Sensors Dataset
As an example, we will download a single file from the Environmental Sensors Dataset.
- Download a sample file from the Environmental Sensors Dataset.
- In the Azure Portal, create a new storage account if you don't already have one. Make sure that Allow storage account key access is enabled for your storage account; otherwise you will not be able to use the account keys to access the data.
- Create a new container in your storage account. In this example, we name it sensors. You can skip this step if you're using an existing container.
- Upload the previously downloaded 2019-06_bmp180.csv.zst file to the container.
- Follow the steps described earlier to obtain the Azure Blob Storage connection string.
Now that everything is set up, you can query the data directly from Azure Blob Storage:
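For example, a quick row count against the uploaded file; the connection string is a placeholder, and CSVWithNames is an assumption based on the file having a header row (ClickHouse detects the zstd compression from the .zst extension):

```sql
SELECT count()
FROM azureBlobStorage(
    '<connection-string>',
    'sensors',
    '2019-06_bmp180.csv.zst',
    'CSVWithNames');
```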
- To load the data into a table, create a simplified version of the schema used in the original dataset:
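A hypothetical simplified schema might look like the following; the column names and types are assumptions drawn from the fields in the sensor file, so keep only the columns you actually need:

```sql
-- Assumed simplified schema; adjust columns/types to your needs.
CREATE TABLE sensors
(
    sensor_id UInt16,
    lat Float32,
    lon Float32,
    timestamp DateTime,
    temperature Float32
)
ENGINE = MergeTree
ORDER BY (timestamp, sensor_id);
```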
For more information on configuration options and schema inference when querying external sources like Azure Blob Storage, see Automatic schema inference from input data.
- Now insert the data from Azure Blob Storage into the sensors table:
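A sketch, assuming the simplified sensors schema above; the connection string is a placeholder, and the selected columns must match your table definition:

```sql
INSERT INTO sensors
SELECT sensor_id, lat, lon, timestamp, temperature
FROM azureBlobStorage(
    '<connection-string>',
    'sensors',
    '2019-06_bmp180.csv.zst',
    'CSVWithNames');
```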
Your sensors table is now populated with data from the 2019-06_bmp180.csv.zst file stored in Azure Blob Storage.
Additional Resources
This is just a basic introduction to using the azureBlobStorage function. For more advanced options and configuration details, please refer to the official azureBlobStorage Table Function documentation.