Connect Airbyte to ClickHouse

Note: The Airbyte source and destination for ClickHouse are currently in Alpha status and are not suitable for moving large datasets (more than 10 million rows).

Airbyte is an open-source data integration platform. It lets you create ELT data pipelines and ships with more than 140 out-of-the-box connectors. This step-by-step tutorial shows how to connect Airbyte to ClickHouse as a destination and load a sample dataset.

Airbyte runs on Docker and uses docker-compose. Make sure to download and install the latest versions of Docker and docker-compose. Deploy Airbyte by cloning the official GitHub repository and running docker-compose up in your terminal.

Once you see the Airbyte banner in your terminal, you can connect to localhost:8000.

Note: Alternatively, you can sign up for and use Airbyte Cloud.
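The deployment steps above can be scripted. The following is a minimal Python sketch, not the official installer. It assumes that the repository lives at https://github.com/airbytehq/airbyte.git and that docker-compose up is run from the root of the checkout; newer Airbyte releases may use a different deployment method. The function names are hypothetical.

```python
import socket
import subprocess

# Assumption: repository URL of the Airbyte project; check the Airbyte docs,
# since the recommended deployment method may have changed.
AIRBYTE_REPO = "https://github.com/airbytehq/airbyte.git"


def deploy_commands(repo_url: str = AIRBYTE_REPO, checkout_dir: str = "airbyte") -> list[list[str]]:
    """Return the argv lists for cloning Airbyte and starting it with docker-compose."""
    return [
        ["git", "clone", repo_url, checkout_dir],
        ["docker-compose", "up"],  # must run from inside checkout_dir
    ]


def port_is_open(host: str = "localhost", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def deploy(checkout_dir: str = "airbyte") -> None:
    """Clone the repository, then start Airbyte (blocks while the containers run)."""
    clone, up = deploy_commands(checkout_dir=checkout_dir)
    subprocess.run(clone, check=True)
    subprocess.run(up, cwd=checkout_dir, check=True)
```

From a second terminal, port_is_open("localhost", 8000) returning True only shows that something is listening on port 8000; the Airbyte banner in the docker-compose output remains the signal that the UI is ready.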

In this section, we show how to add a ClickHouse instance as a destination.

Start your ClickHouse server (Airbyte is compatible with ClickHouse version 21.8.10.19 or above) or log in to your ClickHouse Cloud account.

Within Airbyte, select the "Destinations" page and add a new destination.

Select ClickHouse from the "Destination type" drop-down list, and fill out the "Set up the destination" form by providing your ClickHouse hostname and ports, database name, username and password, and by selecting whether it is an SSL connection (equivalent to the --secure flag in clickhouse-client).

Congratulations! You have now added ClickHouse as a destination in Airbyte.
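Before filling out the form, you can check the version requirement and test the same credentials from a terminal. The following hypothetical Python sketch does both. The class and function names are illustrative and are not part of Airbyte. The sketch compares a version string, such as the one returned by SELECT version(), against 21.8.10.19, and turns the form values into a clickhouse-client command line. It assumes the port you pass is ClickHouse's native TCP port (9000 by default, 9440 with TLS), which is the port clickhouse-client uses. The HTTP port that the Airbyte form may also ask for (8123, or 8443 with TLS) does not work with clickhouse-client.

```python
from dataclasses import dataclass

MIN_CLICKHOUSE_VERSION = (21, 8, 10, 19)  # minimum version stated in this tutorial


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a version string such as '22.1.3.7' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def is_supported(version: str) -> bool:
    """Tuples compare element by element, so this is a numeric version check."""
    return parse_version(version) >= MIN_CLICKHOUSE_VERSION


@dataclass
class ClickHouseDestination:
    host: str
    port: int  # assumption: native TCP port (9000, or 9440 with TLS)
    database: str
    username: str
    password: str
    ssl: bool = False

    def client_args(self) -> list[str]:
        """clickhouse-client arguments that connect with the same settings."""
        args = [
            "clickhouse-client",
            "--host", self.host,
            "--port", str(self.port),
            "--user", self.username,
            "--password", self.password,  # visible in the process list; fine for a quick test
            "--database", self.database,
        ]
        if self.ssl:
            args.append("--secure")  # the SSL option in the form maps to --secure
        return args
```

For example, ClickHouseDestination("localhost", 9440, "default", "my_airbyte_user", "secret", ssl=True).client_args() ends with "--secure". If that command connects, the same values should work in the Airbyte form.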

Note: To use ClickHouse as a destination, the user you configure needs permission to create databases and tables and to insert rows. We recommend creating a dedicated user for Airbyte (e.g. my_airbyte_user) with the following permissions:
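The exact permission list is not reproduced in this section. The following hedged sketch covers only the three privileges the note names. The hypothetical Python helper builds ClickHouse SQL statements that create the dedicated user and grant it CREATE DATABASE, CREATE TABLE and INSERT on all databases. Airbyte may need additional privileges, such as SELECT or DROP, depending on the sync mode. That is an assumption to verify against the Airbyte ClickHouse destination documentation.

```python
import re

# Hypothetical: only the privileges mentioned in the note above.
PRIVILEGES = ("CREATE DATABASE", "CREATE TABLE", "INSERT")

# Restrict user names to plain identifiers so they need no quoting.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def airbyte_user_sql(user: str = "my_airbyte_user", password: str = "change_me") -> list[str]:
    """Return CREATE USER and GRANT statements for a dedicated Airbyte user."""
    if not _IDENTIFIER.fullmatch(user):
        raise ValueError(f"unsupported user name: {user!r}")
    # Escape backslashes first, then single quotes, for a ClickHouse string literal.
    escaped = password.replace("\\", "\\\\").replace("'", "\\'")
    return [
        f"CREATE USER IF NOT EXISTS {user} IDENTIFIED BY '{escaped}'",
        f"GRANT {', '.join(PRIVILEGES)} ON *.* TO {user}",
    ]
```

You could run the returned statements one at a time in clickhouse-client as an administrative user. Granting ON *.* is the broadest scope; you could narrow INSERT and CREATE TABLE to a specific database (db.*) if Airbyte only writes there.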

The example dataset is the New York City Taxi Data (available on GitHub). For this tutorial, we use the subset of this dataset that covers January 2022.

Within Airbyte, select the "Sources" page and add a new source of type File. Fill out the "Set up the source" form by naming the source and providing the URL of the NYC Taxi Jan 2022 file (see below). Make sure to select parquet as the file format, HTTPS Public Web as the Storage Provider, and nyc_taxi_2022 as the Dataset Name.

Congratulations! You have now added a source file in Airbyte.
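To catch typos before saving the source form, you can model and check its settings. The following is a minimal Python sketch. FileSourceSettings is a hypothetical name, not an Airbyte API, and the defaults mirror the values the step above asks for. The sketch assumes the dataset name should be a plain identifier, because it typically becomes a table name in ClickHouse. The URL is left as a required parameter, since the actual file URL is not given here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSourceSettings:
    """Hypothetical model of the 'Set up the source' form values."""
    source_name: str
    url: str  # the NYC Taxi Jan 2022 file URL goes here
    file_format: str = "parquet"
    storage_provider: str = "HTTPS Public Web"
    dataset_name: str = "nyc_taxi_2022"

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the settings look consistent."""
        problems = []
        if self.storage_provider == "HTTPS Public Web" and not self.url.startswith("https://"):
            problems.append("url should be an https:// link for the HTTPS Public Web provider")
        if self.file_format != "parquet":
            problems.append("file_format should be 'parquet' for the NYC taxi file")
        # Assumption: a plain identifier avoids quoting issues in the ClickHouse table name.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", self.dataset_name):
            problems.append("dataset_name should be a plain identifier such as nyc_taxi_2022")
        return problems
```

For example, FileSourceSettings("NYC Taxi Jan 2022", "https://example.com/file.parquet").validate() returns an empty list. The example.com URL is only a placeholder for the real file URL.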