Migrating from Snowflake to ClickHouse

This guide shows how to migrate data from Snowflake to ClickHouse.

Migrating data from Snowflake to ClickHouse requires an object store, such as S3, as intermediate storage for the transfer. The migration relies on two commands: COPY INTO in Snowflake to unload the data, and INSERT INTO SELECT in ClickHouse to load it.

1. Exporting data from Snowflake

Exporting data from Snowflake requires the use of an external stage, as shown in the diagram above.

Let's say we want to export a Snowflake table with the following schema:
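As an illustration, a source table of this kind might be declared as sketched below. The table name `mydataset` and the columns `timestamp`, `some_text`, and `complex_data` are hypothetical names chosen for this example; only `some_file` and the use of OBJECT and VARIANT columns are taken from the discussion later in this guide.

```sql
-- Snowflake: hypothetical source table used for illustration only.
CREATE TABLE mydataset (
    timestamp    TIMESTAMP,  -- event time
    some_text    VARCHAR,    -- free-form text
    some_file    OBJECT,     -- semi-structured key/value data
    complex_data VARIANT     -- arbitrary semi-structured data
) DATA_RETENTION_TIME_IN_DAYS = 0;
```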

To move this table's data to a ClickHouse database, we first need to copy it to an external stage. When copying data, we recommend Parquet as the intermediate format because it carries type information, preserves precision, compresses well, and natively supports the nested structures common in analytics.

In the example below, we create a named file format in Snowflake to represent Parquet and the desired file options. We then specify which bucket will contain our copied dataset. Finally, we copy the dataset to the bucket.
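The three steps can be sketched roughly as follows. The names `my_parquet_format`, `external_stage`, `mybucket`, and `mydataset`, and the credential placeholders, are assumptions made for this sketch; substitute your own values. The 150 MB file size limit matches the figure used in the timing estimate in this guide.

```sql
-- Snowflake: a minimal sketch with hypothetical object and bucket names.

-- 1. Named file format describing Parquet output.
CREATE FILE FORMAT my_parquet_format TYPE = PARQUET;

-- 2. External stage pointing at the S3 bucket that will receive the files.
CREATE OR REPLACE STAGE external_stage
    URL = 's3://mybucket/mydataset'
    CREDENTIALS = (AWS_KEY_ID = '<key>' AWS_SECRET_KEY = '<secret>')
    FILE_FORMAT = (FORMAT_NAME = 'my_parquet_format');

-- 3. Unload the table. Files are written with the "mydataset" prefix and
--    capped at 150 MB (150 * 1024 * 1024 = 157286400 bytes).
--    HEADER = TRUE keeps the column names in the Parquet files.
COPY INTO @external_stage/mydataset
FROM mydataset
MAX_FILE_SIZE = 157286400
HEADER = TRUE;
```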

For a dataset of around 5 TB with a maximum file size of 150 MB, using a 2X-Large Snowflake warehouse located in the same AWS us-east-1 region as the bucket, copying the data to S3 takes around 30 minutes.

2. Importing to ClickHouse

Once the data is staged in intermediary object storage, ClickHouse functions such as the s3 table function can be used to insert the data into a table, as shown below.

This example uses the s3 table function for AWS S3, but the gcs table function can be used for Google Cloud Storage and the azureBlobStorage table function can be used for Azure Blob Storage.

Assuming the following target table schema:
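A target table could look like the sketch below. The table name, column names, and the fields inside each Tuple are hypothetical and must be adapted to the keys actually present in your OBJECT and VARIANT data; the choice of `timestamp` as the sorting key is likewise only an assumption.

```sql
-- ClickHouse: hypothetical target table; Tuple fields are illustrative.
CREATE TABLE default.mydataset
(
    `timestamp`    DateTime64(6),
    `some_text`    String,
    `some_file`    Tuple(filename String, version String),
    `complex_data` Tuple(name String, description String)
)
ENGINE = MergeTree
ORDER BY (timestamp);
```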

We can then use the INSERT INTO SELECT command to insert the data from S3 into a ClickHouse table:
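A sketch of such an insert is shown here. It assumes a ClickHouse table named `mydataset` with columns `timestamp`, `some_text`, `some_file`, and `complex_data`, and Parquet files under the hypothetical bucket path `mybucket/mydataset/`; the Tuple definitions are illustrative and must match the target table. If the bucket is private, credentials need to be passed to the `s3` function as well. It is worth running `DESCRIBE` on the same `s3(...)` expression first to confirm the column names and types that ClickHouse infers from the files.

```sql
-- ClickHouse: a minimal sketch with hypothetical table and bucket names.
INSERT INTO mydataset
SELECT
    timestamp,
    some_text,
    -- The semi-structured columns arrive as JSON strings; parse them into Tuples.
    JSONExtract(
        ifNull(some_file, '{}'),
        'Tuple(filename String, version String)'
    ) AS some_file,
    JSONExtract(
        ifNull(complex_data, '{}'),
        'Tuple(name String, description String)'
    ) AS complex_data
FROM s3('https://mybucket.s3.amazonaws.com/mydataset/mydataset*.parquet')
SETTINGS
    -- Insert column defaults where the source value is NULL.
    input_format_null_as_default = 1,
    -- Snowflake usually writes unquoted column names in upper case.
    input_format_parquet_case_insensitive_column_matching = 1;
```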

Note on nested column structures

The VARIANT and OBJECT columns in the original Snowflake table schema will be output as JSON strings by default, forcing us to cast these when inserting them into ClickHouse.

Nested structures such as some_file are converted to JSON strings on copy by Snowflake. Importing this data requires us to transform these structures to Tuples at insert time in ClickHouse, using the JSONExtract function as shown above.

3. Testing successful data export

To test whether your data was properly inserted, simply run a SELECT query on your new table:
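For example, assuming the ClickHouse table is named `mydataset` (a hypothetical name), a quick spot check and a row count might look like this. Comparing the count against the row count of the source table in Snowflake is one simple way to detect missing files.

```sql
-- ClickHouse: inspect a few rows, then count everything that was loaded.
SELECT * FROM mydataset LIMIT 10;

SELECT count() FROM mydataset;
```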
