Skip to main content

How can I do partitioned writes by year and month on S3?

Learn how to write partitioned data by year and month to an S3 bucket in ClickHouse, using a custom path structure for organizing the data.

Learn how to do partitioned writes by year and month on S3

I want to export data segregating the path in S3 bucket to follow a structure like:

  • 2022
    • 1
    • 2
    • ...
    • 12
  • 2021
    • 1
    • 2
    • ...
    • 12

and so on ...

Answer

Considering the ClickHouse table:

CREATE TABLE sample_data (
    `name` String,
    `age` Int,
    `time` DateTime
) ENGINE = MergeTree
ORDER BY
    name

Add 10000 entries:

INSERT INTO
    sample_data
SELECT
    *
FROM
    generateRandom(
        'name String, age Int, time DateTime',
        10,
        10,
        10
    )
LIMIT
    10000;

Run this to create the desired structure in s3 bucket my_bucket (note this example writes files in parquet format):

INSERT INTO
    FUNCTION s3(
        'https://s3-host:4321/my_bucket/{_partition_id}/file.parquet.gz',
        's3-access-key',
        's3-secret-access-key',
        Parquet,
        'name String, age Int, time DateTime'
    ) PARTITION BY concat(
        formatDateTime(time, '%Y'),
        '/',
        formatDateTime(time, '%m')
    )
SELECT
    name,
    age,
    time
FROM
    sample_data
Query id: 55adcf22-f6af-491e-b697-d09694bbcc56

Ok.

0 rows in set. Elapsed: 15.579 sec. Processed 10.00 thousand rows, 219.93 KB (641.87 rows/s., 14.12 KB/s.)
· One min read