Use S3 Object Storage as a ClickHouse disk

This article demonstrates the basics of how to configure an AWS IAM user, create an S3 bucket and configure ClickHouse to use the bucket as an S3 disk. You should work with your security team to determine the permissions to be used, and consider these as a starting point.

Create an AWS IAM user

In this procedure, we'll be creating a service account user, not a login user.

  1. Log into the AWS IAM Management Console.

  2. In "users", select Add users

  3. Enter the user name, set the credential type to Access key - Programmatic access, and select Next: Permissions

  4. Do not add the user to any group; select Next: Tags

  5. Unless you need to add any tags, select Next: Review

  6. Select Create User

    Note: The warning message stating that the user has no permissions can be ignored; permissions will be granted on the bucket for the user in the next section.

  7. The user is now created; click on show and copy the access and secret keys.

    Note: Save the keys somewhere secure; this is the only time that the secret access key will be available.

  8. Click close, then find the user in the users screen.

  9. Copy the ARN (Amazon Resource Name) and save it for use when configuring the access policy for the bucket.

Create an S3 bucket

  1. In the S3 bucket section, select Create bucket

  2. Enter a bucket name and leave the other options at their defaults.

    Note: The bucket name must be unique across all of AWS, not just within your organization; otherwise bucket creation fails with an error.

  3. Leave Block all Public Access enabled; public access is not needed.

  4. Select Create Bucket at the bottom of the page

  5. Once the bucket has been created, find the new S3 bucket in the S3 buckets list and select the link

  6. Copy the bucket ARN and save it for use when configuring the access policy for the bucket.

  7. Select Create folder

  8. Enter a folder name which will be the target for the ClickHouse S3 disk and select Create folder

  9. The folder should now be visible in the bucket's object list

  10. Select the checkbox for the new folder and click on Copy URL. Save the copied URL for use in the ClickHouse storage configuration in the next section.

  11. Select the Permissions tab and click on the Edit button in the Bucket Policy section

  12. Add a bucket policy, example below:

{
    "Version": "2012-10-17",
    "Id": "Policy123456",
    "Statement": [
        {
            "Sid": "abc123",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::921234567898:user/mars-s3-user"
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::mars-doc-test",
                "arn:aws:s3:::mars-doc-test/*"
            ]
        }
    ]
}

| Parameter | Description | Example Value |
|-----------|-------------|---------------|
| Version | Version of the policy interpreter, leave as-is | 2012-10-17 |
| Sid | User-defined policy id | abc123 |
| Effect | Whether user requests will be allowed or denied | Allow |
| Principal | The accounts or user that will be allowed | arn:aws:iam::921234567898:user/mars-s3-user |
| Action | What operations are allowed on the bucket | s3:* |
| Resource | Which resources (the bucket and the objects in it) the operations are allowed on | "arn:aws:s3:::mars-doc-test", "arn:aws:s3:::mars-doc-test/*" |

Note: Work with your security team to determine the permissions to be used, and consider these as a starting point. For more information on policies and settings, refer to the AWS documentation: https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-policy-language-overview.html

  13. Save the policy configuration.
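
The example policy above grants s3:* on the bucket. As input for the conversation with your security team, the following is a minimal sketch that builds a policy document with a narrower action list. The helper name build_bucket_policy, the Sid values, and the chosen actions are assumptions made for illustration and do not come from this article; whether these actions are sufficient for your ClickHouse version and workload needs to be verified before use.

```python
import json

# Assumption: a narrower action list than the article's "s3:*". These are
# real S3 actions, but whether they are sufficient for your ClickHouse
# version and workload must be verified.
BUCKET_ACTIONS = ["s3:GetBucketLocation", "s3:ListBucket"]
OBJECT_ACTIONS = [
    "s3:GetObject",
    "s3:PutObject",
    "s3:DeleteObject",
    "s3:AbortMultipartUpload",
]


def build_bucket_policy(user_arn: str, bucket_name: str) -> dict:
    """Return a bucket policy (as a dict) for one IAM user and one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    principal = {"AWS": user_arn}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "BucketLevel",
                "Effect": "Allow",
                "Principal": principal,
                "Action": BUCKET_ACTIONS,
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to the keys inside the bucket.
                "Sid": "ObjectLevel",
                "Effect": "Allow",
                "Principal": principal,
                "Action": OBJECT_ACTIONS,
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


# Example values taken from the policy shown in this article.
policy = build_bucket_policy(
    "arn:aws:iam::921234567898:user/mars-s3-user", "mars-doc-test"
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The printed JSON can be pasted into the Bucket Policy editor in place of the example. Splitting the statement in two keeps bucket-level actions on the bucket ARN and object-level actions on the `/*` resource, which mirrors the two Resource entries in the original example.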

Configure ClickHouse to use the S3 bucket as a disk

The following example assumes that ClickHouse was installed from the Linux Deb package, runs as a service, and uses the default ClickHouse directories.

  1. Create a new file in the ClickHouse config.d directory to store the storage configuration.
vim /etc/clickhouse-server/config.d/storage_config.xml
  2. Add the following storage configuration, substituting the bucket path, access key, and secret key from the earlier steps
<clickhouse>
    <storage_configuration>
        <disks>
            <s3_disk>
                <type>s3</type>
                <endpoint>https://mars-doc-test.s3.amazonaws.com/clickhouse3/</endpoint>
                <access_key_id>ABC123</access_key_id>
                <secret_access_key>Abc+123</secret_access_key>
                <metadata_path>/var/lib/clickhouse/disks/s3_disk/</metadata_path>
                <cache_enabled>true</cache_enabled>
                <data_cache_enabled>true</data_cache_enabled>
                <cache_path>/var/lib/clickhouse/disks/s3_disk/cache/</cache_path>
            </s3_disk>
        </disks>
        <policies>
            <s3_main>
                <volumes>
                    <main>
                        <disk>s3_disk</disk>
                    </main>
                </volumes>
            </s3_main>
        </policies>
    </storage_configuration>
</clickhouse>
Note: The <s3_disk> tag within the <disks> tag is an arbitrary label. It can be set to something else, but the same label must be used in the <disk> tag under the <policies> tag to reference the disk. It is recommended that <metadata_path> and <cache_path> also include the label in the path, so that the locations can be identified on disk. The <s3_main> tag is also arbitrary; it is the name of the policy, which is used to identify the storage target when creating resources in ClickHouse.

For more information about using S3: Integrations Guide: S3 Backed MergeTree

  3. Update the owner of the file to the clickhouse user and group
chown clickhouse:clickhouse /etc/clickhouse-server/config.d/storage_config.xml
  4. Restart the ClickHouse instance to have the changes take effect.
service clickhouse-server restart
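
The storage configuration depends on a cross-reference: every <disk> named under <policies> has to match a label defined under <disks>. The following is a minimal sketch of a check for that rule, which could be run against the file before a restart. The function name undefined_disks and the trimmed SAMPLE_CONFIG are illustrative assumptions, not part of ClickHouse or of this article; the script only inspects the XML and does not contact ClickHouse or S3.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the article's configuration, used here as sample input.
SAMPLE_CONFIG = """<clickhouse>
    <storage_configuration>
        <disks>
            <s3_disk>
                <type>s3</type>
                <endpoint>https://mars-doc-test.s3.amazonaws.com/clickhouse3/</endpoint>
            </s3_disk>
        </disks>
        <policies>
            <s3_main>
                <volumes>
                    <main>
                        <disk>s3_disk</disk>
                    </main>
                </volumes>
            </s3_main>
        </policies>
    </storage_configuration>
</clickhouse>"""


def undefined_disks(config_xml: str) -> dict:
    """Map each policy name to the disk labels it references but never defines."""
    root = ET.fromstring(config_xml)
    storage = root.find("storage_configuration")
    if storage is None:
        raise ValueError("no <storage_configuration> element found")

    # ClickHouse always provides a built-in disk named "default".
    defined = {"default"}
    disks = storage.find("disks")
    if disks is not None:
        defined.update(disk.tag for disk in disks)

    missing = {}
    policies = storage.find("policies")
    if policies is not None:
        for policy in policies:
            # Volumes have arbitrary names, so match any child of <volumes>.
            for disk_ref in policy.findall("volumes/*/disk"):
                name = (disk_ref.text or "").strip()
                if name not in defined:
                    missing.setdefault(policy.tag, []).append(name)
    return missing


problems = undefined_disks(SAMPLE_CONFIG)
print(problems or "all policy disks are defined")
```

To check the real file, pass the contents of /etc/clickhouse-server/config.d/storage_config.xml to undefined_disks instead of SAMPLE_CONFIG. An empty result means every policy references a defined disk label; it does not confirm that the endpoint or the keys are valid.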

Testing

  1. Log in with the ClickHouse client, for example:
clickhouse-client --user default --password ClickHouse123!
  2. Create a table specifying the new S3 storage policy
chnode4 :) CREATE TABLE s3_table1
(
`id` UInt64,
`column1` String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS storage_policy = 's3_main';

CREATE TABLE s3_table1
(
`id` UInt64,
`column1` String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS storage_policy = 's3_main'

Query id: fefd97b5-cce5-4fe3-a1d6-8cdda5616451

Ok.

0 rows in set. Elapsed: 0.254 sec.
  3. Show that the table was created with the correct policy
chnode4 :) SHOW CREATE TABLE s3_table1;

SHOW CREATE TABLE s3_table1

Query id: e7a00995-351c-41cb-a3aa-272a5849b134

┌─statement──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE default.s3_table1
(
    `id` UInt64,
    `column1` String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS storage_policy = 's3_main', index_granularity = 8192 │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

1 row in set. Elapsed: 0.004 sec.
  4. Insert test rows into the table
chnode4 :) INSERT INTO s3_table1
(id, column1)
VALUES
(1, 'abc'),
(2, 'xyz');

INSERT INTO s3_table1 (id, column1) FORMAT Values

Query id: 0265dd92-3890-4d56-9d12-71d4038b85d5

Ok.

2 rows in set. Elapsed: 0.337 sec.
  5. View the rows
chnode4 :) SELECT * FROM s3_table1;

SELECT *
FROM s3_table1

Query id: 967a8f0c-3b67-4154-830f-33bd6ad386ce

┌─id─┬─column1─┐
│  1 │ abc     │
│  2 │ xyz     │
└────┴─────────┘

2 rows in set. Elapsed: 0.284 sec.
  6. In the AWS console, navigate to the buckets, then select the new bucket and the folder. The folder should now contain the objects that ClickHouse wrote for the table.

Summary

This article provided simple step-by-step instructions for configuring an AWS S3 bucket for access and for using it as a disk for ClickHouse.