Use S3 Object Storage as a ClickHouse disk
This article demonstrates the basics of how to configure an AWS IAM user, create an S3 bucket, and configure ClickHouse to use the bucket as an S3 disk. Work with your security team to determine the permissions to use, and treat the ones shown here as a starting point.
Create an AWS IAM user
In this procedure, we'll be creating a service account user, not a login user.
Log into the AWS IAM Management Console.
In "users", select Add users
Enter the user name, set the credential type to Access key - Programmatic access, and select Next: Permissions
Do not add the user to any group; select Next: Tags
Unless you need to add any tags, select Next: Review
Select Create User
note
The warning message stating that the user has no permissions can be ignored; permissions will be granted to the user on the bucket in the next section.
The user is now created; click on show and copy the access and secret keys.
note
Save the keys somewhere safe; this is the only time that the secret access key will be available.
Click close, then find the user in the users screen.
Copy the ARN (Amazon Resource Name) and save it for use when configuring the access policy for the bucket.
Create an S3 bucket
In the S3 bucket section, select Create bucket
Enter a bucket name, leave other options default
note
The bucket name must be unique across all of AWS, not just within your organization; otherwise bucket creation fails with an error.
Leave Block all Public Access enabled; public access is not needed.
Select Create Bucket at the bottom of the page
Once the bucket has been created, find the new S3 bucket in the S3 buckets list and select the link
Copy the ARN and save it for use when configuring the access policy for the bucket.
Select Create folder
Enter a folder name which will be the target for the ClickHouse S3 disk and select Create folder
The folder should now be visible on the bucket list
Select the checkbox for the new folder and click on Copy URL. Save the copied URL for use in the ClickHouse storage configuration in the next section.
Select the Permissions tab and click on the Edit button in the Bucket Policy section
Add a bucket policy, example below:
{
    "Version": "2012-10-17",
    "Id": "Policy123456",
    "Statement": [
        {
            "Sid": "abc123",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::921234567898:user/mars-s3-user"
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::mars-doc-test",
                "arn:aws:s3:::mars-doc-test/*"
            ]
        }
    ]
}
| Parameter | Description | Example Value |
|---|---|---|
| Version | Version of the policy language; leave as-is | 2012-10-17 |
| Sid | User-defined statement identifier | abc123 |
| Effect | Whether matching requests are allowed or denied | Allow |
| Principal | The account or user that the statement applies to | arn:aws:iam::921234567898:user/mars-s3-user |
| Action | Which operations are allowed on the bucket | s3:* |
| Resource | Which resources the operations apply to: the bucket itself and the objects in it | "arn:aws:s3:::mars-doc-test", "arn:aws:s3:::mars-doc-test/*" |
note
Work with your security team to determine the permissions to use; treat these as a starting point. For more information on policies and settings, refer to the AWS documentation: https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-policy-language-overview.html
- Save the policy configuration.
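The example policy above can also be generated rather than edited by hand, which helps avoid typos in the two ARNs. The following is a minimal sketch using only the Python standard library; `build_bucket_policy` is a hypothetical helper introduced here for illustration, and the default values simply mirror the example in this article.

```python
import json


def build_bucket_policy(user_arn, bucket, actions="s3:*", sid="abc123"):
    """Return a bucket policy dict shaped like the example in this article.

    Hypothetical helper: the name, parameters, and defaults are assumptions
    made for illustration and are not part of any AWS or ClickHouse API.
    """
    return {
        "Version": "2012-10-17",
        "Id": "Policy123456",
        "Statement": [
            {
                "Sid": sid,
                "Effect": "Allow",
                "Principal": {"AWS": user_arn},
                "Action": actions,
                # Both the bucket itself and every object in it are listed.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }


policy = build_bucket_policy(
    "arn:aws:iam::921234567898:user/mars-s3-user",
    "mars-doc-test",
)
# Paste the printed JSON into the Bucket Policy editor.
print(json.dumps(policy, indent=4))
```

Passing a list such as `["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"]` as `actions` would narrow the grant. Whether that set is sufficient for your ClickHouse workload is an assumption to verify with your security team.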
Configure ClickHouse to use the S3 bucket as a disk
The following example is based on a Linux Deb package installed as a service with default ClickHouse directories.
- Create a new file in the ClickHouse config.d directory to store the storage configuration.
vim /etc/clickhouse-server/config.d/storage_config.xml
- Add the following storage configuration, substituting the bucket path, access key, and secret key from the earlier steps
<clickhouse>
    <storage_configuration>
        <disks>
            <s3_disk>
                <type>s3</type>
                <endpoint>https://mars-doc-test.s3.amazonaws.com/clickhouse3/</endpoint>
                <access_key_id>ABC123</access_key_id>
                <secret_access_key>Abc+123</secret_access_key>
                <metadata_path>/var/lib/clickhouse/disks/s3_disk/</metadata_path>
                <cache_enabled>true</cache_enabled>
                <data_cache_enabled>true</data_cache_enabled>
                <cache_path>/var/lib/clickhouse/disks/s3_disk/cache/</cache_path>
            </s3_disk>
        </disks>
        <policies>
            <s3_main>
                <volumes>
                    <main>
                        <disk>s3_disk</disk>
                    </main>
                </volumes>
            </s3_main>
        </policies>
    </storage_configuration>
</clickhouse>
note
The tag <s3_disk> within the <disks> tag is an arbitrary label. It can be set to something else, but the same label must be used in the <disk> tag under the <policies> tag to reference the disk.
The <metadata_path> and <cache_path> values should also include that label in the path, so that the locations are easy to identify on disk.
The <s3_main> tag is also arbitrary; it is the name of the storage policy, which identifies the storage target when creating resources in ClickHouse.
For more information about using S3: Integrations Guide: S3 Backed MergeTree
- Update the owner of the file to the clickhouse user and group
chown clickhouse:clickhouse /etc/clickhouse-server/config.d/storage_config.xml
- Restart the ClickHouse instance to have the changes take effect.
service clickhouse-server restart
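The label-matching rule from the note above (every <disk> under <policies> must name a disk defined under <disks>) can be checked mechanically, for example before a restart or when troubleshooting. The following is a minimal sketch using the Python standard library; `undefined_disks` is a hypothetical helper, and the embedded configuration is a trimmed copy of the example in this article rather than a complete one.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the storage configuration from this article (assumption:
# in practice you would read /etc/clickhouse-server/config.d/storage_config.xml).
CONFIG = """
<clickhouse>
    <storage_configuration>
        <disks>
            <s3_disk>
                <type>s3</type>
                <endpoint>https://mars-doc-test.s3.amazonaws.com/clickhouse3/</endpoint>
            </s3_disk>
        </disks>
        <policies>
            <s3_main>
                <volumes>
                    <main>
                        <disk>s3_disk</disk>
                    </main>
                </volumes>
            </s3_main>
        </policies>
    </storage_configuration>
</clickhouse>
"""


def undefined_disks(xml_text):
    """Return disk labels referenced by policies but not defined under <disks>."""
    storage = ET.fromstring(xml_text).find("storage_configuration")
    # Each child of <disks> is a disk; its tag name is the disk label.
    defined = {disk.tag for disk in storage.find("disks")}
    # policy -> volumes -> volume -> disk
    referenced = {
        ref.text.strip()
        for ref in storage.findall("policies/*/volumes/*/disk")
    }
    return sorted(referenced - defined)


missing = undefined_disks(CONFIG)
print("undefined disks:", missing)
```

An empty list means every policy points at a defined disk. The check only covers label consistency; it does not validate credentials or the endpoint.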
Testing
- Log in with the ClickHouse client, for example:
clickhouse-client --user default --password ClickHouse123!
- Create a table specifying the new S3 storage policy
chnode4 :) CREATE TABLE s3_table1
(
`id` UInt64,
`column1` String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS storage_policy = 's3_main';
CREATE TABLE s3_table1
(
`id` UInt64,
`column1` String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS storage_policy = 's3_main'
Query id: fefd97b5-cce5-4fe3-a1d6-8cdda5616451
Ok.
0 rows in set. Elapsed: 0.254 sec.
- Show that the table was created with the correct policy
chnode4 :) SHOW CREATE TABLE s3_table1;
SHOW CREATE TABLE s3_table1
Query id: e7a00995-351c-41cb-a3aa-272a5849b134
┌─statement──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE default.s3_table1
(
`id` UInt64,
`column1` String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS storage_policy = 's3_main', index_granularity = 8192 │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
1 row in set. Elapsed: 0.004 sec.
- Insert test rows into the table
chnode4 :) INSERT INTO s3_table1
(id, column1)
VALUES
(1, 'abc'),
(2, 'xyz');
INSERT INTO s3_table1 (id, column1) FORMAT Values
Query id: 0265dd92-3890-4d56-9d12-71d4038b85d5
Ok.
2 rows in set. Elapsed: 0.337 sec.
- View the rows
chnode4 :) SELECT * FROM s3_table1;
SELECT *
FROM s3_table1
Query id: 967a8f0c-3b67-4154-830f-33bd6ad386ce
┌─id─┬─column1─┐
│ 1 │ abc │
│ 2 │ xyz │
└────┴─────────┘
2 rows in set. Elapsed: 0.284 sec.
In the AWS console, navigate to the buckets, then select the new bucket and the folder. The folder should now contain the objects that ClickHouse wrote for the table.
Summary
This article provided simple step-by-step instructions for configuring an AWS S3 bucket for access and using it as a disk for ClickHouse.