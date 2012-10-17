

Use an S3 bucket as a ClickHouse disk

This article demonstrates the basics of how to configure an AWS IAM user, create an S3 bucket and configure ClickHouse to use the bucket as an S3 disk. You should work with your security team to determine the permissions to be used, and consider these as a starting point.

In this procedure, we'll be creating a service account user, not a login user.

Log in to the AWS IAM Management Console. In Users, select Add users.

Enter the user name, set the credential type to Access key - Programmatic access, and select Next: Permissions.

Do not add the user to any group; select Next: Tags.

Unless you need to add tags, select Next: Review.

Select Create User.

note The warning message stating that the user has no permissions can be ignored; permissions will be granted to the user on the bucket in the next section.

The user is now created. Click Show and copy the access key ID and the secret access key.

note Save the keys somewhere else; this is the only time the secret access key will be available.

Click Close, then find the new user in the Users screen. Copy the ARN (Amazon Resource Name) and save it for use when configuring the access policy for the bucket.

In the S3 section, select Create bucket.

Enter a bucket name and leave the other options at their defaults.

note The bucket name must be unique across all of AWS, not just your organization; otherwise bucket creation fails with an error.

Leave Block all public access enabled; public access is not needed.

Select Create bucket at the bottom of the page.

Select the link, copy the bucket ARN, and save it for use when configuring the access policy for the bucket.

Once the bucket has been created, find the new S3 bucket in the S3 buckets list and select its link.

Select Create folder, enter a folder name that will be the target for the ClickHouse S3 disk, and select Create folder. The folder should now be visible in the bucket list.

Select the checkbox for the new folder and click Copy URL. Save the copied URL for use in the ClickHouse storage configuration in the next section.

Select the Permissions tab and click the Edit button in the Bucket policy section.

Add a bucket policy, for example:

{
  "Version": "2012-10-17",
  "Id": "Policy123456",
  "Statement": [
    {
      "Sid": "abc123",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::921234567898:user/mars-s3-user"
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::mars-doc-test",
        "arn:aws:s3:::mars-doc-test/*"
      ]
    }
  ]
}



| Parameter | Description | Example value |
|---|---|---|
| Version | Version of the policy language; leave as-is | 2012-10-17 |
| Sid | User-defined statement ID | abc123 |
| Effect | Whether matching requests are allowed or denied | Allow |
| Principal | The account or user the statement applies to | arn:aws:iam::921234567898:user/mars-s3-user |
| Action | The operations allowed on the bucket | s3:* |
| Resource | The resources the operations are allowed on: the bucket itself and every object in it | "arn:aws:s3:::mars-doc-test", "arn:aws:s3:::mars-doc-test/*" |
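When the same policy is needed for several buckets or users, generating it avoids copy-and-paste mistakes in the ARNs. The following is a minimal Python sketch, not part of the original procedure: `build_bucket_policy` is a hypothetical helper that reproduces the single Allow statement above, using the example `Id` and `Sid` values as defaults and the same broad `s3:*` action. It also makes the two Resource entries explicit: the bare bucket ARN covers bucket-level operations such as listing, and the `/*` form covers the objects inside the bucket.

```python
import json


def build_bucket_policy(bucket: str, principal_arn: str,
                        policy_id: str = "Policy123456",
                        sid: str = "abc123") -> dict:
    """Build a single-statement S3 bucket policy like the example above.

    Hypothetical helper: it grants s3:* exactly as the example does, which
    should be narrowed with your security team.
    """
    # Basic sanity check on the IAM user ARN copied from the IAM console,
    # expected shape: arn:<partition>:iam::<account-id>:user/<user-name>
    parts = principal_arn.split(":")
    if (len(parts) != 6 or parts[0] != "arn" or parts[2] != "iam"
            or not parts[5].startswith("user/")):
        raise ValueError(f"not an IAM user ARN: {principal_arn!r}")

    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Id": policy_id,
        "Statement": [
            {
                "Sid": sid,
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:*",
                # Bucket-level ARN plus the ARN matching every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    policy = build_bucket_policy(
        "mars-doc-test", "arn:aws:iam::921234567898:user/mars-s3-user"
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the Bucket policy editor in the same way as the hand-written example.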

note You should work with your security team to determine the permissions to be used, and treat these as a starting point. For more information on policies and settings, refer to the AWS documentation: https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-policy-language-overview.html

Save the policy configuration.

The following example is based on a Linux Deb package installed as a service with default ClickHouse directories.

Create a new file in the ClickHouse config.d directory to store the storage configuration.

vim /etc/clickhouse-server/config.d/storage_config.xml



Add the following storage configuration, substituting the bucket URL, access key ID and secret access key from the earlier steps:

<clickhouse>
  <storage_configuration>
    <disks>
      <s3_disk>
        <type>s3</type>
        <endpoint>https://mars-doc-test.s3.amazonaws.com/clickhouse3/</endpoint>
        <access_key_id>ABC123</access_key_id>
        <secret_access_key>Abc+123</secret_access_key>
        <metadata_path>/var/lib/clickhouse/disks/s3_disk/</metadata_path>
        <cache_enabled>true</cache_enabled>
        <data_cache_enabled>true</data_cache_enabled>
        <cache_path>/var/lib/clickhouse/disks/s3_disk/cache/</cache_path>
      </s3_disk>
    </disks>
    <policies>
      <s3_main>
        <volumes>
          <main>
            <disk>s3_disk</disk>
          </main>
        </volumes>
      </s3_main>
    </policies>
  </storage_configuration>
</clickhouse>
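If you prefer to generate the file rather than edit it by hand, the following minimal Python sketch builds the same structure with the standard library's `xml.etree.ElementTree`. The function `build_storage_config` and its parameters are hypothetical; the defaults reuse the `s3_disk` and `s3_main` labels and the metadata and cache paths from the example, and `ET.indent` assumes Python 3.9 or later. ElementTree also escapes any characters that are special in XML, should a key contain them.

```python
import xml.etree.ElementTree as ET


def build_storage_config(endpoint: str, access_key_id: str,
                         secret_access_key: str,
                         disk: str = "s3_disk",
                         policy: str = "s3_main") -> str:
    """Return storage_config.xml text with the same layout as the example."""
    base = f"/var/lib/clickhouse/disks/{disk}/"  # path includes the disk label
    root = ET.Element("clickhouse")
    storage = ET.SubElement(root, "storage_configuration")

    # <disks><s3_disk>...</s3_disk></disks>
    disk_el = ET.SubElement(ET.SubElement(storage, "disks"), disk)
    for tag, value in [
        ("type", "s3"),
        ("endpoint", endpoint),
        ("access_key_id", access_key_id),
        ("secret_access_key", secret_access_key),
        ("metadata_path", base),
        ("cache_enabled", "true"),
        ("data_cache_enabled", "true"),
        ("cache_path", base + "cache/"),
    ]:
        ET.SubElement(disk_el, tag).text = value

    # <policies><s3_main><volumes><main><disk>s3_disk</disk></main>...
    policy_el = ET.SubElement(ET.SubElement(storage, "policies"), policy)
    volumes = ET.SubElement(policy_el, "volumes")
    ET.SubElement(ET.SubElement(volumes, "main"), "disk").text = disk

    ET.indent(root)  # pretty-print in place (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_storage_config(
        "https://mars-doc-test.s3.amazonaws.com/clickhouse3/",
        "ABC123", "Abc+123"))
```

The returned text can be written to /etc/clickhouse-server/config.d/storage_config.xml before continuing with the ownership and restart steps.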



note The <s3_disk> tag within the <disks> tag is an arbitrary label. It can be set to something else, but the same label must be used in the <disk> tag under the <policies> tag to reference the disk. It is recommended that <metadata_path> and <cache_path> also include the label in their paths, so that the locations can be identified on disk. The <s3_main> tag is also arbitrary; it is the name of the storage policy, which is referenced as the storage target when creating resources in ClickHouse. For more information about using S3, see Integrations Guide: S3 Backed MergeTree.
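The labelling rules in the note can be checked before restarting the server. This is a minimal Python sketch, assuming the whole storage configuration lives in one file shaped like the example above; `check_storage_labels` is a hypothetical helper, not a ClickHouse tool. It reports policy `<disk>` entries that do not name a disk defined under `<disks>` (a configuration error) and disk paths that do not contain the disk's label (only the recommended convention).

```python
import xml.etree.ElementTree as ET


def check_storage_labels(xml_text: str) -> list[str]:
    """Return a list of label problems found in a storage configuration."""
    root = ET.fromstring(xml_text)
    storage = root.find("storage_configuration")
    if storage is None:
        return ["no <storage_configuration> element"]

    disks = storage.find("disks")
    # Each child of <disks> is a disk whose tag name is its label.
    defined = {d.tag for d in disks} if disks is not None else set()
    problems = []

    # Labels referenced from <policies>/<policy>/<volumes>/<volume>/<disk>.
    for ref in storage.iterfind("policies/*/volumes/*/disk"):
        name = (ref.text or "").strip()
        if name not in defined:
            problems.append(f"policy references undefined disk {name!r}")

    # Recommended, not required: metadata and cache paths contain the label.
    for d in (disks if disks is not None else []):
        for tag in ("metadata_path", "cache_path"):
            path = d.findtext(tag)
            if path and d.tag not in path:
                problems.append(f"{d.tag}: {tag} {path!r} lacks the disk label")
    return problems


if __name__ == "__main__":
    path = "/etc/clickhouse-server/config.d/storage_config.xml"
    with open(path, encoding="utf-8") as f:
        for problem in check_storage_labels(f.read()):
            print(problem)
```

An empty result means every policy points at a defined disk and the paths follow the naming recommendation.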

Update the owner of the file to the clickhouse user and group

chown clickhouse:clickhouse /etc/clickhouse-server/config.d/storage_config.xml



Restart the ClickHouse instance to have the changes take effect.

service clickhouse-server restart



Log in with the ClickHouse client, for example:

clickhouse-client --user default --password ClickHouse123!



Create a table specifying the new S3 storage policy

chnode4 :) CREATE TABLE s3_table1
           (
             `id` UInt64,
             `column1` String
           )
           ENGINE = MergeTree
           ORDER BY id
           SETTINGS storage_policy = 's3_main';

CREATE TABLE s3_table1
(
    `id` UInt64,
    `column1` String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS storage_policy = 's3_main'

Query id: fefd97b5-cce5-4fe3-a1d6-8cdda5616451

Ok.

0 rows in set. Elapsed: 0.254 sec.



Show that the table was created with the correct policy

chnode4 :) SHOW CREATE TABLE s3_table1;

SHOW CREATE TABLE s3_table1

Query id: e7a00995-351c-41cb-a3aa-272a5849b134

┌─statement──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE default.s3_table1
(
    `id` UInt64,
    `column1` String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS storage_policy = 's3_main', index_granularity = 8192 │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

1 row in set. Elapsed: 0.004 sec.
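When this check is scripted rather than done by eye, the policy name can be pulled out of the statement text. A minimal Python sketch, assuming the statement has already been captured as a string; `storage_policy_of` is a hypothetical helper that applies a regular expression to the SETTINGS clause, not a ClickHouse API.

```python
import re
from typing import Optional


def storage_policy_of(create_statement: str) -> Optional[str]:
    """Return the storage_policy named in a CREATE TABLE statement, or None."""
    # Matches e.g. SETTINGS storage_policy = 's3_main', index_granularity = 8192
    match = re.search(r"\bstorage_policy\s*=\s*'([^']*)'", create_statement)
    return match.group(1) if match else None


if __name__ == "__main__":
    statement = (
        "CREATE TABLE default.s3_table1 (`id` UInt64, `column1` String) "
        "ENGINE = MergeTree ORDER BY id "
        "SETTINGS storage_policy = 's3_main', index_granularity = 8192"
    )
    print(storage_policy_of(statement))  # s3_main
```

In a script, the statement could come from `clickhouse-client --query "SHOW CREATE TABLE s3_table1"`; the non-interactive output format may escape newlines as `\n`, which the pattern does not depend on.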



Insert test rows into the table

chnode4 :) INSERT INTO s3_table1
           (id, column1)
           VALUES
           (1, 'abc'),
           (2, 'xyz');

INSERT INTO s3_table1 (id, column1) FORMAT Values

Query id: 0265dd92-3890-4d56-9d12-71d4038b85d5

Ok.

2 rows in set. Elapsed: 0.337 sec.



View the rows

chnode4 :) SELECT * FROM s3_table1;

SELECT *
FROM s3_table1

Query id: 967a8f0c-3b67-4154-830f-33bd6ad386ce

┌─id─┬─column1─┐
│  1 │ abc     │
│  2 │ xyz     │
└────┴─────────┘

2 rows in set. Elapsed: 0.284 sec.



In the AWS console, navigate to the buckets list, then select the new bucket and the folder. You should see the objects that ClickHouse has written for the table data.

This article provided simple step-by-step instructions for configuring an AWS S3 bucket for access and use as a ClickHouse disk.