Skip to main content

Getting Started with ClickHouse Cloud

With ClickHouse Cloud, we are building a turnkey hosted ClickHouse experience! This guide will help you get started with creating a new ClickHouse Service, creating a table and inserting data. We have included some important details and helpful tips that you need to be aware of as you start using the ClickHouse Cloud.

While you are waiting for your new ClickHouse service to start, you can click on Members in the left navigation menu and invite others using the Invite Members button. Admin users can access your services and create new ones; users with Developer privileges can access existing services only.

invite users

…so hang on to it! Once your service is provisioned, the password will appear in the dialog window. After you refresh the page in your browser, the password will no longer appear and you will have to reset it if you have not saved it:

save password

  • First, copy your password,
  • then click on the "Connect to SQL console" button and the UI will open in a pop-up window,
  • paste your password into the "password" text field:

play ui

The two commands you can use are provided for you - just copy-and-paste them from the Cloud console:

getting commands

The curl command for the HTTP interface looks like:

curl --user 'default:[password]' \
--data-binary 'SELECT 1' \
https://HOSTNAME.us-west-2.aws.clickhouse.cloud:8443

And the clickhouse-client command for the native interface looks like:

clickhouse-client \
--host HOSTNAME.us-west-2.aws.clickhouse.cloud \
--secure --user default --port 9440 --password [password]

Your ClickHouse Cloud service consists of the default and system databases. All tables and views are created in the default database, and any attempts to create a new database will generate an error. (If you need a new database, simply create a new service.)

Here is a list of the supported table engines. (These table engines support replication, which is a requirement in a cluster environment):

  • ReplicatedMergeTree
  • ReplicatedSummingMergeTree
  • ReplicatedAggregatingMergeTree
  • ReplicatedReplacingMergeTree
  • S3
  • URL
  • View
  • MaterializedView
  • GenerateRandom
  • Join
  • Null

Data can either be inserted using the native interface (e.g. the clickhouse-client) or HTTP interface (e.g. curl or a custom client application). See the Common Tasks section below for an example of both techniques.

Each ClickHouse service runs in a single region across 3 availability zones for high availability. ClickHouse Cloud is currently available in the following AWS regions:

  • Ohio (us-east-2)
  • Oregon (us-west-2)
  • Ireland (eu-west-1)

AWS S3 is supported in ClickHouse Cloud, so you can use the S3 functions and table engine. External functions and engines like ODBC, JDBC, PostgreSQL, MySQL, SQLite, MongoDB, Redis, Kafka, RabbitMQ, HDFS and Hive will work as long as ClickHouse has external access to them from over the Internet.

Some functionality is disabled for security purposes, including:

  • Reading from sources local to clickhouse server, e.g. File table engine
  • SQL functions relying on local filesystem access, e.g. file
  • User-defined functions that rely on executables and external scripts
  • SQL functions for introspection are restricted for security reasons, e.g. addressToLine, tcpPort
  • Most system settings are not configurable at this time and/or do not have any effect

The backups functionality is undergoing additional testing and not enabled at this time

As part of creating the ClickHouse service, we create a default database, and a user named default that has broad permissions to this database. This initial user can create additional users and assign their permissions to this database. Beyond this, the ability to enable the following security features within the database using Kerberos, LDAP, or SSL X.509 certificate authentication are not supported at this time.

Each ClickHouse service is collecting metrics, and you can view those metrics by clicking on the three dots next to your service's name and selecting Details from the menu.

  • We have provided some basic metrics out of the box, but we would love to hear from you if you have ideas and suggetions for metrics to collect. Click the New metric button (below all of the metrics) to share your ideas with us.
  • The ability to query system tables is available, but note that not all metrics exposed in this way are relevant, and we may remove or restrict access to these details in the future.
  • Direct access to ClickHouse server logs is not provided.
  • Audit log of Cloud actions is available on the backend, but not yet exposed in the Cloud console.

CREATE AS SELECT queries are not supported now. Use CREATE VIEW AS + CREATE TABLE AS view + INSERT SELECT. Also, CREATE DATABASE IF NOT EXISITS should not be used to check database availability for now. Use EXISTS instead.


1. Download and install the ClickHouse binary for Linux:
curl https://clickhouse.com/ | sh ./clickhouse install
clickhouse-client --host \
HOSTNAME.aws.us-west-2.clickhouse.cloud \
--secure --user default --port 9440 \
--password [password]
1. Download the clickhouse binary for Mac
curl https://clickhouse.com/ | sh
./clickhouse client \
--host HOSTNAME.aws.us-west-2.clickhouse.cloud \
--secure --user default --password [password]
clickhouse-client \
--host HOSTNAME.aws.us-west-2.clickhouse.cloud \
--secure --user default --port 9440 \
--password [password]
1. Use the clickhouse/clickhouse-server image in Docker Hub to run the clickhouse-client in a Docker container:
docker run -it clickhouse/clickhouse-server \
clickhouse-client --host HOSTNAME.aws.us-west-2.clickhouse.cloud \
--secure --user default --port 9440 --password [password]

The following steps demonstrate how to insert a local text file into a table using clickhouse-client.

1. First create the table, either from the clickhouse-client prompt or using the Play UI. The dataset represents New York City taxi rides:
CREATE TABLE trips (
`trip_id` UInt32,
`vendor_id` Enum8('1' = 1, '2' = 2, '3' = 3, '4' = 4, 'CMT' = 5, 'VTS' = 6, 'DDS' = 7, 'B02512' = 10, 'B02598' = 11, 'B02617' = 12, 'B02682' = 13, 'B02764' = 14, '' = 15),
`pickup_date` Date,
`pickup_datetime` DateTime,
`dropoff_date` Date,
`dropoff_datetime` DateTime,
`store_and_fwd_flag` UInt8,
`rate_code_id` UInt8,
`pickup_longitude` Float64,
`pickup_latitude` Float64,
`dropoff_longitude` Float64,
`dropoff_latitude` Float64,
`passenger_count` UInt8,
`trip_distance` Float64,
`fare_amount` Float32,
`extra` Float32,
`mta_tax` Float32,
`tip_amount` Float32,
`tolls_amount` Float32,
`ehail_fee` Float32,
`improvement_surcharge` Float32,
`total_amount` Float32,
`payment_type` Enum8('UNK' = 0, 'CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4),
`trip_type` UInt8,
`pickup` FixedString(25),
`dropoff` FixedString(25),
`cab_type` Enum8('yellow' = 1, 'green' = 2, 'uber' = 3),
`pickup_nyct2010_gid` Int8,
`pickup_ctlabel` Float32,
`pickup_borocode` Int8,
`pickup_ct2010` String,
`pickup_boroct2010` String,
`pickup_cdeligibil` String,
`pickup_ntacode` FixedString(4),
`pickup_ntaname` String,
`pickup_puma` UInt16,
`dropoff_nyct2010_gid` UInt8,
`dropoff_ctlabel` Float32,
`dropoff_borocode` UInt8,
`dropoff_ct2010` String,
`dropoff_boroct2010` String,
`dropoff_cdeligibil` String,
`dropoff_ntacode` FixedString(4),
`dropoff_ntaname` String,
`dropoff_puma` UInt16 )
ENGINE = ReplicatedMergeTree
PARTITION BY toYYYYMM(pickup_date)
ORDER BY pickup_datetime;
2. If you are at the clickhouse-client prompt, exit it:
clickhouse-cloud :) exit
3. Download the file, which consists of about 1M rows of taxi rides:
curl -O https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_2.gz
4. Run the following command to unzip the file, stream it to the clickhouse-client and insert it into the trips table. Be sure to modify the clickhouse-client portion of the command to match your OS (see the Linux, Mac, Windows and Docker options above):
gzip -d -c trips_2.gz | ./clickhouse-client \
--host HOSTNAME.aws.us-west-2.clickhouse.cloud \
--secure --user default \
--password [password] --database=default \
--query="INSERT INTO trips FORMAT TabSeparatedWithNames"
5. When the insertion is complete, run the following query from either the clickhouse-client or the Play UI to verify you have about a 1M rows in trips (999,825 to be precise):
SELECT count() FROM trips
6. Run a query to see how fast 1M rows can be processed by ClickHouse:
SELECT passenger_count, avg(total_amount) FROM trips GROUP BY passenger_count;
note

More details on the clickhouse-client can be found in the docs.

This example uses the same NYC taxi dataset as the previous example, except the data is inserted using the HTTP interface:

1. Download the file first:
curl -O https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_2.gz
2. Create the trips table defined in the previous section.
3. Unzip the .gz file and pipe the output to curl:
gzip -d -c trips_2.gz | curl -u default:[password] 'https://host:8443/?query=INSERT%20INTO%20trips%20FORMAT%20TabSeparatedWithNames' --data-binary @-
NOTE

The query in the URL is INSERT%20INTO%20trips%20FORMAT%20TabSeparatedWithNames, which is a URL-encoded version of INSERT INTO trips FORMAT TabSeparatedWithNames.

Further details on the http interface can be found here.

In this step, you will see how to populate a table with data from files stored in S3. The example uses the New York Taxi dataset, which is publicly available on S3.

1. Start by creating the table. Notice the table engine is ReplicatedMergeTree, which means it will automatically be replicated across the nodes in your cluster.
CREATE TABLE trips(
`trip_id` UInt32,
`vendor_id` Enum8('1' = 1, '2' = 2, '3' = 3, '4' = 4, 'CMT' = 5, 'VTS' = 6, 'DDS' = 7, 'B02512' = 10, 'B02598' = 11, 'B02617' = 12, 'B02682' = 13, 'B02764' = 14, '' = 15),
`pickup_date` Date,
`pickup_datetime` DateTime,
`dropoff_date` Date,
`dropoff_datetime` DateTime,
`store_and_fwd_flag` UInt8,
`rate_code_id` UInt8,
`pickup_longitude` Float64,
`pickup_latitude` Float64,
`dropoff_longitude` Float64,
`dropoff_latitude` Float64,
`passenger_count` UInt8,
`trip_distance` Float64,
`fare_amount` Float32,
`extra` Float32,
`mta_tax` Float32,
`tip_amount` Float32,
`tolls_amount` Float32,
`ehail_fee` Float32,
`improvement_surcharge` Float32,
`total_amount` Float32,
`payment_type` Enum8('UNK' = 0, 'CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4),
`trip_type` UInt8,
`pickup` FixedString(25),
`dropoff` FixedString(25),
`cab_type` Enum8('yellow' = 1, 'green' = 2, 'uber' = 3),
`pickup_nyct2010_gid` Int8,
`pickup_ctlabel` Float32,
`pickup_borocode` Int8,
`pickup_ct2010` String,
`pickup_boroct2010` String,
`pickup_cdeligibil` String,
`pickup_ntacode` FixedString(4),
`pickup_ntaname` String,
`pickup_puma` UInt16,
`dropoff_nyct2010_gid` UInt8,
`dropoff_ctlabel` Float32,
`dropoff_borocode` UInt8,
`dropoff_ct2010` String,
`dropoff_boroct2010` String,
`dropoff_cdeligibil` String,
`dropoff_ntacode` FixedString(4),
`dropoff_ntaname` String,
`dropoff_puma` UInt16
)
ENGINE = ReplicatedMergeTree
PARTITION BY toYYYYMM(pickup_date)
ORDER BY pickup_datetime;
2. Utilizing the s3 function, insert approximately 1M rows into the table. This will take around 10s to execute:
INSERT INTO trips SELECT * FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_0.gz', 'TabSeparatedWithNames');
3. Confirm the data is available with a simple query to count the average fare per passenger count:
SELECT passenger_count, avg(total_amount) FROM trips GROUP BY passenger_count;

The response should look like:

+---------------+------------------+
|passenger_count|avg(total_amount) |
+---------------+------------------+
|0 |29.16213358481725 |
|1 |15.949619633974189|
|2 |17.156779017363604|
|3 |19.446272162315193|
|4 |17.101371068907223|
|5 |16.366309569286106|
|6 |15.92480646257438 |
|7 |75.80000305175781 |
|8 |10.700000166893005|
|9 |11.800000190734863|
+---------------+------------------+

In the following example we create a table that uses an S3 bucket for its table storage. The table references 9 files of the NYC taxi dataset - so 9m rows in total.

1. Start by creating the table that uses the S3 table engine:
CREATE TABLE trips_raw(
`trip_id` UInt32,
`vendor_id` Enum8('1' = 1, '2' = 2, '3' = 3, '4' = 4, 'CMT' = 5, 'VTS' = 6, 'DDS' = 7, 'B02512' = 10, 'B02598' = 11, 'B02617' = 12, 'B02682' = 13, 'B02764' = 14, '' = 15),
`pickup_date` Date,
`pickup_datetime` DateTime,
`dropoff_date` Date,
`dropoff_datetime` DateTime,
`store_and_fwd_flag` UInt8,
`rate_code_id` UInt8,
`pickup_longitude` Float64,
`pickup_latitude` Float64,
`dropoff_longitude` Float64,
`dropoff_latitude` Float64,
`passenger_count` UInt8,
`trip_distance` Float64,
`fare_amount` Float32,
`extra` Float32,
`mta_tax` Float32,
`tip_amount` Float32,
`tolls_amount` Float32,
`ehail_fee` Float32,
`improvement_surcharge` Float32,
`total_amount` Float32,
`payment_type` Enum8('UNK' = 0, 'CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4),
`trip_type` UInt8,
`pickup` FixedString(25),
`dropoff` FixedString(25),
`cab_type` Enum8('yellow' = 1, 'green' = 2, 'uber' = 3),
`pickup_nyct2010_gid` Int8,
`pickup_ctlabel` Float32,
`pickup_borocode` Int8,
`pickup_ct2010` String,
`pickup_boroct2010` String,
`pickup_cdeligibil` String,
`pickup_ntacode` FixedString(4),
`pickup_ntaname` String,
`pickup_puma` UInt16,
`dropoff_nyct2010_gid` UInt8,
`dropoff_ctlabel` Float32,
`dropoff_borocode` UInt8,
`dropoff_ct2010` String,
`dropoff_boroct2010` String,
`dropoff_cdeligibil` String,
`dropoff_ntacode` FixedString(4),
`dropoff_ntaname` String,
`dropoff_puma` UInt16
)
ENGINE = S3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_{1..9}.gz', 'TabSeparatedWithNames', 'gzip');
2. Run the following query to compute the average fare by passenger count. This should take around 15s to execute on most getting started clusters. View the S3 table function page in the docs for further details on the S3 function and table engine.
SELECT passenger_count, avg(total_amount) FROM trips_raw GROUP BY passenger_count;

The response should look like:

+---------------+------------------+
|passenger_count|avg(total_amount) |
+---------------+------------------+
|0 |26.504418398169953|
|1 |16.0056070582809 |
|2 |17.115858410446098|
|3 |17.06865096517502 |
|4 |17.236546043445877|
|5 |16.363680363493543|
|6 |16.01205029060048 |
|7 |64.60190670014848 |
|8 |42.02687507122755 |
|9 |95.6616637772984 |
+---------------+------------------+

The ClickHouse docs has a section on connecting a UI to ClickHouse.

For example, if you want to connect Grafana to ClickHouse, you should be able to follow the steps in the documentation. The Grafana settings for connecting to a ClickHouse Cloud service will look like the following:

  • Name: a Grafana setting - give your data source any name you like
  • Server address: the URL of your ClickHouse Cloud service. This should not include the protocol prefix
  • Server port: 9440
  • Username: default
  • Password: Password provided when you create the ClickHouse Cloud service
  • Default database: leave empty
  • Secure Connection: Enable

In the following examples we create 3 users, each with their own permissions. We assume users have created and inserted data into the trips table using the earlier examples.

  • A user power_user with permissive permissions on the trips table
  • A user read_only with read only permissions on the trips table.
  • A user column_restricted_user with limited read access to columns in the trips table
  • A user row_restricted_user with limited read access to rows in the trips table
note

For all examples, ensure the <password> is modified. Currently no password policies are enforced, leaving the administrator to ensure these follow sensible rules.

1. Create a user, specifying the password. By default this user will have no permissions.
CREATE USER power_user IDENTIFIED WITH plaintext_password BY 'password';
2. Grant the user the following permissions to the trips table. Selectively exclude commands to limit specific permissions for the user.
GRANT SELECT ON default.trips TO power_user WITH GRANT OPTION;
GRANT INSERT ON default.trips TO power_user WITH GRANT OPTION;
GRANT ALTER ON default.trips TO power_user WITH GRANT OPTION;
GRANT TRUNCATE ON default.trips TO power_user WITH GRANT OPTION;
GRANT OPTIMIZE ON default.trips TO power_user WITH GRANT OPTION;
note

The use of WITH GRANT OPTION allows the power_user to GRANT equivalent or lower permissions to other users. Edit as appropriate. In order for the user to grant permissions, they will also need a grantee rights specified

ALTER USER power_user GRANTEES ANY;

ANY here allows them to grant permissions to users and roles. Other options are documented here.

Replicating the clickhouse user

To replicate permissions of the default clickhouse user, issue the following command and execute the statements provided in the reply - replacing the username clickhouse for your target user.

SHOW GRANTS;
1. Creating a read-only user is as simple as a executing a subset of the above commands. The following user has SELECT access only on the trips table.
CREATE USER read_only IDENTIFIED WITH plaintext_password BY 'password';
GRANT SELECT ON default.trips TO read_only;
2. Confirm the user can perform SELECTs on the trips table. The following query can either be executed via the Play UI and modifying the user credentials; or via the clickhouse-client:
clickhouse-client --host <host> --secure --port 9440 \
--user read_only --password <password> \
--query "SELECT count() FROM trips"
1. Create both a user and role.
CREATE USER column_restricted_user IDENTIFIED WITH plaintext_password BY 'password';
CREATE ROLE column_users;
2. Grant permissions to column_users role to see only the columns trip_id, passenger_count and total_amount columns.
GRANT SELECT(trip_id, passenger_count, total_amount) ON default.trips TO column_users;
3. Add the column_restricted_user to the column_users role.
GRANT column_users TO column_restricted_user;
4. Confirm the user can execute queries on the columns to which they have access:
clickhouse-client --host <host> --secure --port 9440 \
--user column_restricted_user --password <password> \
--query "SELECT trip_id, passenger_count, total_amount FROM trips LIMIT 10;"
5. Confirm the following query is not permitted.
clickhouse-client --host <host> --secure --port 9440 \
--user column_restricted_user --password <password> \
--query "SELECT trip_id, passenger_count, total_amount, trip_distance FROM trips LIMIT 10;"
note

Row policies make sense only for users with read-only access. If a user can modify a table or copy partitions between tables, it defeats the restrictions of row policies.

1. Create a both a user and role:
CREATE USER row_restricted_user IDENTIFIED WITH plaintext_password BY 'password';
CREATE ROLE row_users;
GRANT SELECT ON default.trips TO row_restricted_user;
2. Create a policy row_filter to allow only those rows with a passenger_count = 1, assigning this to the row created above:
CREATE ROW POLICY row_filter ON default.trips FOR SELECT USING passenger_count = 1 TO row_users;
caution

Assigning of the row policy will, at this point, forbid all users from seeing rows in the trips table. You must explicitly permit other users to see these rows.

3. Create a row policy to permit all users, except those belonging to the row_users group, the ability to see rows in the trips table:
CREATE ROW POLICY allow_other_users_filter ON default.trips FOR SELECT USING 1 TO ALL EXCEPT row_users;
4. Confirm for the first query a count greater than 0. The second query should result in no hits:
clickhouse-client --host <host> --secure --port 9440 \
--user row_restricted_user --password <password> \
--query "SELECT count() FROM trips WHERE passenger_count = 1"
clickhouse-client --host <host> --secure --port 9440 \
--user row_restricted_user --password <password> \
--query "SELECT count() FROM trips WHERE passenger_count > 1"

Here are some examples on how to delete privileges, policies, unassign users from roles, and delete users and roles:

1. To remove a privilege from a role:
REVOKE SELECT(trip_id, passenger_count, total_amount) ON default.trips FROM column_users;
2. To delete a policy:
DROP ROW POLICY row_filter ON default.trips;
3. To unassign a user from a role:
REVOKE row_users FROM row_restricted_user;
4. To delete a role:
DROP ROLE A_rows_users;
5. To delete a user:
DROP USER row_user;

Here are some helpful commands for troubleshooting users and roles:

1. To show all users:
SHOW USERS;
2. Listing the grants and roles for a user:
SHOW GRANTS FOR power_user;
3. List roles in ClickHouse:
SHOW ROLES;
4. Display the policies:
SHOW ROW POLICIES
5. View how a policy was defined and current privileges:
SHOW CREATE ROW POLICY row_filter ON default.trips;