
Configuring ClickHouse Keeper

ClickHouse Keeper is the component of ClickHouse that handles replication and coordination across nodes and clusters. It removes the need for a separate ZooKeeper installation and is compatible with ZooKeeper for ClickHouse operations.

This guide provides a simple, minimal configuration of ClickHouse Keeper along with an example of how to test distributed operations. The example uses 3 nodes running Linux.

1. Configure Nodes with Keeper settings

  1. Install 3 ClickHouse instances on 3 hosts (chnode1, chnode2, chnode3). (View the Quick Start for details on installing ClickHouse.)

  2. On each node, add the following entry to allow external communication through the network interface.

    <listen_host>0.0.0.0</listen_host>
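
    This entry (and the other server settings in this guide) can go in the main config.xml or in a separate file under the config.d directory, which ClickHouse merges into the server configuration. A minimal sketch, assuming a file name of your choosing (network.xml here is only an illustration):

    <!-- /etc/clickhouse-server/config.d/network.xml (example file name) -->
    <clickhouse>
        <listen_host>0.0.0.0</listen_host>
    </clickhouse>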
  3. Add the following ClickHouse Keeper configuration to all three servers, updating the <server_id> setting for each server: for chnode1 it would be 1, for chnode2 it would be 2, and so on. (A chnode2 example follows the parameter table below.)

    <keeper_server>
        <tcp_port>9181</tcp_port>
        <server_id>1</server_id>
        <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
        <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>

        <coordination_settings>
            <operation_timeout_ms>10000</operation_timeout_ms>
            <session_timeout_ms>30000</session_timeout_ms>
            <raft_logs_level>warning</raft_logs_level>
        </coordination_settings>

        <raft_configuration>
            <server>
                <id>1</id>
                <hostname>chnode1.domain.com</hostname>
                <port>9444</port>
            </server>
            <server>
                <id>2</id>
                <hostname>chnode2.domain.com</hostname>
                <port>9444</port>
            </server>
            <server>
                <id>3</id>
                <hostname>chnode3.domain.com</hostname>
                <port>9444</port>
            </server>
        </raft_configuration>
    </keeper_server>

    These are the basic settings used above:

    | Parameter | Description | Example |
    |-----------|-------------|---------|
    | tcp_port | port to be used by clients of Keeper | 9181 (default; equivalent of ZooKeeper's 2181) |
    | server_id | identifier for each ClickHouse Keeper server, used in the Raft configuration | 1 |
    | coordination_settings | section for parameters such as timeouts | timeouts: 10000, log level: trace |
    | server | definition of each server participating | a list of server definitions |
    | raft_configuration | settings for each server in the Keeper cluster | server and settings for each |
    | id | numeric id of the server for Keeper services | 1 |
    | hostname | hostname, IP or FQDN of each server in the Keeper cluster | chnode1.domain.com |
    | port | port to listen on for interserver Keeper connections | 9444 |
    Note: View the ClickHouse Keeper docs page for details on all the available parameters.
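
    For example, on chnode2 only the <server_id> value changes; the <raft_configuration> section stays identical on all three servers:

    <server_id>2</server_id>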

  4. Enable the ZooKeeper component. It will use the ClickHouse Keeper engine:

    <zookeeper>
        <node>
            <host>chnode1.domain.com</host>
            <port>9181</port>
        </node>
        <node>
            <host>chnode2.domain.com</host>
            <port>9181</port>
        </node>
        <node>
            <host>chnode3.domain.com</host>
            <port>9181</port>
        </node>
    </zookeeper>

    These are the basic settings used above:

    | Parameter | Description | Example |
    |-----------|-------------|---------|
    | node | list of nodes for ClickHouse Keeper connections | a settings entry for each server |
    | host | hostname, IP or FQDN of each ClickHouse Keeper node | chnode1.domain.com |
    | port | ClickHouse Keeper client port | 9181 |
  5. Restart ClickHouse and verify that each Keeper instance is running. Execute the following command on each server. The ruok command returns imok if Keeper is running and healthy:

    # echo ruok | nc localhost 9181; echo
    imok
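
    Keeper also answers the other ZooKeeper-style four-letter-word commands, so a command such as mntr can be used to see more detail, for example whether the node is currently the Raft leader or a follower and how many znodes it holds (the exact fields reported vary by version):

    # echo mntr | nc localhost 9181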
  6. The system database has a table named zookeeper that contains the details of your ClickHouse Keeper instances. Let's view the table:

    SELECT *
    FROM system.zookeeper
    WHERE path IN ('/', '/clickhouse')

    The table looks like:

    β”Œβ”€name───────┬─value─┬─czxid─┬─mzxid─┬───────────────ctime─┬───────────────mtime─┬─version─┬─cversion─┬─aversion─┬─ephemeralOwner─┬─dataLength─┬─numChildren─┬─pzxid─┬─path────────┐
    β”‚ clickhouse β”‚       β”‚   124 β”‚   124 β”‚ 2022-03-07 00:49:34 β”‚ 2022-03-07 00:49:34 β”‚       0 β”‚        2 β”‚        0 β”‚              0 β”‚          0 β”‚           2 β”‚  5693 β”‚ /           β”‚
    β”‚ task_queue β”‚       β”‚   125 β”‚   125 β”‚ 2022-03-07 00:49:34 β”‚ 2022-03-07 00:49:34 β”‚       0 β”‚        1 β”‚        0 β”‚              0 β”‚          0 β”‚           1 β”‚   126 β”‚ /clickhouse β”‚
    β”‚ tables     β”‚       β”‚  5693 β”‚  5693 β”‚ 2022-03-07 00:49:34 β”‚ 2022-03-07 00:49:34 β”‚       0 β”‚        3 β”‚        0 β”‚              0 β”‚          0 β”‚           3 β”‚  6461 β”‚ /clickhouse β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
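
    You can also list the children of a specific path, for example to see what Keeper is tracking under /clickhouse:

    SELECT name, numChildren, mtime
    FROM system.zookeeper
    WHERE path = '/clickhouse'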

2. Configure a cluster in ClickHouse

  1. Let's configure a simple cluster with 2 shards and only one replica each, using 2 of the nodes. The third node is used to achieve the quorum that ClickHouse Keeper requires. Update the configuration on chnode1 and chnode2. The following cluster defines 1 shard on each node, for a total of 2 shards with no replication. In this example, some of the data will be on one node and some will be on the other node (a sketch of where this block sits in the server configuration follows the parameter table below):

    <cluster_2S_1R>
        <shard>
            <replica>
                <host>chnode1.domain.com</host>
                <port>9000</port>
                <user>default</user>
                <password>ClickHouse123!</password>
            </replica>
        </shard>
        <shard>
            <replica>
                <host>chnode2.domain.com</host>
                <port>9000</port>
                <user>default</user>
                <password>ClickHouse123!</password>
            </replica>
        </shard>
    </cluster_2S_1R>

    | Parameter | Description | Example |
    |-----------|-------------|---------|
    | shard | list of replicas in the cluster definition | a list of replicas for each shard |
    | replica | list of settings for each replica | settings entries for each replica |
    | host | hostname, IP or FQDN of the server that will host a replica shard | chnode1.domain.com |
    | port | port used to communicate using the native TCP protocol | 9000 |
    | user | username used to authenticate to the cluster instances | default |
    | password | password for the defined user, to allow connections to cluster instances | ClickHouse123! |
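
    The cluster block above belongs inside the remote_servers section of the server configuration. A minimal sketch of where it sits, assuming you place it in its own config.d file (the file name is only an illustration):

    <!-- /etc/clickhouse-server/config.d/cluster.xml (example file name) -->
    <clickhouse>
        <remote_servers>
            <cluster_2S_1R>
                <!-- shards and replicas as shown above -->
            </cluster_2S_1R>
        </remote_servers>
    </clickhouse>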
  2. Restart ClickHouse and verify the cluster was created:

    SHOW clusters;

    You should see your cluster:

    β”Œβ”€cluster───────┐
    β”‚ cluster_2S_1R β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
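
    For more detail on each shard and replica, the system.clusters table shows the hosts that make up the cluster:

    SELECT cluster, shard_num, replica_num, host_name, port
    FROM system.clusters
    WHERE cluster = 'cluster_2S_1R'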

3. Create and test distributed table

  1. Create a new database on the new cluster using ClickHouse client on chnode1. The ON CLUSTER clause automatically creates the database on both nodes.

    CREATE DATABASE db1 ON CLUSTER 'cluster_2S_1R';
  2. Create a new table on the db1 database. Once again, ON CLUSTER creates the table on both nodes.

    CREATE TABLE db1.table1 ON CLUSTER 'cluster_2S_1R'
    (
    `id` UInt64,
    `column1` String
    )
    ENGINE = MergeTree
    ORDER BY column1
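
    If you would like to confirm the table exists on both nodes from a single query, one option is the clusterAllReplicas table function. This is only a sketch; it assumes the remote connections authenticate with the user and password defined in the cluster configuration:

    SELECT hostName() AS host, database, name
    FROM clusterAllReplicas('cluster_2S_1R', system.tables)
    WHERE database = 'db1'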
  3. On the chnode1 node, add a couple of rows:

    INSERT INTO db1.table1
    (id, column1)
    VALUES
    (1, 'abc'),
    (2, 'def')
  4. Add a couple of rows on the chnode2 node:

    INSERT INTO db1.table1
    (id, column1)
    VALUES
    (3, 'ghi'),
    (4, 'jkl')
  5. Notice that running a SELECT statement on each node only shows the data on that node. For example, on chnode1:

    SELECT *
    FROM db1.table1
    Query id: 7ef1edbc-df25-462b-a9d4-3fe6f9cb0b6d

    β”Œβ”€id─┬─column1─┐
    β”‚  1 β”‚ abc     β”‚
    β”‚  2 β”‚ def     β”‚
    β””β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

    2 rows in set. Elapsed: 0.006 sec.

    On chnode2:

    SELECT *
    FROM db1.table1
    Query id: c43763cc-c69c-4bcc-afbe-50e764adfcbf

    β”Œβ”€id─┬─column1─┐
    β”‚  3 β”‚ ghi     β”‚
    β”‚  4 β”‚ jkl     β”‚
    β””β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  6. You can create a Distributed table to represent the data on the two shards. Tables with the Distributed table engine do not store any data of their own, but allow distributed query processing on multiple servers. Reads hit all the shards, and writes can be distributed across the shards. Run the following query on chnode1:

    CREATE TABLE db1.dist_table
    (
        id UInt64,
        column1 String
    )
    ENGINE = Distributed(cluster_2S_1R, db1, table1)
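
    The Distributed engine also accepts an optional sharding key as a fourth argument. A variant such as the following (a sketch that is not required for this guide; the name dist_table_keyed is illustrative) would let you INSERT through the distributed table and have each row routed to a shard based on the key:

    -- illustrative table name; rand() routes each inserted row to a random shard
    CREATE TABLE db1.dist_table_keyed
    (
        id UInt64,
        column1 String
    )
    ENGINE = Distributed(cluster_2S_1R, db1, table1, rand())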
  7. Notice that querying dist_table returns all four rows of data from the two shards:

    SELECT *
    FROM db1.dist_table
    Query id: 495bffa0-f849-4a0c-aeea-d7115a54747a

    β”Œβ”€id─┬─column1─┐
    β”‚  1 β”‚ abc     β”‚
    β”‚  2 β”‚ def     β”‚
    β””β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
    β”Œβ”€id─┬─column1─┐
    β”‚  3 β”‚ ghi     β”‚
    β”‚  4 β”‚ jkl     β”‚
    β””β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

    4 rows in set. Elapsed: 0.018 sec.
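    To see which shard each row came from, you can include the hostName() function in the query; the values should correspond to chnode1 and chnode2:

    SELECT hostName() AS shard_host, *
    FROM db1.dist_table
    ORDER BY id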

Summary

This guide demonstrated how to set up a cluster using ClickHouse Keeper. With ClickHouse Keeper, you can configure clusters and define distributed tables that spread data across shards, with Keeper providing the coordination needed for replication.