Introduction

These deployment examples are based on the advice provided to ClickHouse users by the ClickHouse Support and Services organization. These are working examples, and we recommend that you try them and then adjust them to suit your needs. You may find an example here that fits your requirements exactly. Alternatively, if you need your data replicated three times instead of two, you should be able to add another replica by following the patterns presented here.

Terminology

Replica

A copy of data. ClickHouse always has at least one copy of your data, so the minimum number of replicas is one. This is an important detail: you may not be used to counting the original copy of your data as a replica, but that is the term used in ClickHouse code and documentation. Adding a second replica of your data provides fault tolerance.

Shard

A subset of data. ClickHouse always has at least one shard for your data, so if you do not split the data across multiple servers, your data will be stored in one shard. Sharding data across multiple servers can be used to divide the load if you exceed the capacity of a single server. The destination server is determined by the sharding key, which is defined when you create the distributed table. The sharding key can be random or the output of a hash function. The deployment examples involving sharding use rand() as the sharding key, and provide further information on when and how to choose a different sharding key.
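
To make the role of the sharding key concrete, here is a minimal Python sketch of how a key value can be mapped to a shard. It assumes the commonly documented rule for distributed tables: the key value is reduced modulo the total shard weight, and the row goes to the shard whose weight range contains the remainder. The names pick_shard and hash_key are hypothetical helpers for illustration only, not ClickHouse APIs, and crc32 is just a stand-in for whatever hash function you might choose.

```python
import random
import zlib
from typing import Sequence


def pick_shard(key_value: int, weights: Sequence[int]) -> int:
    """Return the index of the shard that would receive a row.

    key_value is the (non-negative) result of the sharding key expression.
    weights holds one positive integer weight per shard.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("total shard weight must be positive")
    remainder = key_value % total
    upper = 0
    for index, weight in enumerate(weights):
        upper += weight  # this shard owns remainders in [upper - weight, upper)
        if remainder < upper:
            return index
    raise AssertionError("unreachable: remainder is always below the total weight")


def hash_key(value: str) -> int:
    """Illustrative hash-based key: the same value always yields the same number."""
    return zlib.crc32(value.encode("utf-8"))


if __name__ == "__main__":
    weights = [1, 1]  # two equally weighted shards (assumed layout)

    # Random key: rows spread evenly, but related rows may land on different shards.
    print(pick_shard(random.getrandbits(32), weights))

    # Hash key: all rows with the same value land on the same shard.
    print(pick_shard(hash_key("user-42"), weights))
```

A random key spreads rows evenly without any knowledge of the data, while a hash of a column keeps all rows that share a value on the same shard, which is one reason you might later choose a different sharding key.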

Distributed coordination

ClickHouse Keeper provides the coordination system for data replication and for the execution of distributed DDL queries. ClickHouse Keeper is compatible with Apache ZooKeeper.

Examples

Basic

The Scaling out example shows how to shard your data across two nodes and use a distributed table. This results in having data on two ClickHouse nodes. The two ClickHouse nodes also run ClickHouse Keeper, which provides distributed synchronization. A third node runs ClickHouse Keeper standalone to complete the ClickHouse Keeper quorum.
The Replication for fault tolerance example shows how to replicate your data across two nodes and use a ReplicatedMergeTree table. This results in having data on two ClickHouse nodes. In addition to the two ClickHouse server nodes, there are three standalone ClickHouse Keeper nodes to manage replication.
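
Both examples use three ClickHouse Keeper nodes. The arithmetic behind that number can be sketched as follows, assuming a majority-based quorum of the kind used by ZooKeeper-style coordination services. The function names are hypothetical and only illustrate the calculation.

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of Keeper nodes that forms a strict majority."""
    if nodes < 1:
        raise ValueError("at least one node is required")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can be lost while a majority is still available."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    for nodes in (1, 2, 3):
        print(nodes, quorum_size(nodes), tolerated_failures(nodes))
```

Under this assumption, two Keeper nodes tolerate no failures, because both are needed for a majority, while three nodes tolerate the loss of one. That is why the Scaling out example adds a third, standalone Keeper node alongside the two ClickHouse nodes.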

Intermediate

Coming soon

Advanced

Coming soon
