Deploying ClickHouse: Examples from Support Services

Architecture.1S_2R_ReplicatedMergeTree_5-nodes.3.CH.Keeper.nodes.2.CH.nodes.png

At ClickHouse Support, we face every day an incredible variety of technical challenges, from query performance reviews to troubleshooting all the possible combinations of configurations that ClickHouse can support.

While ClickHouse Cloud takes the pain out of managing these complex configurations and lets you focus on the fun part of analyzing your datasets, we still understand that running ClickHouse on-premise is necessary for several reasons, which often depend on which stage of the ClickHouse journey you are at.

As support engineers, we must keep up with all of the features to provide world-class support to our users. To assist with this, we use docker compose files to quickly replicate specific architectures and configurations. Today, in the spirit of open-source, we are providing these docker-compose files to give our users a head start and give you confidence that you are using a tested architecture.

Note: If you are getting started, you can still run a single command to download the ClickHouse binary. For more details, see the Quick Start.

The examples described here are for production deployments of ClickHouse beyond a simple single server installation. If you are looking for Docker compose files for multi-server deployments or guides to go beyond the quick start and, for example, to separate compute and storage or replicate data across two data centers for fault tolerance, continue reading.

The terminology details are in the documentation, but here is the short form:

Replica: A replica is a copy of your data. You always have at least one. When you create a table, the initial instance of that table data is the first replica. To survive the loss of a server, a network outage, or other problems, you might want a second replica on a second server.

Shard: A shard is a portion of your table data. You also always have at least one shard. We recommend having only one shard until your server is not keeping up with the query load and you cannot scale vertically.

1S_2R: If you look at the diagram at the top of this post, you will see that the cluster is named cluster_1S_2R. This means that the cluster has one shard and two replicas. 1S_2R is a common shorthand used within the ClickHouse community.

If you are setting up a ClickHouse server to import data for analysis, but the authoritative record is an OLTP database or a set of data files, then you probably only need one shard and one replica (so a single ClickHouse server).

If your data only lives in ClickHouse, and the data can not be recreated or regenerated, then you probably need one shard and two replicas (so two ClickHouse Servers).

The common question is, “How many shards do I need?”. Our answer is, “One until you exceed the capacity of the current hardware and cannot scale your hardware vertically.”

The documentation for a single shard with two replicas and for two shards with one replica is at https://clickhouse.com/docs/en/architecture/introduction

To separate storage and compute, see using S3 object storage and the same using GCS.

More examples are in the ClickHouse/examples GitHub repo with all of the configuration and Docker compose files. For example:

Using S3/MinIO object storage with ClickHouse
ClickHouse and Grafana
ClickHouse and Vector
ClickHouse Proxy
Replication
Sharding

We always want to hear what the community is experiencing; if you have created an exciting deployment design, please consider opening a pull request in the ClickHouse documentation repo. If pull requests are not your thing or you have a question, open an issue or talk to the community on Slack. We look forward to hearing from you!