Date: 2016-01-01

Introduction to the ClickHouse Operator

This document provides an overview of key concepts and usage patterns for the ClickHouse Operator.

The ClickHouse Operator is a Kubernetes operator that automates the deployment and management of ClickHouse clusters on Kubernetes. Built using the operator pattern, it extends the Kubernetes API with custom resources that represent ClickHouse clusters and their dependencies.

The operator handles:

Cluster lifecycle management (creation, updates, scaling, deletion)

ClickHouse Keeper cluster coordination

Automatic configuration generation

Database schema synchronization

Rolling updates and upgrades

Storage provisioning

The operator provides two main custom resource definitions (CRDs):

ClickHouseCluster: Represents a ClickHouse database cluster with configurable replicas and shards.

apiVersion: clickhouse.com/v1alpha1
kind: ClickHouseCluster
metadata:
  name: sample-cluster
spec:
  replicas: 3
  shards: 2
  keeperClusterRef:
    name: sample-keeper
  dataVolumeClaimSpec:
    resources:
      requests:
        storage: 100Gi

KeeperCluster: Represents a ClickHouse Keeper cluster for distributed coordination (a ZooKeeper replacement).

apiVersion: clickhouse.com/v1alpha1
kind: KeeperCluster
metadata:
  name: sample-keeper
spec:
  replicas: 3
  dataVolumeClaimSpec:
    resources:
      requests:
        storage: 10Gi

Every ClickHouseCluster requires a ClickHouse Keeper cluster for distributed coordination. The Keeper cluster must be referenced in the ClickHouseCluster spec using keeperClusterRef.

Each ClickHouseCluster must have its own dedicated KeeperCluster. You cannot share a single KeeperCluster between multiple ClickHouseClusters.

Why? The operator automatically generates a unique authentication key for each ClickHouseCluster to access its Keeper. This key is stored in a Secret and cannot be shared.

Consequences:

Multiple ClickHouseClusters cannot reference the same KeeperCluster

Recreating a ClickHouseCluster requires recreating its KeeperCluster
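The one-to-one pairing described above can be sketched as two independent pairs of resources. This is an illustrative fragment only: the names orders-db, orders-keeper, metrics-db, and metrics-keeper are hypothetical, and the manifests use only the fields already shown in this document.

```yaml
# Hypothetical names throughout; each ClickHouseCluster gets its own KeeperCluster.
apiVersion: clickhouse.com/v1alpha1
kind: KeeperCluster
metadata:
  name: orders-keeper
spec:
  replicas: 3
  dataVolumeClaimSpec:
    resources:
      requests:
        storage: 10Gi
---
apiVersion: clickhouse.com/v1alpha1
kind: ClickHouseCluster
metadata:
  name: orders-db
spec:
  replicas: 3
  shards: 2
  keeperClusterRef:
    name: orders-keeper    # referenced by orders-db only
  dataVolumeClaimSpec:
    resources:
      requests:
        storage: 100Gi
---
apiVersion: clickhouse.com/v1alpha1
kind: KeeperCluster
metadata:
  name: metrics-keeper
spec:
  replicas: 3
  dataVolumeClaimSpec:
    resources:
      requests:
        storage: 10Gi
---
apiVersion: clickhouse.com/v1alpha1
kind: ClickHouseCluster
metadata:
  name: metrics-db
spec:
  replicas: 3
  shards: 2
  keeperClusterRef:
    name: metrics-keeper   # a separate Keeper, not orders-keeper
  dataVolumeClaimSpec:
    resources:
      requests:
        storage: 100Gi
```

Pointing metrics-db at orders-keeper instead would violate the rule above, since each ClickHouseCluster has its own generated authentication key.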

Note: Persistent Volumes are not deleted automatically when ClickHouseCluster or KeeperCluster resources are deleted.

When recreating a cluster:

1. Delete the ClickHouseCluster resource

2. Delete the KeeperCluster resource

3. Wait for all pods to terminate

4. Optionally delete PersistentVolumeClaims if you want to start fresh

5. Recreate both KeeperCluster and ClickHouseCluster together

To avoid authentication errors, either delete the Persistent Volumes manually or recreate both clusters together with fresh storage.

The ClickHouse Operator automatically replicates database definitions across all replicas in a cluster.

The operator synchronizes:

Replicated database definitions

Integration database engines (PostgreSQL, MySQL, etc.)

The operator does not synchronize:

Non-replicated databases (Atomic, Ordinary, etc.)

Local tables in non-replicated databases

Table data (handled by ClickHouse replication)

Best practice: Always use the Replicated database engine for production deployments.

Benefits:

Automatic schema replication across all nodes

Simplified table management

Operator can sync to new replicas

Consistent schema across the cluster

Create databases with distributed DDL:

CREATE DATABASE my_database ON CLUSTER 'default' ENGINE = Replicated;

Non-replicated database engines (Atomic, Lazy, SQLite, Ordinary) require manual schema management:

Tables must be created individually on each replica

Schema drift can occur between nodes

Operator cannot automatically sync new replicas

To disable automatic schema replication, set spec.settings.enableDatabaseSync to false in the ClickHouseCluster resource.
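As a rough sketch, the setting would sit in the manifest as shown below. The nesting is inferred from the field path spec.settings.enableDatabaseSync and the other fields are copied from the sample-cluster example, so treat the exact layout as an assumption and check it against the CRD reference for your operator version.

```yaml
apiVersion: clickhouse.com/v1alpha1
kind: ClickHouseCluster
metadata:
  name: sample-cluster
spec:
  replicas: 3
  shards: 2
  keeperClusterRef:
    name: sample-keeper
  settings:
    enableDatabaseSync: false   # turn off automatic schema replication
  dataVolumeClaimSpec:
    resources:
      requests:
        storage: 100Gi
```

With synchronization disabled, database definitions on new replicas have to be managed manually, as with non-replicated engines.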

The operator manages storage through Kubernetes PersistentVolumeClaims (PVCs).

Specify storage requirements in dataVolumeClaimSpec:

spec:
  dataVolumeClaimSpec:
    storageClassName: fast-ssd
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 500Gi

Creation: PVCs are created automatically with the cluster

Expansion: Supported if the StorageClass allows volume expansion

Retention: PVCs are not deleted automatically on cluster deletion

Reuse: Existing PVCs can be reused if the cluster is recreated with the same name
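Volume expansion depends on the StorageClass, so a sketch of a class that permits it may help. This is a standard Kubernetes StorageClass, not an operator resource. The name fast-ssd matches the earlier storage example, while the provisioner shown is only an example and should be replaced with the CSI driver available in your cluster.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com    # example driver only; substitute your own
allowVolumeExpansion: true      # required before PVCs of this class can grow
```

How the operator applies a larger storage request to existing PVCs is not covered in this document, so verify that behavior before relying on it.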

To completely remove storage:

# Delete cluster
kubectl delete clickhousecluster my-cluster

# Wait for pods to terminate
kubectl wait --for=delete pod -l app.kubernetes.io/instance=my-cluster

# Delete PVCs
kubectl delete pvc -l app.kubernetes.io/instance=my-cluster

Pre-configured cluster: A cluster named 'default' containing all ClickHouse nodes.

Default macros: Some useful macros are pre-defined:

  {cluster}: Cluster name (default)

  {shard}: Shard number

  {replica}: Replica number

Replicated storage for Role-Based Access Control (RBAC) entities

Replicated storage for User-Defined Functions (UDFs)