Cloud compatibility

This guide provides an overview of what to expect functionally and operationally in ClickHouse Cloud. While ClickHouse Cloud is based on the open-source ClickHouse distribution, there may be some differences in architecture and implementation. You may find this blog on how we built ClickHouse Cloud interesting and relevant to read as background.

ClickHouse Cloud architecture

ClickHouse Cloud significantly reduces the operational overhead and cost of running ClickHouse at scale. There is no need to size your deployment upfront, set up replication for high availability, manually shard your data, scale up your servers when your workload increases, or scale them down when you’re not using them; we handle this for you. These benefits result from the architectural choices underlying ClickHouse Cloud:

Compute and storage are separated and thus can be automatically scaled along separate dimensions, so you don’t have to over-provision either storage or compute in static instance configurations.
Tiered storage on top of an object store, combined with multi-level caching, provides virtually limitless scaling and a good price/performance ratio, so you don’t have to size your storage partition upfront or worry about high storage costs.
High availability is on by default and replication is transparently managed, so you can focus on building your applications or analyzing your data.
Automatic scaling for variable continuous workloads is on by default, so you don’t have to size your service upfront, scale up your servers when your workload increases, or manually scale down your servers when you have less activity.
Seamless hibernation for intermittent workloads is on by default. We automatically pause your compute resources after a period of inactivity and transparently start them again when a new query arrives, so you don’t have to pay for idle resources.
Advanced scaling controls provide the ability to set an auto-scaling maximum for additional cost control or an auto-scaling minimum to reserve compute resources for applications with specialized performance requirements.

Capabilities

ClickHouse Cloud provides access to a curated set of capabilities in the open source distribution of ClickHouse. The sections below describe what is supported and which features are disabled in ClickHouse Cloud at this time.

Database and table engines

ClickHouse Cloud is a highly-available, replicated service by default, built on the SharedMergeTree table engine family. When you create a table with a standard MergeTree-family engine, Cloud automatically substitutes the corresponding Shared* engine. You don’t add a Shared or Replicated prefix yourself.

You specify	Cloud uses
`MergeTree` (or no engine)	`SharedMergeTree`
`ReplacingMergeTree`	`SharedReplacingMergeTree`
`SummingMergeTree`	`SharedSummingMergeTree`
`AggregatingMergeTree`	`SharedAggregatingMergeTree`
`CollapsingMergeTree`	`SharedCollapsingMergeTree`
`VersionedCollapsingMergeTree`	`SharedVersionedCollapsingMergeTree`
`GraphiteMergeTree`	`SharedGraphiteMergeTree`

Replicated* engines are converted to the same Shared* equivalents. The substitution is visible in SHOW CREATE TABLE, which reports the Shared* engine even though your statement specified the plain variant. The following engines are also supported and used as written:

URL
View
MaterializedView
GenerateRandom
Null
Buffer
Memory
Deltalake
Hudi
MySQL
MongoDB
NATS
RabbitMQ
PostgreSQL
S3
Kafka
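
The substitution rule described above can be sketched as a small lookup. This is only an illustrative model of the naming rule, not how ClickHouse Cloud implements it; the helper name `cloud_engine` is hypothetical, and the sketch doesn't check whether an engine is actually supported.

```python
from typing import Optional

# MergeTree-family engines listed in the substitution table above.
MERGETREE_FAMILY = {
    "MergeTree",
    "ReplacingMergeTree",
    "SummingMergeTree",
    "AggregatingMergeTree",
    "CollapsingMergeTree",
    "VersionedCollapsingMergeTree",
    "GraphiteMergeTree",
}


def cloud_engine(requested: Optional[str]) -> str:
    """Return the engine name Cloud would report for a requested engine.

    Hypothetical helper: it models only the naming rule from the text.
    """
    if requested is None:
        # No engine specified: treated like plain MergeTree.
        return "SharedMergeTree"
    base = requested
    if base.startswith("Replicated"):
        # Replicated* engines map to the same Shared* equivalents.
        base = base[len("Replicated"):]
    if base in MERGETREE_FAMILY:
        return "Shared" + base
    # Other engines (URL, Kafka, S3, ...) are used as written.
    return requested


for name in (None, "ReplacingMergeTree", "ReplicatedSummingMergeTree", "Kafka"):
    print(name, "->", cloud_engine(name))
```

In practice, this means a `CREATE TABLE` statement written for open-source ClickHouse with a plain or Replicated* MergeTree engine can usually be reused unchanged.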

Interfaces

ClickHouse Cloud supports HTTPS, native interfaces, and the MySQL wire protocol. Support for more interfaces such as Postgres is coming soon.

Dictionaries

Dictionaries are a popular way to speed up lookups in ClickHouse. ClickHouse Cloud currently supports dictionaries from PostgreSQL, MySQL, remote and local ClickHouse servers, Redis, MongoDB and HTTP sources.

Federated queries

We support federated ClickHouse queries for cross-cluster communication in the cloud, and for communication with external self-managed ClickHouse clusters. ClickHouse Cloud currently supports federated queries using the following integration engines:

Deltalake
Hudi
MySQL
MongoDB
NATS
RabbitMQ
PostgreSQL
S3

Federated queries with some external database and table engines, such as SQLite, ODBC, JDBC, Redis, HDFS, and Hive, aren’t yet supported.

User defined functions

User-defined functions in ClickHouse Cloud are in public beta.

Settings behavior

Important: UDFs in ClickHouse Cloud don’t inherit user-level settings. They execute with default system settings.

This means:

Session-level settings (set via SET statement) aren’t propagated to UDF execution context
User profile settings aren’t inherited by UDFs
Query-level settings don’t apply within UDF execution

Experimental features

Experimental features are disabled in ClickHouse Cloud services to ensure the stability of service deployments.

Named collections

Named collections aren’t currently supported in ClickHouse Cloud.

Operational defaults and considerations

The following are default settings for ClickHouse Cloud services. In some cases, these settings are fixed to ensure the correct operation of the service, and in others, they can be adjusted.

Operational limits

`max_parts_in_total: 10,000`

The default value of the max_parts_in_total setting for MergeTree tables has been lowered from 100,000 to 10,000. We observed that a large number of data parts is likely to slow down the startup of services in the cloud. A large number of parts usually indicates a partition key that is too granular, which is typically chosen accidentally and should be avoided. The lower default allows these cases to be detected earlier.
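
To see how partition key granularity relates to this limit, here is a rough back-of-the-envelope sketch. It relies on the fact that parts are only merged within a partition, so every non-empty partition holds at least one active part. The helper name `min_active_parts` and the two-year retention figure are assumptions chosen for illustration.

```python
MAX_PARTS_IN_TOTAL = 10_000  # Cloud default quoted above


def min_active_parts(days_of_data: int, partitions_per_day: int) -> int:
    """Lower bound on active parts: one part per non-empty partition.

    Hypothetical helper; assumes every partition contains some data.
    """
    if days_of_data < 0 or partitions_per_day < 1:
        raise ValueError("days_of_data must be >= 0 and partitions_per_day >= 1")
    return days_of_data * partitions_per_day


two_years = 730  # assumed retention period, in days

daily = min_active_parts(two_years, 1)    # one partition per day
hourly = min_active_parts(two_years, 24)  # one partition per hour

print("daily partitions: ", daily, "over limit:", daily > MAX_PARTS_IN_TOTAL)
print("hourly partitions:", hourly, "over limit:", hourly > MAX_PARTS_IN_TOTAL)
```

Under these assumptions, daily partitioning stays well below the limit, while hourly partitioning exceeds it even if every partition is fully merged into a single part.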

`max_concurrent_queries: 1,000`

This per-server setting has been increased from the default of 100 to 1,000 to allow for more concurrency. A service can therefore run up to (number of replicas × 1,000) concurrent queries: 1,000 for a Basic tier service, which is limited to a single replica, and more than 1,000 for Scale and Enterprise services, depending on the number of replicas configured.
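
The arithmetic can be written out as a short sketch. The function name `service_concurrency_limit` is hypothetical, and the three-replica service is an assumed example rather than a documented configuration.

```python
PER_REPLICA_LIMIT = 1_000  # max_concurrent_queries per server, as stated above


def service_concurrency_limit(replicas: int) -> int:
    """Upper bound on concurrent queries across a whole service.

    Hypothetical helper: replicas * per-replica limit.
    """
    if replicas < 1:
        raise ValueError("a service has at least one replica")
    return replicas * PER_REPLICA_LIMIT


# Basic tier: limited to a single replica.
print("1 replica: ", service_concurrency_limit(1))
# Assumed example of a service with three replicas.
print("3 replicas:", service_concurrency_limit(3))
```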

`max_table_size_to_drop: 1,000,000,000,000`

This setting has been increased from 50 GB to 1 TB, so tables and partitions of up to 1 TB can be dropped.
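
As a rough illustration of what the higher limit changes, the sketch below compares the two values. It assumes decimal units (1 TB = 10^12 bytes, which matches the value in the heading), that the limit is compared against the table size in bytes, and that a limit of 0 disables the check as it does in open-source ClickHouse. The helper `can_drop` is hypothetical.

```python
CLOUD_LIMIT_BYTES = 1_000_000_000_000  # 1 TB, the Cloud value above
OLD_LIMIT_BYTES = 50_000_000_000       # 50 GB, the previous default


def can_drop(size_bytes: int, limit_bytes: int = CLOUD_LIMIT_BYTES) -> bool:
    """Return True if a table of this size passes the drop-size check.

    Hypothetical helper; a limit of 0 is assumed to mean "no restriction".
    """
    return limit_bytes == 0 or size_bytes <= limit_bytes


table_size = 750 * 10**9  # an assumed 750 GB table

print("droppable with the 50 GB limit:", can_drop(table_size, OLD_LIMIT_BYTES))
print("droppable with the 1 TB limit: ", can_drop(table_size))
```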

System settings

ClickHouse Cloud is tuned for variable workloads, and for that reason most system settings aren’t configurable at this time. We don’t anticipate the need to tune system settings for most users, but if you have a question about advanced system tuning, please contact ClickHouse Cloud Support.

Advanced security administration

As part of creating the ClickHouse service, we create a default database and a default user that has broad permissions to this database. This initial user can create additional users and assign their permissions to this database. Beyond this, enabling Kerberos, LDAP, or SSL X.509 certificate authentication within the database isn’t supported at this time.

Roadmap

We’re evaluating demand for many other features in ClickHouse Cloud. If you have feedback and would like to ask for a specific feature, please submit it here.
