Sizing ClickHouse for a customer-facing analytics product starts with the workload. An application can have thousands of active users, generate bursts of dashboard requests, ingest data continuously, and still require sub-second query latency. The number of queries it can serve depends on the cost of those requests, the available resources, and the required latency.
ClickHouse has no fixed 100-query concurrency ceiling. The server-level max_concurrent_queries setting defaults to 0, which means unlimited. In ClickHouse Cloud, max_concurrent_queries_for_all_users defaults to 1,000 per replica. Both are configurable admission controls that protect a deployment during overload, and each server or Cloud replica evaluates them independently.
Our concurrency documentation covers analytical workloads serving more than 10,000 queries per second with latency below 10 milliseconds on petabyte-scale databases. Capacity varies with the query mix, data layout, cache state, ingestion load, latency targets, hardware, and cluster topology. A selective query that reads a small number of granules has a very different cost from a large aggregation, sort, or join.
Measure representative traffic at increasing concurrency. Use the results to configure resource limits, admission controls, and scaling.
TL;DR: How many concurrent queries can ClickHouse handle?
ClickHouse has no fixed architectural ceiling for concurrent queries. Sustainable concurrency is the number of simultaneous queries that meet the latency target for a specific query mix and deployment. Configured query limits protect the service during overload; they do not define the engine's capacity.
To size a deployment:
- Translate active users and dashboard fan-out into peak queries per second and simultaneous queries.
- Benchmark with production-like conditions using the representative query mix, real ingestion load, and cache state.
- Configure limits based on benchmark results covering per-query parallelism, memory, admission controls, and workload scheduling.
- Add replicas to meet throughput and availability targets when measured per-replica capacity falls short.
Measure the workload behind concurrent users
The number of active users becomes useful for sizing only after it is translated into database traffic.
A product can have 10,000 active users while only 50 queries execute at once. A single dashboard load can also fan out into 20 queries, causing 500 users to produce a burst of 10,000 requests.
Define these measurements before sizing the deployment:
- Peak requests per second sent to ClickHouse
- Number of queries generated by each page or dashboard load
- Number of simultaneously executing queries
- Query mix, including filters, aggregations, joins, sorts, and exports
- p50, p95, and p99 latency targets for each query class
- Peak result size
- Ingestion rate and insert batch size
- Required capacity during a replica failure
Under steady-state conditions, Little's Law relates throughput, average time in the measured system, and average concurrency. At 1,000 queries per second and an average ClickHouse execution time of 50 milliseconds, not end-to-end request latency, the average number of queries executing is approximately 50.
At the same request rate with 500 milliseconds of ClickHouse execution time, it is approximately 500. Using end-to-end request latency instead would include time spent in application queues, network transit, and other components outside query execution.
How ClickHouse executes concurrent queries
ClickHouse executes queries through parallel processing pipelines. The selected data is divided across processing lanes, and each lane processes blocks through operations such as filtering and aggregation.
max_threads controls the maximum number of query-processing threads available to a query and defaults to the number of hardware threads available to ClickHouse.
This setting defines an upper bound. The actual number of processing lanes depends on the amount of data selected and the work available in each pipeline stage. A selective query can use fewer lanes than its max_threads value. Under concurrency control, a query may start with limited parallelism and scale up when more CPU slots become available.
The same workload dependency that shapes single-query parallelism applies across concurrent queries.
A server with 64 available hardware threads and max_threads=8 can execute more than eight queries simultaneously. Individual queries may use fewer than eight threads, and CPU scheduling allows additional queries to make progress. The sustainable query count depends on how much CPU time, memory, and I/O the workload consumes while meeting its latency target.
Memory consumption follows a similar pattern, but it is often nonlinear, as more processing lanes activate additional buffers and intermediate states. Aggregation cardinality, join build sides, sorting, decompression, result size, and query shape can drive most of the memory consumption.
Use EXPLAIN PIPELINE to inspect processing lanes for a query. Use system.processes to inspect active queries, current and peak memory, and peak thread usage.
Configure ClickHouse concurrency and resource limits
High-concurrency deployments need separate controls for per-query cost, total admitted work, and resource allocation between workloads.
max_threads: Control per-query parallelism
max_threads limits how many query-processing threads a query can use. Reducing it can improve throughput under load by preventing one query from consuming most available CPU parallelism, but it can also increase latency for scans that benefit from parallel execution.
Do not set max_threads=1 or 2 as a universal dashboard rule. Test values such as 1, 2, 4, and the deployment default against the real query mix, and select the value that produces the required tail latency and aggregate throughput.
In practice, a selective lookup may not reach its configured maximum, while a large scan can benefit materially from a higher value. Separate profiles let the application use a lower tested limit, while batch analytics use a different limit.
max_memory_usage: Limit memory per query
max_memory_usage limits the RAM used by one query on one server. A value of 0 means unlimited in both self-managed and Cloud deployments. In ClickHouse Cloud, the default is set based on replica memory.
Set this limit from measured peak memory for each query class, with headroom for data growth and parameter variation. If the limit is too low, it converts valid traffic into query failures, but if it is too high, it allows several expensive queries to exhaust the server together.
Per-query memory limits alone do not control the aggregate memory across concurrent queries. Configure max_memory_usage_for_user and server memory limits accordingly.
ClickHouse concurrency limits and query overflow behavior
ClickHouse provides multiple admission controls:
- Server-wide
max_concurrent_queries - Server-wide
max_concurrent_select_queries - Server-wide
max_concurrent_insert_queries - Per-user
max_concurrent_queries_for_user - Cross-user
max_concurrent_queries_for_all_users
These limits are enforced by the local server process, and in a replicated deployment, each replica applies its configured limits to the queries it receives.
When the server-wide max_concurrent_queries limit is reached, a new query can wait for a slot for up to queue_max_wait_ms. Its default value is 0, which means no wait. If no slot becomes available before the configured timeout, ClickHouse rejects the query with TOO_MANY_SIMULTANEOUS_QUERIES.
The select, insert, per-user, and cross-user concurrency limits reject a new query when their threshold is reached. The bounded wait associated with max_concurrent_queries provides simple overflow handling. Use a QUERY resource for workload-aware query-slot scheduling.
Set admission limits from per-replica load-test results and operational headroom. The limit should keep each server below the point where p99 latency rises sharply, memory pressure causes failures, or ingestion and background work fall behind.
Use measured values for each deployment. A limit of 100 can be too high for memory-heavy joins and too low for selective dashboard queries.
Concurrent thread scheduling in ClickHouse
For self-managed deployments, concurrent_threads_soft_limit_num and concurrent_threads_soft_limit_ratio_to_cores define a soft limit for query-processing threads across concurrent queries. The ratio defaults to 2, so when the absolute limit remains 0, the effective soft limit is twice the number of CPU cores available to ClickHouse. If both settings are nonzero, ClickHouse uses the lower limit.
The concurrent thread scheduler distributes CPU slots among queries that use concurrency control. A query still receives a thread and may scale up as slots become available, while query admission remains governed separately.
These controls operate at different scopes:
max_threadslimits one query.- Concurrent-thread scheduling allocates processing threads across queries.
- Admission limits cap the number of accepted queries.
Keeping these scopes separate makes the resulting capacity model easier to test and operate.
ClickHouse workload scheduling for CPU, I/O, and query slots
Workload scheduling uses RESOURCE and WORKLOAD objects. Queries are assigned to a workload through the workload setting.
Declaring a CPU resource disables the effect of concurrent_threads_soft_limit_num and concurrent_threads_soft_limit_ratio_to_cores. Participating queries then use workload settings such as max_concurrent_threads or max_concurrent_threads_ratio_to_cores, with the workload scheduler distributing their CPU slots instead of concurrent_threads_scheduler.
The framework can schedule:
- CPU resources
- Remote disk I/O
- Query slots
CPU workloads can define thread limits, CPU shares, weights, and priorities, but CPU throttling through max_cpus and max_cpu_share is active only when cpu_slot_preemption is enabled.
Query-slot scheduling can define limits on concurrent queries, query start rate, bursts, and waiting queries. When a query-slot constraint is full, the query waits until capacity becomes available. Waiting queries remain outside SHOW PROCESSLIST until they start. If max_waiting_queries is reached, ClickHouse returns SERVER_OVERLOADED.
Query-slot waits have no server-side timeout, so applications should enforce request deadlines and cancellation. Asynchronous inserts and some administrative queries, including KILL, are excluded from workload query-slot accounting.
Use query, user, and server settings for memory protection. CPU scheduling currently covers query workloads and excludes merges and mutations, so separate compute remains the strongest form of resource isolation.
When interactive and batch traffic share compute, configure separate workloads with weights, priorities, and query-slot limits derived from mixed-workload tests. Large scans and exports can then operate under limits appropriate to their latency and resource profile.
Configure ClickHouse settings from benchmark results
Assign measured limits to a dedicated application user with ALTER USER or through an XML profile in self-managed deployments. Set max_threads from measured throughput and tail latency, max_memory_usage above the observed peak for valid dashboard queries, and max_concurrent_queries_for_user below the measured per-replica overload point with operational headroom.
max_execution_time can provide an additional server-side guard, but it is not a substitute for the application's request deadline. By default, ClickHouse begins estimating total execution time after timeout_before_checking_execution_speed, which defaults to 10 seconds. Setting it to 0 makes max_execution_time use elapsed clock time. Enforcement occurs only at designated processing points, so actual runtime can exceed the configured limit. Enforce the user-facing deadline in the client and propagate cancellation to ClickHouse.
Use dedicated profiles for workloads with different resource and latency requirements.
Prevent inserts from degrading query performance
High-frequency small synchronous inserts can create data parts faster than background merges can consolidate them. The resulting merge pressure can degrade read performance even when SELECT and INSERT queries use different settings.
The simplest fix is client-side batching. We recommend batches of at least 1,000 rows and ideally 10,000 to 100,000 rows for synchronous inserts.
When client-side batching is not practical, enable asynchronous inserts:
ALTER USER ingest_user SETTINGS
async_insert = 1,
wait_for_async_insert = 1;With async_insert=1, ClickHouse buffers compatible inserts and flushes them when a size, time, or query-count threshold is reached. This reduces part creation and ingestion overhead.
With wait_for_async_insert=1, ClickHouse acknowledges the insert after the buffer is flushed successfully, making it the documented default and recommended production mode. The client waits, receives flush errors, and retains reliable backpressure.
Setting wait_for_async_insert=0 acknowledges data when it enters memory. That mode can lose buffered data and hide flush errors from the client.
Asynchronous inserts improve batching, while flushes, part creation, and merges continue to use server resources.
When workload scheduling cannot meet the required service-level objective, self-managed deployments can use separate nodes or clusters. In ClickHouse Cloud, separate services in a warehouse let read and write workloads use separate compute.
Use the ClickHouse query cache for repeated dashboard queries
The query cache can reduce work when the same deterministic SELECT query runs repeatedly, and slightly stale results are acceptable.
The cache is opt-in through use_query_cache=true, exists once per ClickHouse server process, and is not shared between users by default. Entries become stale after 60 seconds by default.
By default, cache eligibility excludes queries that use nondeterministic functions such as now() and today(). A dashboard query written as event_time >= now() - INTERVAL 24 HOUR therefore behaves differently from a query with fixed time boundaries.
Use explicit time buckets or application parameters when cached results can follow a defined refresh interval:
SELECT
customer_id,
sum(revenue)
FROM hourly_revenue
WHERE hour >= {window_start:DateTime}
AND hour < {window_end:DateTime}
GROUP BY customer_id
SETTINGS
use_query_cache = true,
query_cache_ttl = 30;Measure the hit rate through system.query_log, system.events, and system.metrics. Calculate uncached capacity independently because a cold cache, a fresh deployment, or a changed query shape can remove expected hits immediately.
Scale ClickHouse read concurrency with replicas
Replicas add read capacity only when traffic is distributed across them.
In a self-managed replicated cluster, configure the client, proxy, or Distributed table routing so that read queries reach different replicas. Replicas also perform replication and background merges, so reserve capacity for that work.
Shards and replicas solve different problems:
- Shards split data and let a distributed query process different data subsets across servers.
- Replicas store copies of the same data and can serve independent read queries.
- Parallel replicas let one query use multiple replicas. They can reduce latency for suitable queries while consuming capacity that could otherwise serve independent requests.
For high-concurrency dashboards, load balancing independent requests across replicas is the primary throughput mechanism. Benchmark parallel replicas by query class because coordination overhead can slow down small queries, complex queries, and high-cardinality aggregations.
Scale concurrent queries in ClickHouse Cloud
ClickHouse Cloud uses SharedMergeTree over shared object storage. Compute replicas do not need to maintain independent full copies of table data and metadata, which supports faster scale operations than a deployment that must copy local data to each new replica.
New SharedMergeTree replicas still require CPU, memory, and local cache population, which should be included in scale-up and failover tests.
Vertical autoscaling in ClickHouse Cloud
ClickHouse Cloud Scale and Enterprise services support vertical autoscaling based on CPU and memory usage. Administrators configure minimum and maximum sizes, and the service scales within those bounds.
Size the maximum high enough for the tested peak workload. Autoscaling reacts to measured load and takes time, so capacity testing and admission controls remain part of the design.
Horizontal replica scaling in ClickHouse Cloud
For regular horizontal scaling, administrators change the replica count through the Cloud console or API. Scale and Enterprise services can use additional replicas up to the documented service limits.
Scheduled scaling, currently in Private Preview, can change replica count or memory tier at configured times. Metric-based autoscaling applies to vertical service size, while scheduled scaling handles predictable capacity changes.
Isolate workloads with ClickHouse Cloud warehouses
Warehouses provide compute-compute separation, allowing multiple services to use separate compute and endpoints while sharing the same data.
A user-facing deployment can use:
- A read-write service for ingestion and write operations
- A read-only service for dashboard queries
- A separate service for batch analytics or exports
Read-only services do not perform background merges outside system tables, so their CPU and memory stay focused on read queries.
Warehouses provide strong compute isolation. Services still share storage and ClickHouse Keeper, and our warehouse documentation lists edge cases for shared operations. Measure each service independently and set its scaling bounds based on its workload.
Warehouses are available on Scale and Enterprise plans.
High-concurrency ClickHouse deployments
These deployments show how different organizations combine ingestion and analytics with workload-specific data models, infrastructure, and resource controls.
Cloudflare: Analytics at scale with ClickHouse
Cloudflare uses ClickHouse for HTTP and DNS analytics, customer dashboards, Firewall Analytics, and Cloudflare Radar. At our 2023 community event, Cloudflare shared that its deployment had grown to more than 1,000 active replicas processing hundreds of millions of inserted rows per second.
An earlier account of its HTTP analytics pipeline described a 36-node ClickHouse cluster processing an average of 6 million HTTP requests per second, with peaks up to 8 million. The associated Zone Analytics API served about 40 queries per second and reached about 150 queries per second in load testing on that deployment.
GitLab: Sub-second analytics for 50 million users
GitLab uses ClickHouse for product-facing analytics across GitLab.com, GitLab Dedicated, and self-managed deployments. ClickHouse powers workloads such as Contribution Analytics, GitLab Duo analytics, and SDLC trends for a platform serving 50 million registered users.
Queries over 100 million rows that previously took 30 to 40 seconds now return in under a second. GitLab standardized on ClickHouse as its OLAP engine while continuing to use Postgres for transactional workloads.
Laravel Nightwatch: Real-time observability on ClickHouse Cloud
Laravel Nightwatch uses ClickHouse Cloud for the analytical layer of its first-party observability platform. The service processes more than 1 billion events per day while maintaining sub-second query latency for real-time dashboards.
At launch, Nightwatch processed 500 million events on day one and reported 97 ms average dashboard request latency. Its architecture uses Amazon MSK, ClickPipes, materialized views, and ClickHouse Cloud to separate streaming ingestion from analytical queries.
Mintlify: Customer-facing analytics with sub-one-second dashboards
Mintlify uses ClickHouse Cloud to provide real-time analytics for documentation and knowledge sites used by tens of thousands of companies and tens of millions of developers each month.
After moving from PostHog to ClickHouse, Mintlify reduced analytics dashboard load times from tens of seconds to sub-one-second. The architecture uses ClickHouse materialized views to keep multi-tenant customer dashboards responsive as traffic grows.
Optimize ClickHouse schemas and queries for concurrency
Reducing the amount of data and intermediate state processed by each request is the most effective way to improve concurrency.
- Design the
ORDER BYkey around common filters and the expected data-pruning behavior. - Read only required columns.
- Use projections when a second access pattern justifies additional storage and write work.
- Use dictionaries or direct joins for suitable lookup workloads.
- Use incremental materialized views and
AggregatingMergeTreefor repeated aggregations. - Bound export queries separately from interactive requests.
- Return only the rows and bytes the application needs.
Pre-aggregation can transform a repeated large scan into a much smaller aggregation over stored states. Measure the resulting access pattern, processing lanes, and resource use because the query may still scan multiple rows and execute in parallel.
Query complexity changes the concurrency curve. Large joins, high-cardinality aggregations, full scans, and large sorts consume more CPU, memory, and I/O per request. ClickHouse can execute them concurrently, but the sustainable concurrent query count at a fixed latency target will be lower than for selective queries.
How to benchmark and size ClickHouse concurrency
1. Define query classes
Group traffic into classes such as:
- Interactive dashboard queries
- Drill-down queries
- API lookups
- Large exports
- Batch analytics
- Inserts
Record expected peak QPS, result size, and latency target for each class.
2. Build representative data and traffic
Use production-scale cardinalities, partition counts, part counts, and skew. Run ingestion during the test, and include the parameter values that produce the largest valid scans and aggregation states.
Test both warm and cold cache conditions.
3. Measure query latency and resource usage
For each query class, record:
query_duration_msread_rowsandread_bytesmemory_usagepeak_threads_usageProfileEvents- Result rows and bytes
Use EXPLAIN indexes = 1 to confirm data pruning and EXPLAIN PIPELINE to inspect parallelism.
4. Increase load with clickhouse-benchmark
Use clickhouse-benchmark with --concurrency or --max_concurrency to increase parallel load.
Find the point where one of these conditions occurs:
- p95 or p99 latency exceeds the target
- Throughput stops increasing
- Memory pressure or query failures appear
- CPU wait grows sharply
- Insert latency rises
- Background merges fall behind
If the deployment uses an admission limit, set it below the measured overload point with enough headroom for traffic variation and background work.
5. Test the production workload mix
Do not size each query class in isolation and add the results. Run the production traffic mix, including ingestion, scheduled jobs, and exports.
Repeat the test with one replica unavailable. A high-availability deployment must meet its minimum service objective during a failure, not only when every replica is healthy.
6. Configure limits, queues, and workloads
Apply these settings based on the test results:
- Set per-class
max_threads, memory, result, and server-side execution guards. - Set user and server admission limits.
- Configure workloads for interactive and batch traffic.
- Set queue length and overload behavior for query-slot scheduling.
- Set client timeouts, retry behavior, and backpressure.
7. Add replicas and retest capacity
If one replica cannot meet the target after query and schema optimization, add replicas and distribute traffic across them. In ClickHouse Cloud, set vertical autoscaling bounds and configure the replica count required for peak throughput through the console, API, or a scheduled scaling policy.
Retest after each topology change because additional replicas change routing, cache locality, coordination, and failure behavior.
Monitor ClickHouse concurrency in production
Use system.processes for active-query state and system.query_log for historical analysis. Both tables are local to the server where they are queried.
A useful node-level report groups client-initiated queries by normalized query hash and compares latency, memory, and thread usage:
SELECT
normalized_query_hash,
count() AS executions,
quantile(0.50)(query_duration_ms) AS p50_ms,
quantile(0.95)(query_duration_ms) AS p95_ms,
quantile(0.99)(query_duration_ms) AS p99_ms,
max(memory_usage) AS max_memory,
max(peak_threads_usage) AS max_peak_threads,
sum(read_rows) AS total_read_rows,
sum(read_bytes) AS total_read_bytes
FROM system.query_log
WHERE type = 'QueryFinish'
AND is_initial_query = 1
AND event_time >= now() - INTERVAL 1 HOUR
GROUP BY normalized_query_hash
ORDER BY p99_ms DESC;The is_initial_query = 1 filter excludes child queries created by distributed execution, giving one record per client-initiated query for request counts and latency. The memory_usage and peak_threads_usage values in those records describe the initiating query process and do not include the separate child-query processes running on remote replicas. For distributed resource analysis, query every replica and correlate initial and child records through initial_query_id. Retain the initial record for end-to-end query latency.
In ClickHouse Cloud, query every replica in the current service with clusterAllReplicas('default', merge('system', '^query_log')) in place of system.query_log, and set skip_unavailable_shards=1 so an unavailable replica does not block the report. Within a warehouse, the default cluster covers only the current service. Use clusterAllReplicas('all_groups.default', merge('system', '^query_log')) to query all services in the warehouse.
Workload query-slot waits remain outside system.processes until execution starts. Monitor system.scheduler and its queue_length column for workload queues. This table is also local. Use clusterAllReplicas('default', system.scheduler) for a service-wide view in Cloud, or the all_groups.default cluster for all services in a warehouse.
Monitor capacity signals at the workload level:
- Active queries in
system.processes - Workload queue depth in
system.scheduler - Rejected queries, including
TOO_MANY_SIMULTANEOUS_QUERIESandSERVER_OVERLOADEDerrors - CPU utilization and OS CPU wait
- Query memory and total server memory
- Merge backlog and active part counts
- Insert latency and asynchronous insert failures
- Query cache hit rate
- Per-replica QPS and latency
Capacity planning is continuous. Data distribution, query parameters, and product usage change after launch. Re-run the benchmark when the schema, query mix, replica size, or service-level objective changes.
Build a workload-based ClickHouse capacity model
ClickHouse concurrency follows the cost of the workload and the resources available to execute it. The deployments running at this scale, with thousands of replicas, millions of inserted rows per second, sub-second dashboard latency for millions of users, are not special cases. They are the result of measuring workloads, controlling per-query cost, isolating competing workloads, and scaling compute when throughput requires it.
Configured limits can protect each server or replica at its measured operating boundary. They do not define the capacity of ClickHouse’s execution engine.
Start building on ClickHouse today and see how far your workload can scale.
Frequently asked questions
Is ClickHouse limited to 100 concurrent queries?
No. The server-level max_concurrent_queries default is 0, which means unlimited. A configured value controls admission independently on each server or Cloud replica.
What should `max_threads` be for dashboard queries?
Set it from benchmark results. Test multiple values against the production query mix. Lower values can improve aggregate throughput, while higher values can reduce latency for scans. There is no universal dashboard value.
Does `max_threads=2` mean each query reserves two cores?
No. max_threads sets an upper bound on query-processing parallelism. Actual processing lanes depend on available work and server scheduling.
Does ClickHouse queue queries after reaching `max_concurrent_queries`?
The server-wide max_concurrent_queries limit supports a bounded wait through queue_max_wait_ms. Its default is 0, which means immediate rejection. If the configured wait expires, ClickHouse rejects the query. A QUERY resource provides workload-aware query-slot scheduling. Those queued queries wait until capacity becomes available or max_waiting_queries is reached.
How do I prevent ingestion from degrading dashboard latency?
Batch inserts and monitor merge pressure. Use asynchronous inserts when client-side batching is not practical. Use workload scheduling for resource allocation on shared compute, and separate nodes, clusters, or ClickHouse Cloud services when stronger isolation is required.
Do ClickHouse read replicas increase query concurrency?
Yes, when requests are distributed across them. Adding replicas without changing routing does not increase application throughput.
Can the ClickHouse query cache improve concurrency?
It reduces repeated work for identical deterministic queries when cached results are acceptable. Size the deployment to meet its service objective under the expected uncached workload.
Does ClickHouse Cloud automatically add replicas during traffic spikes?
Metric-based autoscaling changes service size vertically within configured bounds. Administrators manage replica count through the console or API. Scheduled scaling, currently in Private Preview, can adjust replica count for predictable time periods.
How should I size ClickHouse for concurrent queries?
Benchmark the representative mixed workload at increasing concurrency. Select the largest load that meets the required tail latency with headroom for ingestion, background work, traffic bursts, and replica failure.