Server overload

Overview

A server can become overloaded for various reasons. To measure the current CPU overload, the ClickHouse server calculates the ratio of CPU wait time (the OSCPUWaitMicroseconds metric) to busy time (the OSCPUVirtualTimeMicroseconds metric). When this ratio exceeds a certain level, it makes sense to discard some queries or even drop connection requests so that the load does not increase further.

The server setting os_cpu_busy_time_threshold controls the minimum busy time required to consider the CPU to be doing useful work. If the current value of the OSCPUVirtualTimeMicroseconds metric is below this threshold, the CPU overload is assumed to be 0.
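The calculation described above can be sketched in a few lines of Python. This is an illustration only, not the server's actual code: the function name cpu_overload_ratio and its parameter names are hypothetical, and the guard against a zero busy time is an added assumption.

```python
def cpu_overload_ratio(wait_us: float, busy_us: float, busy_time_threshold_us: float) -> float:
    """Illustrative sketch of the overload ratio (hypothetical helper, not ClickHouse code).

    wait_us                -- value of the OSCPUWaitMicroseconds metric
    busy_us                -- value of the OSCPUVirtualTimeMicroseconds metric
    busy_time_threshold_us -- value of the os_cpu_busy_time_threshold setting
    """
    # Below the busy-time threshold the CPU is not considered to be doing
    # useful work, so the overload is treated as 0.
    if busy_us < busy_time_threshold_us:
        return 0.0
    # Assumption: avoid division by zero when the threshold itself is 0.
    if busy_us == 0:
        return 0.0
    # Overload is the ratio of wait time to busy time.
    return wait_us / busy_us


if __name__ == "__main__":
    # Made-up sample values: 4 s of wait time against 1 s of busy time.
    print(cpu_overload_ratio(4_000_000, 1_000_000, 500_000))
    # Busy time below the threshold: overload is assumed to be 0.
    print(cpu_overload_ratio(100, 50, 1_000))
```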

Rejecting queries

Query rejection is controlled by the query-level settings min_os_cpu_wait_time_ratio_to_throw and max_os_cpu_wait_time_ratio_to_throw. If both settings are set and min_os_cpu_wait_time_ratio_to_throw is less than max_os_cpu_wait_time_ratio_to_throw, then a query is rejected with a SERVER_OVERLOADED error with some probability if the overload ratio is at least min_os_cpu_wait_time_ratio_to_throw. The probability is determined by linear interpolation between the min and max ratios. For example, if min_os_cpu_wait_time_ratio_to_throw = 2, max_os_cpu_wait_time_ratio_to_throw = 6, and cpu_overload = 4, then the query is rejected with a probability of (4 - 2) / (6 - 2) = 0.5.
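The interpolation can be illustrated with a minimal Python sketch. The function names rejection_probability and should_reject are hypothetical and do not come from ClickHouse. Treating a ratio at or above the max as a probability of 1 is an assumption consistent with linear interpolation; it is not stated explicitly above.

```python
import random


def rejection_probability(overload: float, min_ratio: float, max_ratio: float) -> float:
    """Illustrative sketch of the rejection probability (hypothetical helper)."""
    # Rejection applies only when min is strictly less than max.
    if min_ratio >= max_ratio:
        return 0.0
    # Below the min ratio, queries are never rejected.
    if overload < min_ratio:
        return 0.0
    # Assumption: at or above the max ratio, the probability is capped at 1.
    if overload >= max_ratio:
        return 1.0
    # Linear interpolation between the min and max ratios.
    return (overload - min_ratio) / (max_ratio - min_ratio)


def should_reject(overload: float, min_ratio: float, max_ratio: float, rng=random.random) -> bool:
    """Draw a uniform number in [0, 1) and reject if it falls below the probability."""
    return rng() < rejection_probability(overload, min_ratio, max_ratio)


if __name__ == "__main__":
    # The example from the text: min = 2, max = 6, cpu_overload = 4.
    print(rejection_probability(4, 2, 6))  # 0.5
```

With this shape, the share of rejected queries grows gradually as the overload ratio rises from the min toward the max, rather than jumping from none to all at a single threshold.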

Dropping connections

Dropping connections is controlled by the server-level settings min_os_cpu_wait_time_ratio_to_drop_connection and max_os_cpu_wait_time_ratio_to_drop_connection. These settings can be changed without a server restart. They work in the same way as the settings for rejecting queries. The only difference is that when the server is overloaded, the connection attempt itself is rejected on the server side.