Native Protocol

The native protocol is the binary, connection-oriented protocol that ClickHouse clients and servers speak over TCP. It carries SQL queries, result data, INSERT payloads, execution telemetry, and error signals. It is the protocol behind the command-line client, the C++ driver, and most third-party native drivers.

This page covers the protocol itself: packet framing, the connection state machine, version negotiation, and the body of every non-Block message. The bytes inside Data-family packets (the Block, its columns, and the per-type encodings) are a separate concern, documented in the Native Format specification.

Companion specification

This page is one half of a pair and is published together with the companion Native Format specification. The two specs split the work cleanly: this page owns the packet and transport layer; the Native Format spec owns the bytes inside Data-family packets.

A few properties hold throughout. The protocol is binary and positional: there are no field tags except inside BlockInfo, so a single misplaced byte desynchronizes everything that follows. It is stateful, and each TCP connection processes one query at a time — there is no multiplexing. Fixed-width integers are little-endian.

Overview

| Property | Value |
| --- | --- |
| Transport | TCP, optionally wrapped in TLS |
| Byte order | Little-endian for fixed-width integers |
| Encoding | Binary and positional (no field tags except in BlockInfo) |
| Connection model | Stateful, one query at a time, no multiplexing |
| Versioning | Negotiated at handshake; individual features gated by version |
| Data format | The Native Format for all tabular data |

Every message on the wire starts with a VarUInt packet type code, followed by a body whose shape depends on that code and on the negotiated protocol version.

A connection runs through three phases: a one-time handshake, then any number of Ping or Query exchanges, then close.

The native TCP protocol always carries tabular data in the Native format, regardless of any FORMAT clause in the SQL. Re-formatting into RowBinary, CSV, JSON, and so on is the client's job, done after it decodes the Native blocks. (The HTTP interface is a different code path that does honour the FORMAT clause; HTTP is out of scope here.)

Security

Transport security (TLS)

TLS lives at the transport layer, below the protocol. When it is enabled the entire TCP stream is encrypted, and the protocol messages are byte-for-byte identical whether TLS is in use or not.

Authentication

Authentication happens during the handshake, in the ClientHello message. The user and password fields travel as plaintext strings, so transport-level encryption (TLS) is what protects credentials in transit.

SSH challenge-response authentication is available from protocol version 54466 onward — see SSH challenge-response authentication.

Inter-server secret

For distributed query execution, servers authenticate to one another by proving knowledge of a shared secret — without putting the secret on the wire. Each Query carries a 32-byte SHA-256 auth_hash in Query field 4, computed over a salt, nonce, the configured secret, and the query, which the receiving server recomputes and compares. This is gated by the INTERSERVER_SECRET feature (v54441). External clients always send an empty string here. See Inter-server authentication.

Versioning and feature gates

Version negotiation

Both client and server declare their maximum supported protocol version during the handshake. The negotiated version is the smaller of the two:

negotiated_version = min(client_version, server_version)

Every message after that uses the negotiated version to decide which fields are present on the wire.

Feature gates

A feature is identified by the protocol version that introduced it, and is active when the negotiated version is greater than or equal to that number.

Note

When a feature is active, its fields must be present on the wire. The protocol is strictly positional, so omitting a feature-gated field corrupts the byte stream for every field that follows.
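The negotiation and gating rules above can be sketched in a few lines. This is a minimal illustration, not the server's implementation: the version numbers are copied from the feature table on this page, while the `FEATURES` dictionary and both function names are hypothetical.

```python
# Version numbers copied from the feature table; the dict itself is illustrative.
FEATURES = {
    "TIMEZONE": 54058,
    "ADDENDUM": 54458,
    "CHUNKED_PROTOCOL": 54470,
}


def negotiate_version(client_version: int, server_version: int) -> int:
    """The negotiated version is the smaller of the two declared versions."""
    return min(client_version, server_version)


def feature_active(feature: str, negotiated_version: int) -> bool:
    """A feature is active when negotiated_version >= its introducing version."""
    return negotiated_version >= FEATURES[feature]


negotiated = negotiate_version(54485, 54460)
print(negotiated)                                      # 54460
print(feature_active("ADDENDUM", negotiated))          # True
print(feature_active("CHUNKED_PROTOCOL", negotiated))  # False
```

Every encoder and decoder should consult the negotiated version, never the peer's or its own declared version, when deciding whether a gated field is on the wire.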

Feature table

| Feature | Version | Affects | Wire impact |
| --- | --- | --- | --- |
| BLOCK_INFO | all | Block | Adds the BlockInfo prefix (is_overflows, bucket_number) to every Block. |
| CLIENT_INFO | 54032 | Query | Adds the ClientInfo block to the Query body. |
| TIMEZONE | 54058 | ServerHello | Adds the timezone field to ServerHello. |
| QUOTA_KEY_IN_CLIENT_INFO | 54060 | ClientInfo | Adds the quota_key field to ClientInfo. |
| DISPLAY_NAME | 54372 | ServerHello | Adds the display_name field to ServerHello. |
| VERSION_PATCH | 54401 | ServerHello, ClientInfo | Adds the version_patch field to both. |
| SERVER_LOGS | 54406 | Log | Server emits Log packets when send_logs_level is set. |
| COLUMN_DEFAULTS_METADATA | 54410 | TableColumns | Server may send the TableColumns packet (type 11) with column-default metadata before the INSERT/input schema block. Sent only when negotiated version ≥ 54410 and input_format_defaults_for_omitted_fields is enabled. Below this version the packet is never sent; clients must not wait for it. |
| WRITE_CLIENT_INFO | 54420 | Progress | Adds wrote_rows and wrote_bytes to Progress. (Despite the name, this does not gate the ClientInfo block; that is CLIENT_INFO (v54032).) |
| SETTINGS_SERIALIZED_AS_STRINGS | 54429 | Query (settings encoding) | Changes how the always-present settings list is encoded; does not gate whether settings are sent. v54429+ writes each setting as (name, flags, value-as-string); older peers write (name, type-specific-binary-value) with no flags. See Setting. |
| INTERSERVER_SECRET | 54441 | Query | Adds the inter-server auth_hash field to Query: a salted SHA-256 over the cluster secret, not the raw secret. External clients send an empty string. See Inter-server authentication. |
| OPEN_TELEMETRY | 54442 | ClientInfo | Adds the OpenTelemetry trace context to ClientInfo. |
| DISTRIBUTED_DEPTH | 54448 | ClientInfo | Adds the distributed_depth field to ClientInfo. |
| INITIAL_QUERY_START_TIME | 54449 | ClientInfo | Adds the initial_time field (Int64, fixed-width). |
| PROFILE_EVENTS | 54451 | ProfileEvents | Server emits ProfileEvents packets during query execution. |
| PARALLEL_REPLICAS | 54453 | ClientInfo | Adds parallel-replica coordination fields to ClientInfo. |
| CUSTOM_SERIALIZATION | 54454 | Block (Column) | Adds the has_custom_serialization byte after each column's type string. |
| ADDENDUM | 54458 | Handshake | Client sends an addendum (quota_key) after the handshake exchange. |
| PARAMETERS | 54459 | Query | Adds the parameters list to the Query body. |
| SERVER_QUERY_TIME_IN_PROGRESS | 54460 | Progress | Adds the elapsed_ns field to Progress. |
| PASSWORD_COMPLEXITY_RULES | 54461 | ServerHello | Adds a list of password-policy regex patterns and human-readable messages to ServerHello. |
| INTERSERVER_SECRET_V2 | 54462 | ServerHello | Adds an 8-byte UInt64 nonce to ServerHello. Used by inter-server query signing; external clients decode and ignore. |
| TOTAL_BYTES_IN_PROGRESS | 54463 | Progress | Adds the total_bytes_to_read (VarUInt) field to Progress, between total_rows and wrote_rows. |
| TIMEZONE_UPDATES | 54464 | TimezoneUpdate | Adds the TimezoneUpdate server packet (type 17). Body: single String carrying the session timezone. Sent only by the input table function initializer, right after the input-schema block, so the client parses the rows it sends with the server's session_timezone. See TimezoneUpdate. |
| SPARSE_SERIALIZATION | 54465 | Block (Column) | Server may set has_custom_serialization = 1 and emit a sparse-encoded column. Wire format: 1-byte kind (0x01 = SPARSE), then VarUInt offset stream terminated by EOG, then the non-default values densely encoded in the inner type. See kind_stack and sparse encoding. |
| SSH_AUTHENTICATION | 54466 | Auth flow | Adds SSH challenge-response authentication. Opt-in: client sends a user of the form " SSH KEY AUTHENTICATION " + <real_user> with empty password to trigger it. See SSH challenge-response authentication. |
| TABLE_READ_ONLY_CHECK | 54467 | TablesStatusResponse | Adds an is_readonly flag to each table's row in TablesStatusResponse. External clients that don't issue TablesStatusRequest see no wire change. |
| SYSTEM_KEYWORDS_TABLE | 54468 | system tables | Server populates system.keywords so the canonical clickhouse-client can autocomplete keywords. No native-protocol wire change. |
| ROWS_BEFORE_AGGREGATION | 54469 | ProfileInfo | Adds applied_aggregation (Bool) and rows_before_aggregation (VarUInt) to ProfileInfo, in that order at the tail. |
| CHUNKED_PROTOCOL | 54470 | Connection framing | Per-packet chunk framing wraps every packet body. Negotiated in Addendum. ServerHello carries the server's preference for each direction; Addendum carries the client's final choice. See chunked framing. |
| VERSIONED_PARALLEL_REPLICAS_PROTOCOL | 54471 | ServerHello, Addendum | Both sides exchange a VarUInt parallel-replicas coordination protocol version. ServerHello's field is positioned immediately after protocol_version (before timezone). Addendum's field is appended after the chunked-protocol strings. Current value: 7 (DBMS_PARALLEL_REPLICAS_PROTOCOL_VERSION). |
| INTERSERVER_EXTERNALLY_GRANTED_ROLES | 54472 | Query | Adds a String external_roles field to the Query body, between the settings terminator and the interserver-secret hash. External clients send an empty role list (a single byte 0x00, i.e. VarUInt 0 inside a String envelope). |
| V2_DYNAMIC_AND_JSON_SERIALIZATION | 54473 | Column body | Server may emit V2 serialization for Dynamic and JSON column types; gates which state_prefix version they use. See versioned types. |
| SERVER_SETTINGS | 54474 | ServerHello | Server broadcasts its non-default settings as a list at the tail of ServerHello, after nonce. Format: (key, flags, value) triples terminated by an empty key, same as the Query packet's settings list. |
| QUERY_AND_LINE_NUMBERS | 54475 | ClientInfo | Adds script_query_number (VarUInt) and script_line_number (VarUInt) at the tail of ClientInfo. Used by clickhouse-client for multi-statement script error attribution; external clients send 0, 0. |
| JWT_IN_INTERSERVER | 54476 | ClientInfo | Adds a JWT-presence UInt8 + optional String jwt at the tail of ClientInfo. External clients (no JWT) send byte 0x00. (Spelled DBMS_MIN_REVISON_WITH_JWT_IN_INTERSERVER in C++; note the typo in the constant name.) |
| QUERY_PLAN_SERIALIZATION | 54477 | ServerHello, QueryPlan packet | ServerHello appends VarUInt query_plan_serialization_version after server settings. Also introduces ClientPacket::QueryPlan (code 13) for inter-server delivery of pre-built query plans; external clients never send. |
| PARALLEL_BLOCK_MARSHALLING | 54478 | Block (Column) | Server may wrap columns in ColumnBLOB (compressed inline) for parallel processing. Gated on the query having compression enabled AND rows > 1; otherwise the regular column wire format applies. Clients that never enable compression on outgoing Query packets see no wire change. |
| VERSIONED_CLUSTER_FUNCTION_PROTOCOL | 54479 | ServerHello | Adds VarUInt cluster_function_protocol_version at the tail of ServerHello. Used for *Cluster table functions (s3Cluster, etc.). External clients decode and ignore. |
| OUT_OF_ORDER_BUCKETS_IN_AGGREGATION | 54480 | BlockInfo | Adds field 3 (out_of_order_buckets: Vec<Int32>) to BlockInfo's field-tagged stream. Decoded as [VarUInt count][Int32]*count. External clients don't emit this themselves; the decoder reads any non-empty list the server sends. |
| COMPRESSED_LOGS_PROFILE_EVENTS_COLUMNS | 54481 | Log, ProfileEvents, TableColumns | Server may wrap Log, ProfileEvents, and TableColumns packet bodies in the compression frame. At this version all three bodies travel through the same optionally-compressed output path, which becomes a real compression frame only when the query has compression = true. Clients that never enable compression on outgoing Query packets see no wire change. |
| REPLICATED_SERIALIZATION | 54482 | Block (Column) | Server may emit columns with kind_stack 0x04 = REPLICATED, a dictionary-style compact form for repeated values; see kind_stack and sparse encoding. Below this version the writer expanded such columns before sending. Decoded via index lookup (elements[indexes[i]] per row); leaf types plus Nullable/Array/Tuple/Map/Nested/LowCardinality inners supported. |
| NULLABLE_SPARSE_SERIALIZATION | 54483 | Block (Column) | Composes sparse serialization with Nullable(T). Below this version the writer expanded sparse for Nullable columns before sending; at v54483+ the wire data is sparse-over-Nullable. See kind_stack and sparse encoding. |
| PROGRESS_IN_ASYNC_INSERT | 54484 | Progress (INSERT) | On an asynchronous INSERT (async_insert = 1), once the insert is flushed the server sends an extra Progress packet, then the insert's ProfileEvents, before EndOfStream. Gated on the negotiated version ≥ 54484; below it the server omits this trailing Progress. The Progress wire format is unchanged; only the emission is new. In practice the increment carries the elapsed time; the written-row counters are reported via the accompanying ProfileEvents. A client that already drains interleaved Progress needs no format change, only to tolerate one more packet. |
| CLIENT_AGENT_IN_CLIENT_INFO | 54485 | ClientInfo | Adds a trailing client_agent String to ClientInfo. The canonical client auto-detects an agent identifier from its environment (for example claude-code, cursor, gemini-cli, or the value of the AGENT variable); an external client with nothing detected sends an empty string. Required once negotiated version ≥ 54485; omitting it desynchronizes the rest of the Query packet. |

Packet envelope

Every message on the wire shares the same outer structure, in both directions:

[VarUInt: packet_type_code]    always encoded as VarUInt
[message body]                 format depends on packet_type_code

The full packet-type tables are in the packet type reference.

The packet type is a VarUInt, not a fixed-width byte. For values below 128 a VarUInt produces the same single byte, but implementations must use VarUInt encoding so they stay compatible should future packet types reach 128 or above.
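As an illustration of why the distinction matters, here is a minimal VarUInt codec. It assumes the usual LEB128-style layout (7 payload bits per byte, low-order group first, high bit set on every byte except the last); the authoritative definition is the VarUInt section referenced from this page, and the function names are hypothetical.

```python
from typing import Tuple


def encode_varuint(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low-order group first."""
    if value < 0:
        raise ValueError("VarUInt cannot be negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)         # final byte, high bit clear
            return bytes(out)


def decode_varuint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one VarUInt starting at pos; return (value, next_position)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated VarUInt")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


print(encode_varuint(4).hex())    # 04   (same as a plain byte)
print(encode_varuint(128).hex())  # 8001 (two bytes once the value reaches 128)
```

A reader that treats the packet type as a fixed byte would consume only `0x80` of the second example and misread everything after it.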

The message reference documents only the body of each packet — the bytes after the packet type code. Field numbering starts at 1 with the first body field.

Chunked framing (v54470+)

When the CHUNKED_PROTOCOL feature is negotiated (see the handshake), every packet on the wire is wrapped in chunked framing. The wrapping is per-direction: client→server and server→client are negotiated separately and may end up in different modes (chunked versus unframed).

Wire layout per packet:

<chunk>...   one or more chunks; their payloads concatenated form the whole packet
[u32 LE = 0] zero-size terminator marking end of packet

Wire layout per chunk:

[u32 LE: chunk_size]   chunk_size in [1, UINT32_MAX]
[chunk_size bytes]     packet bytes (see note below)

The packet type VarUInt is inside the chunked stream: it is the first byte of the packet payload (the first byte of the first chunk), not a separate byte sent ahead of the framing. Each packet's chunk payload is the full [VarUInt packet_type_code][message body] from the packet envelope. A client that leaves the packet type outside the chunked stream makes the peer read that type byte as the first byte of the u32 chunk size, desynchronizing the connection.

A single packet may be split across several chunks if the writer's buffer fills mid-packet; a split can fall anywhere, including inside the packet type's VarUInt. The reader concatenates chunk payloads and treats the trailing 4-byte zero as a transparent packet boundary — it consumes it but does not surface it to whatever is reading packet bodies.

No-body packets are still wrapped: a single-byte packet such as Ping or Pong becomes [u32 size = 1][0x04][u32 0] once chunking is negotiated. Any "single byte on the wire" description elsewhere on this page is the pre-chunking form.
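The framing described above can be sketched as a pair of helpers. This is an illustrative sketch only: `frame_packet`, `unframe_packet`, and the `max_chunk` parameter are hypothetical names, and a real writer splits at its own buffer boundaries rather than at a fixed size.

```python
import struct
from typing import Tuple


def frame_packet(packet: bytes, max_chunk: int = 65536) -> bytes:
    """Wrap one whole packet (type code + body) in chunks plus the zero terminator."""
    if not packet:
        raise ValueError("a packet contains at least its type code")
    out = bytearray()
    for start in range(0, len(packet), max_chunk):
        chunk = packet[start:start + max_chunk]
        out += struct.pack("<I", len(chunk))  # u32 LE chunk_size, never 0 here
        out += chunk
    out += struct.pack("<I", 0)               # zero-size terminator ends the packet
    return bytes(out)


def unframe_packet(stream: bytes, pos: int = 0) -> Tuple[bytes, int]:
    """Concatenate chunk payloads up to the terminator; return (packet, next_pos)."""
    payload = bytearray()
    while True:
        (size,) = struct.unpack_from("<I", stream, pos)
        pos += 4
        if size == 0:
            return bytes(payload), pos
        if pos + size > len(stream):
            raise ValueError("truncated chunk")
        payload += stream[pos:pos + size]
        pos += size


print(frame_packet(b"\x04").hex())  # 010000000400000000
```

The printed bytes are the `[u32 size = 1][0x04][u32 0]` form of Ping or Pong given above. Note that the packet type byte travels inside the first chunk, not ahead of it.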

Negotiation. ServerHello and Addendum each carry two String fields, one per direction, with values drawn from {"chunked", "notchunked", "chunked_optional", "notchunked_optional"}:

  • chunked / notchunked are strict: that side requires exactly that mode.
  • The _optional variants are flexible: they accept whichever mode the other side picks.

The agreed value for each direction is computed pairwise:

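| Server pref | Client pref | Agreed |
| --- | --- | --- |
| *_optional | anything | follow the client: chunked if the client's value starts with "chunked", otherwise notchunked |
| anything | *_optional | follow the server |
| chunked (strict) | chunked (strict) | chunked |
| notchunked (strict) | notchunked (strict) | notchunked |
| strict mismatch | strict mismatch | protocol error; the connection MUST be torn down |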

On the client side, the client's SEND preference negotiates against the server's RECV preference, and vice versa.
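The pairwise rule can be expressed as a small function. This is a sketch under the assumption that the rows of the table above are evaluated top to bottom (so two `_optional` values resolve by following the client); `negotiate_chunking`, `negotiate_directions`, and `ChunkingMismatch` are hypothetical names.

```python
from typing import Tuple

VALID = {"chunked", "notchunked", "chunked_optional", "notchunked_optional"}


class ChunkingMismatch(Exception):
    """Both sides are strict and disagree; the connection must be torn down."""


def negotiate_chunking(server_pref: str, client_pref: str) -> str:
    """Resolve one direction to "chunked" or "notchunked"."""
    for pref in (server_pref, client_pref):
        if pref not in VALID:
            raise ValueError(f"unknown chunking preference: {pref!r}")
    if server_pref.endswith("_optional"):
        chunked = client_pref.startswith("chunked")   # follow the client
    elif client_pref.endswith("_optional"):
        chunked = server_pref.startswith("chunked")   # follow the server
    elif server_pref == client_pref:
        chunked = server_pref == "chunked"            # strict agreement
    else:
        raise ChunkingMismatch(f"{server_pref} vs {client_pref}")
    return "chunked" if chunked else "notchunked"


def negotiate_directions(server_send: str, server_recv: str,
                         client_send: str, client_recv: str) -> Tuple[str, str]:
    """Client SEND pairs with server RECV, and client RECV with server SEND."""
    proto_send = negotiate_chunking(server_recv, client_send)
    proto_recv = negotiate_chunking(server_send, client_recv)
    return proto_send, proto_recv


print(negotiate_directions("notchunked_optional", "chunked_optional",
                           "chunked", "notchunked_optional"))
# ('chunked', 'notchunked')
```

The two returned strings are the values the client would place in the Addendum fields for its outbound and inbound directions.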

Timing. The negotiation strings travel on the unframed wire: ClientHello → ServerHello (server prefs) → Addendum (client's negotiated values). The framing flip applies to every byte sent after the Addendum is flushed. The Addendum itself, the ClientHello, and the ServerHello are always unframed.

Connection lifecycle

At any moment a connection is in exactly one of four states: HANDSHAKE, READY, READING_RESPONSE, or terminated. Because the protocol does not multiplex, a client that sends a new request before draining the previous response interleaves bytes on the wire and corrupts the stream.

States

The happy path is HANDSHAKE → READY → READING_RESPONSE → READY. Ping/Pong is a self-loop on READY, and every failure edge leads to the single Terminated sink.

| State | Description |
| --- | --- |
| HANDSHAKE | Initial state after the TCP connection opens. Only handshake messages are valid. Transitions to READY on success or terminates on failure. |
| READY | Idle. The client may send Ping, Query, or close. The connection may stay in READY indefinitely (subject to idle_connection_timeout, see connection limits). |
| READING_RESPONSE | Entered when the client sends a Query. The client must fully drain the server's response stream before returning to READY. The only allowed client→server packet here is Cancel (not specified on this page). |
| Terminated | No longer usable. The client must open a new TCP connection and restart the handshake. |
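One way to keep a client honest about these states is an explicit transition table. The sketch below is illustrative: the event names are hypothetical labels for the edges described on this page, and any event not listed for the current state is treated as a failure edge into Terminated.

```python
from enum import Enum, auto


class State(Enum):
    HANDSHAKE = auto()
    READY = auto()
    READING_RESPONSE = auto()
    TERMINATED = auto()


# (current state, event) -> next state. Event names are illustrative only.
TRANSITIONS = {
    (State.HANDSHAKE, "handshake_ok"): State.READY,
    (State.READY, "ping_pong"): State.READY,
    (State.READY, "send_query"): State.READING_RESPONSE,
    (State.READING_RESPONSE, "end_of_stream"): State.READY,
    (State.READING_RESPONSE, "exception"): State.READY,
}


def step(state: State, event: str) -> State:
    """Advance the connection; anything unexpected terminates it."""
    return TRANSITIONS.get((state, event), State.TERMINATED)


state = State.HANDSHAKE
for event in ("handshake_ok", "ping_pong", "send_query", "end_of_stream"):
    state = step(state, event)
print(state)  # State.READY
```

Because Terminated has no outgoing entries, every event leaves it in place, which matches the rule that a terminated connection must be replaced rather than reused.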

Handshake phase

Authenticates and negotiates the protocol version. Happens exactly once per connection, before anything else.

The TCP connection has just opened and no messages have been exchanged. The flow:

  1. The client sends ClientHello with its maximum supported protocol version.

  2. The client reads the response and dispatches by packet type:

    | Packet type | Action |
    | --- | --- |
    | Hello (0) | Decode ServerHello. Compute negotiated_version = min(client_ver, server_ver). Proceed to step 3. |
    | Exception (2) | Decode Exception. Return as error and terminate the connection. |
    | anything else | Protocol violation. Terminate the connection. |

  3. If negotiated_version ≥ 54458 (the ADDENDUM feature), the client sends an Addendum. This decision is based on the negotiated version, not the client's declared version.

On success the connection moves to READY; on any error it terminates.

Ping phase

An application-level liveness check, independent of TCP keepalive. A successful Ping/Pong round-trip confirms the TCP connection is alive in both directions and the server is responsive. Ping is stateless and uncorrelated with any query, so multiple sequential Pings are independent.

Starting from READY, the flow is:

  1. The client sends Ping.

  2. The client reads the response:

    | Packet type | Action |
    | --- | --- |
    | Pong (4) | Liveness confirmed. Return to READY. |
    | Exception (2) | Decode Exception and return as error. |
    | anything else | Protocol violation. |

Query phase

The client submits a SQL statement; the server streams back result blocks and execution telemetry. The response is a sequence of packets terminated by exactly one EndOfStream or Exception.

On error at any point the server sends an Exception instead of EndOfStream, which terminates the query.

Starting from READY, the flow is:

  1. The client sends Query with a unique query_id (typically a UUID).

  2. The client sends any external tables, then the empty Data marker. The empty Data packet has table_name = "", num_columns = 0, num_rows = 0. The server does not begin executing the query until it receives this marker.

  3. The client moves to READING_RESPONSE and flushes its write buffer.

  4. The client reads response packets in a loop, dispatching by type:

    | Packet type | Action |
    | --- | --- |
    | Data (1) | Decode the block. The first Data is the schema header; later ones are result blocks (accumulate); an empty block is a boundary marker. num_rows == 0 is not end-of-query. |
    | Progress (3) | Execution metrics. Each packet is an increment since the previous one; accumulate locally. |
    | EndOfStream (5) | Query complete. Exit the loop and return to READY. |
    | ProfileInfo (6) | Post-execution profiling data. |
    | Totals (7) | Aggregation totals block (same wire format as Data). |
    | Extremes (8) | Min/max values block (same wire format as Data). |
    | Log (10) | Server log line. |
    | TableColumns (11) | Column-defaults metadata. |
    | ProfileEvents (14) | Performance counters. |
    | Exception (2) | Decode and return as error. Exit the loop and return to READY. |
    | anything else | Unexpected during the query phase. Terminate the connection. |

On EndOfStream or a handled Exception the connection returns to READY. A protocol violation or I/O error terminates it.

Note

The num_rows == 0 case trips up new implementations. A zero-row block is a boundary marker or schema header, not an end-of-stream signal. Only EndOfStream or Exception ends the response.
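The read loop in step 4 can be sketched as a dispatcher over packet type codes. To stay self-contained, this sketch assumes the packets have already been split into `(type_code, body)` pairs and does not decode any body; `drain_response` and `ProtocolViolation` are hypothetical names, and the codes come from the dispatch table above.

```python
from typing import Iterable, List, Tuple

# Server packet codes that are valid during the query phase.
SERVER_PACKETS = {
    1: "Data", 2: "Exception", 3: "Progress", 5: "EndOfStream",
    6: "ProfileInfo", 7: "Totals", 8: "Extremes", 10: "Log",
    11: "TableColumns", 14: "ProfileEvents",
}


class ProtocolViolation(Exception):
    """The stream cannot be trusted any more; terminate the connection."""


def drain_response(packets: Iterable[Tuple[int, bytes]]) -> Tuple[str, List[str]]:
    """Consume packets until EndOfStream or Exception; return (outcome, names seen)."""
    seen: List[str] = []
    for code, _body in packets:
        name = SERVER_PACKETS.get(code)
        if name is None:
            raise ProtocolViolation(f"unexpected packet type {code}")
        seen.append(name)
        # Only these two end the response; a zero-row Data block never does.
        if name in ("EndOfStream", "Exception"):
            return name, seen
    raise ProtocolViolation("stream ended before EndOfStream or Exception")


outcome, seen = drain_response([(1, b""), (3, b""), (1, b""), (5, b"")])
print(outcome, seen)  # EndOfStream ['Data', 'Progress', 'Data', 'EndOfStream']
```

The loop's exit condition looks only at the packet type, which is the point of the note above: row counts inside Data blocks play no part in deciding when the response is over.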

INSERT phase

The INSERT phase is the Query phase with two extra exchanges. The client submits an INSERT statement; the server replies with a schema block describing the target table; the client streams Data packets with the rows, then the empty Data marker; the server finishes with EndOfStream or Exception.

Starting from READY, the SQL is an INSERT of the form INSERT INTO <table> [(<cols>)] VALUES — with no inline VALUES (...) literal, since the row data flows through Data packets. The flow:

  1. The client sends Query with body set to the INSERT SQL.
  2. The client sends any external tables (rare for INSERT). Unlike the Query phase, it does not send an empty Data marker here. The INSERT Query packet is sent with pending data, so the empty end-of-data block is deferred to step 5; sending it before the schema block would make the server read it as the end of the row stream, finish the INSERT with no rows, and then parse the first real row packet as a stray top-level packet.
  3. The client drains metadata packets (TableColumns, Progress, ProfileInfo, Log, ProfileEvents) until it reads the schema Data packet — a Block with 0 rows but full column structure (names and types). The schema block is the contract: the rows the client sends next must match these column shapes.
  4. The client sends data block(s). For each block it writes VarUInt(ClientPacket::Data = 2), then String("") for the empty external-table name, then the Block. Column types must align with the schema block's columns by position.
  5. The client sends the end-of-input terminator: a Data packet with an empty Block (0 columns, 0 rows).
  6. The client drains the response stream until EndOfStream (success) or Exception (failure).

Asynchronous INSERT (v54484+). When the query carries async_insert = 1, the server queues the rows and flushes them as part of a batch. At negotiated version ≥ 54484 (PROGRESS_IN_ASYNC_INSERT), once the flush completes the server emits an extra Progress packet, immediately followed by the insert's ProfileEvents, then EndOfStream. Below 54484 the server skips that trailing Progress. The packet is an ordinary Progress; because the server resets the query pipeline before folding in the write counts, the increment in practice carries only the elapsed time, and the written-row and byte stats reach the client via the accompanying ProfileEvents. A client that already drains interleaved Progress in step 6 needs only to accept one more packet.

The connection returns to READY on EndOfStream or a handled Exception. Protocol violations and I/O errors terminate it.

Message reference

Fields are listed in wire order. The Type column uses:

  • VarUInt — variable-length unsigned integer (see VarUInt).
  • String — VarUInt-prefixed bytes (see String).
  • UInt8, Int32, and so on — fixed-width little-endian integers.
  • Bool — a single byte, 0x00 or 0x01.

The Role column says who uses each field:

  • client — set by external clients.
  • inter-server — meaningful only for server-to-server communication; external clients write a default value.
  • universal — used by both.

These tables document only the body of each packet, after the packet type code.

ClientHello (packet type 0)

Client → Server. The first message after the TCP connection opens.

| # | Field | Type | Role | Description |
| --- | --- | --- | --- | --- |
| 1 | client_name | String | universal | Client identifier (e.g., "clickhouse-client") |
| 2 | version_major | VarUInt | universal | Client major version |
| 3 | version_minor | VarUInt | universal | Client minor version |
| 4 | protocol_version | VarUInt | universal | Client's max supported protocol version |
| 5 | database | String | universal | Default database name |
| 6 | user | String | universal | Username for authentication |
| 7 | password | String | universal | Password (plaintext) |
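Putting the envelope and the field table together, a ClientHello could be serialized roughly as follows. This is a sketch, not a reference encoder: it assumes the LEB128-style VarUInt layout and UTF-8 text for String payloads, and the helper and function names are hypothetical.

```python
def varuint(value: int) -> bytes:
    """VarUInt: 7 bits per byte, low-order first (assumed LEB128-style layout)."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        out.append(low7 | 0x80 if value else low7)
        if not value:
            return bytes(out)


def string(text: str) -> bytes:
    """String: VarUInt byte length followed by the bytes (UTF-8 assumed)."""
    raw = text.encode("utf-8")
    return varuint(len(raw)) + raw


def encode_client_hello(client_name: str, version_major: int, version_minor: int,
                        protocol_version: int, database: str,
                        user: str, password: str) -> bytes:
    """Packet type 0 followed by the seven body fields in wire order."""
    return b"".join([
        varuint(0),                 # packet type code: Hello
        string(client_name),
        varuint(version_major),
        varuint(version_minor),
        varuint(protocol_version),  # client's maximum supported version
        string(database),
        string(user),
        string(password),           # plaintext; rely on TLS for protection
    ])


hello = encode_client_hello("c", 1, 2, 54485, "default", "default", "")
print(hello.hex())
```

ClientHello is always sent unframed, so these bytes go on the wire as-is even when chunked framing is negotiated later.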

ServerHello (packet type 0)

Server → Client. The reply to ClientHello on successful authentication.

| # | Field | Type | Role | Condition | Description |
| --- | --- | --- | --- | --- | --- |
| 1 | server_name | String | universal | always | Server identifier |
| 2 | version_major | VarUInt | universal | always | Server major version |
| 3 | version_minor | VarUInt | universal | always | Server minor version |
| 4 | protocol_version | VarUInt | universal | always | Server's protocol version |
| 4a | parallel_replicas_protocol_version | VarUInt | universal | VERSIONED_PARALLEL_REPLICAS_PROTOCOL (v54471) | Server's parallel-replicas coordination protocol version. Wire position: immediately after protocol_version, before timezone. Current: 7. |
| 5 | timezone | String | universal | TIMEZONE (v54058) | Server timezone (e.g., "UTC") |
| 6 | display_name | String | universal | DISPLAY_NAME (v54372) | Human-readable server name |
| 7 | version_patch | VarUInt | universal | VERSION_PATCH (v54401) | Server patch version |
| 8 | proto_send_chunked_srv | String | universal | CHUNKED_PROTOCOL (v54470) | Server's preferred outbound chunking. One of "chunked", "notchunked", "chunked_optional", "notchunked_optional". See chunked framing. Sits BEFORE password_complexity_rules on the wire even though its version gate is higher. |
| 9 | proto_recv_chunked_srv | String | universal | CHUNKED_PROTOCOL (v54470) | Server's preferred inbound chunking. Same value set as field 8. |
| 10 | password_complexity_rules | Rule[] | universal | PASSWORD_COMPLEXITY_RULES (v54461) | Server's password policy. VarUInt count followed by count × Rule. See below. |
| 11 | nonce | UInt64 | inter-server | INTERSERVER_SECRET_V2 (v54462) | 8-byte LE random nonce. The server's inter-server query-signing scheme uses it. External clients MUST decode it (to keep the stream aligned) and SHOULD ignore the value. |
| 12 | server_settings | Setting[] | universal | SERVER_SETTINGS (v54474) | Server's non-default settings broadcast. Format: zero or more (String key, VarUInt flags, String value) triples, terminated by an empty key. Same as the Query packet's settings list. |
| 13 | query_plan_serialization_version | VarUInt | universal | QUERY_PLAN_SERIALIZATION (v54477) | Server's supported query-plan serialization version. External clients decode and ignore. |
| 14 | cluster_function_protocol_version | VarUInt | universal | VERSIONED_CLUSTER_FUNCTION_PROTOCOL (v54479) | Server's *Cluster table-function protocol version. External clients decode and ignore. |

Rule — an element of password_complexity_rules:

| # | Field | Type | Description |
| --- | --- | --- | --- |
| 1 | pattern | String | Regular-expression pattern that a compliant password must match. |
| 2 | message | String | Human-readable explanation shown when a password fails this rule. |

The list reflects the server operator's password-policy configuration and is purely advisory — the server does not enforce these rules during the handshake. A client that exposes password change/set functionality may use the rules to flag errors before round-tripping a non-compliant password to the server.

Note

To bound resource use against a hostile or misconfigured server, cap the decoded count at 256 entries and each pattern and message String at 4096 bytes. A count of 0 (no following pairs) is the common case for servers with no password policy configured.
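The ServerHello layout, including its out-of-order version gates and the caps suggested in the note, can be sketched as a decoder. This is a hedged illustration: it assumes the LEB128-style VarUInt layout and UTF-8 strings, gates every field on the negotiated version as described under Feature gates, and uses hypothetical names (`Reader`, `decode_server_hello`).

```python
import struct


class Reader:
    """Minimal cursor over the bytes that follow the Hello packet type code."""

    def __init__(self, data: bytes) -> None:
        self.data, self.pos = data, 0

    def take(self, n: int) -> bytes:
        if self.pos + n > len(self.data):
            raise ValueError("truncated ServerHello")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def varuint(self) -> int:
        value = shift = 0
        while True:
            byte = self.take(1)[0]
            value |= (byte & 0x7F) << shift
            if byte < 0x80:
                return value
            shift += 7
            if shift > 63:
                raise ValueError("VarUInt too long")

    def string(self, limit: int = 1 << 20) -> str:
        size = self.varuint()
        if size > limit:
            raise ValueError("String exceeds decoder limit")
        return self.take(size).decode("utf-8", errors="replace")


def decode_server_hello(body: bytes, client_version: int) -> dict:
    r = Reader(body)
    h = {"server_name": r.string(), "version_major": r.varuint(),
         "version_minor": r.varuint(), "protocol_version": r.varuint()}
    v = h["negotiated_version"] = min(client_version, h["protocol_version"])
    if v >= 54471:
        h["parallel_replicas_protocol_version"] = r.varuint()
    if v >= 54058:
        h["timezone"] = r.string()
    if v >= 54372:
        h["display_name"] = r.string()
    if v >= 54401:
        h["version_patch"] = r.varuint()
    if v >= 54470:  # sits before the rules despite the higher gate
        h["proto_send_chunked_srv"] = r.string()
        h["proto_recv_chunked_srv"] = r.string()
    if v >= 54461:
        count = r.varuint()
        if count > 256:
            raise ValueError("too many password complexity rules")
        h["password_complexity_rules"] = [
            (r.string(4096), r.string(4096)) for _ in range(count)]
    if v >= 54462:
        (h["nonce"],) = struct.unpack("<Q", r.take(8))  # decode, then ignore
    if v >= 54474:
        settings = []
        while True:
            key = r.string()
            if key == "":  # empty key terminates the list
                break
            settings.append((key, r.varuint(), r.string()))
        h["server_settings"] = settings
    if v >= 54477:
        h["query_plan_serialization_version"] = r.varuint()
    if v >= 54479:
        h["cluster_function_protocol_version"] = r.varuint()
    return h
```

The 1 MiB default string limit is an arbitrary safety bound chosen for this sketch; only the 256-entry and 4096-byte caps come from the note above.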

Addendum (no packet type)

Client → Server, gated by ADDENDUM (v54458). Sent immediately after the handshake exchange completes. It is not a distinct packet type — the fields go on the wire raw, with no packet type byte prefix.

| # | Field | Type | Role | Condition | Description |
| --- | --- | --- | --- | --- | --- |
| 1 | quota_key | String | universal | always | Resource quota key for server-side keyed quotas. Clients that do not use a keyed quota send an empty string. |
| 2 | proto_send_chunked | String | universal | CHUNKED_PROTOCOL (v54470) | Client's negotiated outbound chunking: "chunked" or "notchunked". Computed against proto_recv_chunked_srv from ServerHello. |
| 3 | proto_recv_chunked | String | universal | CHUNKED_PROTOCOL (v54470) | Client's negotiated inbound chunking. Computed against proto_send_chunked_srv. |
| 4 | parallel_replicas_protocol_version | VarUInt | universal | VERSIONED_PARALLEL_REPLICAS_PROTOCOL (v54471) | Client's supported parallel-replicas coordination protocol version. External clients not participating in distributed queries SHOULD still send a valid version (current 7) so the server's compatibility check succeeds. |

The chunked-framing flip applies after this Addendum is flushed — the Addendum itself is unframed.
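A minimal sketch of the Addendum bytes, gated on the negotiated version, might look like this. It assumes the LEB128-style VarUInt layout and UTF-8 strings; `encode_addendum` and its parameter names are hypothetical, and the default of 7 is the current parallel-replicas version quoted above.

```python
def varuint(value: int) -> bytes:
    """VarUInt: 7 bits per byte, low-order first (assumed LEB128-style layout)."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        out.append(low7 | 0x80 if value else low7)
        if not value:
            return bytes(out)


def string(text: str) -> bytes:
    """String: VarUInt byte length followed by the bytes (UTF-8 assumed)."""
    raw = text.encode("utf-8")
    return varuint(len(raw)) + raw


def encode_addendum(negotiated_version: int, quota_key: str = "",
                    proto_send: str = "notchunked",
                    proto_recv: str = "notchunked",
                    parallel_replicas_version: int = 7) -> bytes:
    """Raw Addendum fields: no packet type code and no chunk framing."""
    if negotiated_version < 54458:          # ADDENDUM not negotiated: send nothing
        return b""
    out = string(quota_key)
    if negotiated_version >= 54470:         # CHUNKED_PROTOCOL
        out += string(proto_send) + string(proto_recv)
    if negotiated_version >= 54471:         # VERSIONED_PARALLEL_REPLICAS_PROTOCOL
        out += varuint(parallel_replicas_version)
    return out


print(encode_addendum(54460).hex())  # 00
```

The `proto_send` and `proto_recv` arguments should carry the already negotiated values ("chunked" or "notchunked"), not the `_optional` preferences.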

Ping (packet type 4)

Client → Server. No body — the packet is a single byte 0x04 before chunked framing; when chunking is negotiated the byte becomes the one-byte payload of a chunk (see chunked framing).

Pong (packet type 4)

Server → Client. No body — the packet is a single byte 0x04 before chunked framing; when chunking is negotiated the byte becomes the one-byte payload of a chunk (see chunked framing).

Exception (packet type 2)

Server → Client. Sent when the server hits an error during any phase.

| # | Field | Type | Role | Description |
| --- | --- | --- | --- | --- |
| 1 | code | Int32 | universal | Error code |
| 2 | name | String | universal | Exception class (e.g., "DB::Exception") |
| 3 | message | String | universal | Human-readable error message |
| 4 | stack_trace | String | universal | Server-side stack trace |
| 5 | has_nested (obsolete) | Bool | universal | Obsolete compatibility byte. Always written as false by the server |
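Decoding the Exception body is a direct walk over the five fields. The sketch below assumes the LEB128-style VarUInt layout and decodes text as UTF-8 with replacement for invalid bytes; the function names are hypothetical.

```python
import struct
from typing import Tuple


def read_varuint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Assumed LEB128-style VarUInt; returns (value, next_position)."""
    value = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated VarUInt")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def read_string(buf: bytes, pos: int) -> Tuple[str, int]:
    """String: VarUInt length, then that many bytes."""
    size, pos = read_varuint(buf, pos)
    end = pos + size
    if end > len(buf):
        raise ValueError("truncated String")
    return buf[pos:end].decode("utf-8", errors="replace"), end


def decode_exception(body: bytes) -> dict:
    """Decode the bytes that follow packet type code 2."""
    (code,) = struct.unpack_from("<i", body, 0)  # Int32, little-endian
    pos = 4
    name, pos = read_string(body, pos)
    message, pos = read_string(body, pos)
    stack_trace, pos = read_string(body, pos)
    if pos >= len(body):
        raise ValueError("missing has_nested byte")
    has_nested = body[pos] != 0                  # obsolete, still on the wire
    return {"code": code, "name": name, "message": message,
            "stack_trace": stack_trace, "has_nested": has_nested,
            "size": pos + 1}
```

The `size` entry reports how many body bytes were consumed, which a caller reading from a longer buffer can use to find the next packet. The obsolete `has_nested` byte must still be consumed, or the following packet type code would be misread.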

Query (packet type 1)

Client → Server.

| # | Field | Type | Role | Condition | Description |
| --- | --- | --- | --- | --- | --- |
| 1 | query_id | String | universal | always | Unique query identifier (UUID) |
| 2 | client_info | ClientInfo | universal | CLIENT_INFO (v54032) | See ClientInfo |
| 3 | settings | Setting[] | universal | always | See Setting. Always present (terminated by an empty key); only the per-setting encoding is version-gated, as described in the encoding note in Setting. A client must not omit this field for negotiated versions below 54429. |
| 3a | external_roles | String | universal | INTERSERVER_EXTERNALLY_GRANTED_ROLES (v54472) | Serialized list of externally-granted role names. Empty list = byte 0x00 (VarUInt 0) wrapped in a String envelope ([VarUInt 1][0x00] on the wire). External clients always send empty. |
| 4 | auth_hash | String | inter-server | INTERSERVER_SECRET (v54441) | Inter-server authentication hash, not the raw cluster secret. See Inter-server authentication below. External clients (and any InitialQuery) send an empty string. |
| 5 | stage | VarUInt | universal | always | Query processing stage. 0 = FetchColumns, 1 = WithMergeableState, 2 = Complete, 3 = WithMergeableStateAfterAggregation, 4 = WithMergeableStateAfterAggregationAndLimit, 7 = QueryPlan. Values 3/4 appear in distributed queries; 7 accompanies a serialized query plan. External clients normally send 2. |
| 6 | compression | VarUInt | universal | always | 0 = disabled, 1 = enabled |
| 7 | query_body | String | universal | always | SQL text |
| 8 | parameters | Parameter[] | client | PARAMETERS (v54459) | See Parameter. Terminated by empty key. |

ClientInfo (embedded in Query)

Client → Server, embedded in the Query body (field 2). Gated by CLIENT_INFO (v54032). (Some fields inside ClientInfo are gated by later versions, as noted per-field below.)

| # | Field | Type | Role | Condition | Description |
| --- | --- | --- | --- | --- | --- |
| 1 | query_kind | UInt8 | universal | always | 0 = NoQuery, 1 = InitialQuery, 2 = SecondaryQuery. External clients send 1. |
| 2 | initial_user | String | universal | always | User who initiated the query |
| 3 | initial_query_id | String | universal | always | Original query ID |
| 4 | initial_address | String | universal | always | Originating client socket address in host:port format |
| 5 | initial_time | Int64 | client | INITIAL_QUERY_START_TIME (v54449) | Query start time (microseconds). Fixed-width 8 bytes, not VarUInt |
| 6 | query_interface | UInt8 | universal | always | 1 = TCP, 2 = HTTP |
| 7 | os_user | String | client | if interface = TCP | OS username |
| 8 | client_hostname | String | client | if interface = TCP | Client machine hostname |
| 9 | client_name | String | client | if interface = TCP | Client application name |
| 10 | version_major | VarUInt | universal | if interface = TCP | Client major version |
| 11 | version_minor | VarUInt | universal | if interface = TCP | Client minor version |
| 12 | protocol_version | VarUInt | universal | if interface = TCP | The originating client's own TCP protocol version (DBMS_TCP_PROTOCOL_VERSION), not the negotiated version. The peer revision only decides which fields are present; this value is the initiator's compiled-in version, so on a newer client talking to an older server it can be higher than the negotiated/server revision. |
| 13 | quota_key | String | universal | QUOTA_KEY_IN_CLIENT_INFO (v54060) | Resource quota key for server-side keyed quotas. Clients that do not use a keyed quota send an empty string. |
| 14 | distributed_depth | VarUInt | inter-server | DISTRIBUTED_DEPTH (v54448) | Distributed query nesting depth. External clients send 0. |
| 15 | version_patch | VarUInt | universal | VERSION_PATCH (v54401), TCP only | Client patch version |
| 16 | open_telemetry | (below) | client | OPEN_TELEMETRY (v54442) | Trace context. Clients without tracing send 0. |
| 17 | collaborate_with_initiator | VarUInt | inter-server | PARALLEL_REPLICAS (v54453) | Bool as VarUInt. External clients send 0. |
| 18 | count_participating_replicas | VarUInt | inter-server | PARALLEL_REPLICAS (v54453) | External clients send 0. |
| 19 | number_of_current_replica | VarUInt | inter-server | PARALLEL_REPLICAS (v54453) | External clients send 0. |
| 20 | script_query_number | VarUInt | client | QUERY_AND_LINE_NUMBERS (v54475) | 1-indexed statement position in a multi-statement script. External clients send 0. |
| 21 | script_line_number | VarUInt | client | QUERY_AND_LINE_NUMBERS (v54475) | 1-indexed line number within the source script. External clients send 0. |
| 22 | jwt_present | UInt8 | inter-server | JWT_IN_INTERSERVER (v54476) | 0 = no JWT; 1 = JWT follows. External clients without JWT auth send 0. |
| 23 | jwt | String | inter-server | JWT_IN_INTERSERVER (v54476), if jwt_present=1 | JWT bearer token, only present when field 22 = 1. |
| 24 | client_agent | String | client | CLIENT_AGENT_IN_CLIENT_INFO (v54485) | Trailing field. Identifier of the client tool/agent, auto-detected from the environment (e.g. claude-code, cursor, gemini-cli, or the AGENT env var). External clients with no detected agent send an empty string. Present on the normal Query path once negotiated version ≥ 54485 (sent on all interfaces, not only TCP). |
Interface-dependent layout (fields 7–12)

Fields 7–12 above are the TCP branch. When query_interface (field 6) is not TCP, these fields are replaced by a different wire layout — they are not merely optional omissions, so a decoder must branch on field 6.

  • query_interface = 2 (HTTP): the server-forwarded HTTP request info is written instead — http_method (UInt8), http_user_agent (String), then forwarded_for (String, gated by X_FORWARDED_FOR_IN_CLIENT_INFO v54443) and http_referer (String, gated by REFERER_IN_CLIENT_INFO v54447). No os_user/client_hostname/client_name/version_*/protocol_version fields are present.
  • Any other interface: none of the TCP fields (7–12) and none of the HTTP fields are written; the stream continues directly with quota_key.

After this branch the layout rejoins: quota_key (field 13) and distributed_depth (field 14) follow for all interfaces, then version_patch (field 15) is written only for TCP.

This branch matters mainly for inter-server traffic, where the initiating server forwards a query that originally arrived over HTTP. A decoder that always reads the TCP fields will misread such packets: it consumes http_method and http_user_agent as if they were os_user and the fields that follow, and every later field, including quota_key, is misaligned.
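The branch on field 6 can be sketched as follows. This is a minimal illustration, not the reference decoder: the helper names (read_exact, read_varuint, read_string, read_interface_fields) are hypothetical, and it assumes VarUInt is the usual 7-bits-per-byte encoding and String is a VarUInt length followed by raw bytes. It stops just before quota_key.

```python
import io

TCP, HTTP = 1, 2
X_FORWARDED_FOR_IN_CLIENT_INFO = 54443
REFERER_IN_CLIENT_INFO = 54447


def read_exact(buf, n):
    data = buf.read(n)
    if len(data) != n:
        raise EOFError("stream ended mid-field")
    return data


def read_varuint(buf):
    # Assumed encoding: 7 payload bits per byte, low bits first.
    result, shift = 0, 0
    while True:
        byte = read_exact(buf, 1)[0]
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result
        shift += 7


def read_string(buf):
    return read_exact(buf, read_varuint(buf))


def read_interface_fields(buf, revision):
    """Decode field 6 and the interface-specific fields that follow it."""
    info = {"query_interface": read_exact(buf, 1)[0]}
    if info["query_interface"] == TCP:
        for name in ("os_user", "client_hostname", "client_name"):
            info[name] = read_string(buf)
        for name in ("version_major", "version_minor", "protocol_version"):
            info[name] = read_varuint(buf)
    elif info["query_interface"] == HTTP:
        info["http_method"] = read_exact(buf, 1)[0]
        info["http_user_agent"] = read_string(buf)
        if revision >= X_FORWARDED_FOR_IN_CLIENT_INFO:
            info["forwarded_for"] = read_string(buf)
        if revision >= REFERER_IN_CLIENT_INFO:
            info["http_referer"] = read_string(buf)
    # Any other interface: nothing is written here; quota_key comes next.
    return info
```

After the call returns, the stream is positioned at quota_key for every interface, which is the point where the layouts rejoin.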

OpenTelemetry encoding (field 16):

[UInt8: has_trace]              0 = no trace data follows, 1 = trace data follows
If has_trace == 1:
  [16 bytes: trace_id]          byte-swapped per-8-bytes
  [8 bytes:  span_id]           byte-swapped
  [String:   trace_state]       W3C trace state
  [UInt8:    trace_flags]       W3C trace flags
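
A reader for this layout might look like the sketch below. The function and helper names are hypothetical, and the sketch returns trace_id and span_id as raw wire bytes; undoing the byte swap described above is left to the caller.

```python
import io


def read_exact(buf, n):
    data = buf.read(n)
    if len(data) != n:
        raise EOFError("stream ended mid-field")
    return data


def read_varuint(buf):
    # Assumed encoding: 7 payload bits per byte, low bits first.
    result, shift = 0, 0
    while True:
        byte = read_exact(buf, 1)[0]
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result
        shift += 7


def read_open_telemetry(buf):
    """Return None when has_trace is 0, else the four trace fields."""
    if read_exact(buf, 1)[0] == 0:
        return None
    return {
        "trace_id": read_exact(buf, 16),   # raw wire bytes, still swapped
        "span_id": read_exact(buf, 8),     # raw wire bytes, still swapped
        "trace_state": read_exact(buf, read_varuint(buf)),
        "trace_flags": read_exact(buf, 1)[0],
    }
```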

Inter-server authentication

The Query field 4 (auth_hash) is not the shared cluster secret on the wire. Sending the raw secret would both fail authentication and leak it. Instead, a server acting as an inter-server client proves it knows the secret with a salted SHA-256 hash:

  1. Enter inter-server mode. The connecting server signals it inside ClientHello: the user field is the inter-server marker and password is empty. It then appends two more strings — the cluster name and a freshly-generated 32-byte salt (encodeSHA256 of a random value) — immediately after the user/password fields, as part of the same ClientHello packet. The server reads these two strings before it sends ServerHello, so a client must write them up front; waiting for ServerHello first deadlocks, because the server is blocked reading them.
  2. Obtain the nonce. ServerHello carries an 8-byte UInt64 nonce when INTERSERVER_SECRET_V2 (v54462) is negotiated.
  3. Compute the hash. For every non-InitialQuery Query packet, the client writes encodeSHA256(salt + nonce + cluster_secret + query + query_id + initial_user + external_roles) into field 4 — a 32-byte digest. (nonce is its decimal string form, present only when negotiated ≥ v54462; external_roles is appended only when INTERSERVER_EXTERNALLY_GRANTED_ROLES (v54472) is negotiated.) For an InitialQuery, or when no cluster secret is configured, the client writes an empty string instead.
  4. Verify. The server reads field 4 with a 32-byte cap and recomputes the same concatenation using its own copy of the cluster secret; the connection is rejected if the digests differ.

External (non-inter-server) clients never enter this mode and always send an empty auth_hash.
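
The concatenation in step 3 can be sketched with Python's hashlib. The function name and parameters are hypothetical, and external_roles is treated as an opaque byte string; the exact serialized form that enters the hash is an assumption here and should be checked against a real server.

```python
import hashlib

INTERSERVER_SECRET_V2 = 54462
INTERSERVER_EXTERNALLY_GRANTED_ROLES = 54472


def interserver_auth_hash(*, salt, nonce, cluster_secret, query, query_id,
                          initial_user, external_roles, revision):
    """Build the 32-byte digest written into Query field 4."""
    parts = [salt]
    if revision >= INTERSERVER_SECRET_V2:
        # The nonce enters the hash in its decimal string form.
        parts.append(str(nonce).encode("ascii"))
    parts += [cluster_secret, query, query_id, initial_user]
    if revision >= INTERSERVER_EXTERNALLY_GRANTED_ROLES:
        parts.append(external_roles)
    return hashlib.sha256(b"".join(parts)).digest()
```

All inputs other than nonce and revision are bytes, so the caller decides the text encoding once, and both peers must agree on it for the digests to match.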

Setting

Encoded inline in the Query body's settings list (the Query packet, field 3). The list is always present, regardless of negotiated version, and is terminated by a Setting with an empty key — a single VarUInt 0, with no flags or value following. Only the per-setting encoding depends on the negotiated version, gated by SETTINGS_SERIALIZED_AS_STRINGS (v54429).

v54429+ (STRINGS_WITH_FLAGS) — each setting is the triple shown here:

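| # | Field | Type | Role | Description |
|---|-------|------|------|-------------|
| 1 | key | String | universal | Setting name. Empty = end of list. |
| 2 | flags | VarUInt | universal | Metadata bit flags; see below. |
| 3 | value | String | universal | Setting value as string |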

Fields 2 and 3 are absent when key is empty.

Pre-54429 (BINARY) — each setting is [String key][type-specific binary value]: the flags field is not written, and the value is encoded in the setting's native binary form (for example a fixed-width integer or a length-prefixed string) rather than as a decimal/text string. The list is still terminated by an empty key. A client that targets a negotiated version below 54429 must read and write this binary form, not the triple above. (Custom user-defined settings are the exception: they always carry flags and a string value, in both encodings.)
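
For the v54429+ triple form, an encoder for the whole list could look like this sketch. The names are hypothetical, and it assumes VarUInt is the usual 7-bits-per-byte encoding with String as a VarUInt length plus raw bytes. It does not cover the pre-54429 binary form.

```python
def write_varuint(n):
    # Assumed encoding: 7 payload bits per byte, low bits first.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def write_string(data):
    return write_varuint(len(data)) + data


def encode_settings(settings):
    """settings maps name -> (flags, value); returns the wire bytes."""
    out = bytearray()
    for key, (flags, value) in settings.items():
        if not key:
            raise ValueError("an empty key is reserved for the terminator")
        out += write_string(key.encode("utf-8"))
        out += write_varuint(flags)
        out += write_string(value.encode("utf-8"))
    out += write_string(b"")  # empty key: a single VarUInt 0 ends the list
    return bytes(out)
```

An empty dictionary still produces one byte, 0x00, which matches the rule that the list is always present.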

The flags field packs:

  • 0x01 Important: the setting affects query results and must not be silently ignored by older peers.
  • 0x02 Custom: a user-defined custom setting.
  • 0x0c tier: a 2-bit field, not an independent flag. 0x00 = Production, 0x04 = Obsolete, 0x08 = Experimental, 0x0c = Beta. Read both bits (flags & 0x0c); a naive flags & 0x04 test would misclassify Beta (0x0c) as Obsolete.
  • 0x80 HotReload: config reload without restart; defined in the flags enum, encountered mainly for coordination settings.
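
Decoding the flags value might be sketched as below; the function name and the returned dictionary shape are illustrative only.

```python
IMPORTANT = 0x01
CUSTOM = 0x02
TIER_MASK = 0x0C
HOT_RELOAD = 0x80

TIERS = {0x00: "Production", 0x04: "Obsolete",
         0x08: "Experimental", 0x0C: "Beta"}


def decode_setting_flags(flags):
    """Split a Setting flags value into its independent bits and tier."""
    return {
        "important": bool(flags & IMPORTANT),
        "custom": bool(flags & CUSTOM),
        # Mask both tier bits together before the lookup.
        "tier": TIERS[flags & TIER_MASK],
        "hot_reload": bool(flags & HOT_RELOAD),
    }
```

Looking the tier up through the mask, rather than testing single bits, is what keeps Beta from being reported as Obsolete.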

Parameter

Query parameters, for parameterized queries like SELECT {x:UInt64}. Encoded identically to a Setting with the Custom flag (0x02) set, and terminated by an empty key in the same way.

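| # | Field | Type | Role | Description |
|---|-------|------|------|-------------|
| 1 | key | String | client | Parameter name. Empty = end of list. |
| 2 | flags | VarUInt | client | Always 0x02 (Custom) |
| 3 | value | String | client | Parameter value as string. See the note below on quoting. |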
Note

The parameter value is the SQL representation of the value, not a raw literal. String-typed parameters must be passed already single-quoted (for example, the value for {name:String} is 'Alice', not Alice); otherwise the server's value parser rejects them.
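
A small helper for the String case could look like this. It is a sketch under the assumption that the server's parser accepts backslash escapes inside single-quoted literals; the function name is hypothetical.

```python
def quote_string_parameter(value):
    """Wrap a Python string as a single-quoted SQL literal."""
    # Escape backslashes first so the escapes added for quotes survive.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"
```

For example, quote_string_parameter("Alice") yields 'Alice', which is the value to place in field 3 for {name:String}.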

Data (packet type 1 server→client, packet type 2 client→server)

Both directions. Carries result blocks, INSERT data, external tables, and end-of-data markers.

The wire format is symmetric — both directions include a table_name prefix before the Block. Only the packet type byte differs.

[VarUInt: packet_type]     1 (server→client) or 2 (client→server)
[String:  table_name]      External table name; empty in most cases
[Block]                    See the Native Format spec for the Block layout
FieldTypeRoleDescription
table_nameStringuniversalExternal table name. Empty ("") is the common case — for the main table, query results, and the INSERT row stream. Empty table_name alone is not the end-of-data marker (normal INSERT row packets also carry "").
Block bodySee Block & column structure.

The end-of-data marker is a packet whose Block is empty — 0 columns and 0 rows — regardless of table_name. The server treats a client Data packet as the terminator only when the decoded block is empty (block.empty()); a packet with table_name = "" and a non-empty block is an ordinary row packet, not a terminator. So an INSERT row stream is a sequence of non-empty Data blocks followed by one empty Data block that ends it.
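
The terminator rule can be illustrated with a stand-in Block type. The class below is hypothetical and only models column and row counts; real block decoding belongs to the Native Format spec.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    num_rows: int = 0
    columns: list = field(default_factory=list)

    def empty(self):
        # The terminator has 0 columns and 0 rows.
        return not self.columns and self.num_rows == 0


def rows_until_terminator(packets):
    """Sum rows over (table_name, block) pairs up to the empty block."""
    total = 0
    for _table_name, block in packets:
        if block.empty():
            return total
        # table_name is deliberately ignored: "" does not mean end-of-data.
        total += block.num_rows
    raise ValueError("stream ended without an empty Data block")
```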

The block variants and what they mean are documented under Block variants.

Progress (packet type 3)

Server → Client. Sent periodically during query execution. All fields are VarUInt, and each packet carries increments since the previous Progress packet, not cumulative totals. Before sending, the server reads its counters and atomically resets them to zero, and computes elapsed_ns as the time delta since the last send. A client therefore must accumulate successive packets locally to obtain running totals — treating a packet as an absolute value makes the progress display jump backwards or undercount once more than one packet arrives.

| # | Field | Type | Role | Condition | Description |
|---|-------|------|------|-----------|-------------|
| 1 | rows | VarUInt | universal | always | Rows read since the previous packet (add to running total) |
| 2 | bytes | VarUInt | universal | always | Bytes read since the previous packet (add to running total) |
| 3 | total_rows | VarUInt | universal | always | Increment to the estimated total rows to read; accumulate (may be 0 in a given packet) |
| 4 | total_bytes | VarUInt | universal | TOTAL_BYTES_IN_PROGRESS (v54463) | Increment to the estimated total bytes to read; accumulate. Sits BETWEEN total_rows and wrote_rows on the wire. |
| 5 | wrote_rows | VarUInt | universal | WRITE_CLIENT_INFO (v54420) | Rows written since the previous packet (for INSERT); accumulate |
| 6 | wrote_bytes | VarUInt | universal | WRITE_CLIENT_INFO (v54420) | Bytes written since the previous packet (for INSERT); accumulate |
| 7 | elapsed_ns | VarUInt | universal | SERVER_QUERY_TIME_IN_PROGRESS (v54460) | Nanoseconds elapsed since the previous packet (a delta, not total query time); accumulate |
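
Decoding one Progress body and folding the deltas into running totals could be sketched as follows. The helper names are hypothetical, and VarUInt is assumed to be the usual 7-bits-per-byte encoding.

```python
import io
from collections import Counter

WRITE_CLIENT_INFO = 54420
SERVER_QUERY_TIME_IN_PROGRESS = 54460
TOTAL_BYTES_IN_PROGRESS = 54463


def read_varuint(buf):
    result, shift = 0, 0
    while True:
        chunk = buf.read(1)
        if not chunk:
            raise EOFError("stream ended inside a VarUInt")
        result |= (chunk[0] & 0x7F) << shift
        if chunk[0] < 0x80:
            return result
        shift += 7


def read_progress(buf, revision):
    """Read one Progress body in wire order for the negotiated revision."""
    fields = ["rows", "bytes", "total_rows"]
    if revision >= TOTAL_BYTES_IN_PROGRESS:
        fields.append("total_bytes")
    if revision >= WRITE_CLIENT_INFO:
        fields += ["wrote_rows", "wrote_bytes"]
    if revision >= SERVER_QUERY_TIME_IN_PROGRESS:
        fields.append("elapsed_ns")
    return {name: read_varuint(buf) for name in fields}


def accumulate(packets):
    """Sum per-packet increments into running totals."""
    totals = Counter()
    for packet in packets:
        totals.update(packet)  # Counter.update adds values per key
    return totals
```

The field list is built in wire order, so total_bytes lands between total_rows and wrote_rows even though its gate is the newest of the three.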

ProfileInfo (packet type 6)

Server → Client. Sent once per query, near the end of execution.

| # | Field | Type | Role | Condition | Description |
|---|-------|------|------|-----------|-------------|
| 1 | rows | VarUInt | universal | always | Total rows processed |
| 2 | blocks | VarUInt | universal | always | Total blocks processed |
| 3 | bytes | VarUInt | universal | always | Total bytes processed |
| 4 | applied_limit | Bool | universal | always | Whether a LIMIT clause was applied |
| 5 | rows_before_limit | VarUInt | universal | always | Row count before LIMIT |
| 6 | obsolete | Bool | universal | always | Obsolete compatibility byte. The server always writes true here and the client discards it on read; it is not a "rows_before_limit was computed" flag. The meaningful limit state is field 4 (applied_limit) together with field 5. Read and ignore. |
| 7 | applied_aggregation | Bool | universal | ROWS_BEFORE_AGGREGATION (v54469) | Whether GROUP BY was applied |
| 8 | rows_before_aggregation | VarUInt | universal | ROWS_BEFORE_AGGREGATION (v54469) | Row count before aggregation |
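
A reader for this body might look like the sketch below. It assumes Bool is a single byte (0 or 1) on the wire and that VarUInt is the usual 7-bits-per-byte encoding; the names are hypothetical.

```python
import io

ROWS_BEFORE_AGGREGATION = 54469


def read_exact(buf, n):
    data = buf.read(n)
    if len(data) != n:
        raise EOFError("stream ended mid-field")
    return data


def read_varuint(buf):
    result, shift = 0, 0
    while True:
        byte = read_exact(buf, 1)[0]
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result
        shift += 7


def read_bool(buf):
    return read_exact(buf, 1)[0] != 0


def read_profile_info(buf, revision):
    info = {
        "rows": read_varuint(buf),
        "blocks": read_varuint(buf),
        "bytes": read_varuint(buf),
        "applied_limit": read_bool(buf),
        "rows_before_limit": read_varuint(buf),
    }
    read_exact(buf, 1)  # field 6: obsolete byte, consumed and discarded
    if revision >= ROWS_BEFORE_AGGREGATION:
        info["applied_aggregation"] = read_bool(buf)
        info["rows_before_aggregation"] = read_varuint(buf)
    return info
```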

Totals (packet type 7)

Server → Client. Sent for queries with WITH TOTALS. Wire format is identical to Data: a table_name string (always empty) followed by a Block. Only the packet type byte differs.

[VarUInt: 7]                packet type
[String:  table_name]       always empty
[Block]                     see the Native Format spec

Extremes (packet type 8)

Server → Client. Sent when the extremes setting is enabled. Wire format is identical to Data. The block has exactly 2 rows: row 0 holds the minimum of each column, row 1 holds the maximum.

[VarUInt: 8]                packet type
[String:  table_name]       always empty
[Block]                     num_rows = 2

Log (packet type 10)

Server → Client. Sent when the query has an active logs queue (the send_logs_level setting; see log streaming).

Same envelope and body format as Data. The block has a fixed num_columns = 8 and a predefined schema. Each log line is one row across all 8 columns, and a single Log packet may carry many rows.

[VarUInt: 10]               packet type
[String:  table_name]       always empty
[Block]                     num_columns = 8, num_rows = number of log lines

The 8 columns, in this exact order:

| # | Name | Type | Description |
|---|------|------|-------------|
| 1 | event_time | DateTime | Event timestamp (seconds since epoch) |
| 2 | event_time_microseconds | UInt32 | Microseconds component |
| 3 | host_name | String | Server hostname emitting the log |
| 4 | query_id | String | Query ID the log belongs to |
| 5 | thread_id | UInt64 | OS thread ID |
| 6 | priority | Int8 | Log level (Poco priority: 1 = Fatal, … 8 = Trace) |
| 7 | source | String | Logger name |
| 8 | text | String | Log message text |

ProfileEvents (packet type 14)

Server → Client. Carries per-query performance counters.

Same envelope and body format as Data. The block has a fixed num_columns = 6 and a predefined schema. Each event is one row.

[VarUInt: 14]               packet type
[String:  table_name]       always empty
[Block]                     num_columns = 6, num_rows = number of events

The 6 columns:

| # | Name | Type | Description |
|---|------|------|-------------|
| 1 | host_name | String | Server hostname |
| 2 | current_time | DateTime | Event timestamp |
| 3 | thread_id | UInt64 | Thread ID |
| 4 | type | Enum8 | Event type: 1 = Increment (counter), 2 = Gauge. The underlying storage is one signed byte. |
| 5 | name | String | Event name (e.g., "Query", "NetworkReceiveBytes") |
| 6 | value | Int64 | Counter value or gauge reading |
Note

The value column's element type is not fixed across packets — older servers emit UInt64, newer ones Int64. Read the column's type string from the block header rather than assuming one width.
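
Choosing the integer format from the header's type string could be sketched like this. It assumes the column data is num_rows contiguous little-endian 8-byte integers; the function name is hypothetical.

```python
import struct

# struct format characters for the two element types seen in practice.
VALUE_FORMATS = {"Int64": "q", "UInt64": "Q"}


def decode_value_column(type_name, raw, num_rows):
    """Decode the ProfileEvents value column using the header's type."""
    try:
        code = VALUE_FORMATS[type_name]
    except KeyError:
        raise ValueError(f"unexpected value column type: {type_name}")
    return list(struct.unpack(f"<{num_rows}{code}", raw))
```

The same eight bytes decode to different numbers under the two types, which is why the type string has to drive the choice.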

TableColumns (packet type 11)

Server → Client, gated by COLUMN_DEFAULTS_METADATA (v54410). The server sends it before the INSERT schema block to carry column-default metadata, but only when the negotiated version is ≥ 54410 and the input_format_defaults_for_omitted_fields setting is enabled. Below 54410 the packet is never sent, so an older client must not wait for it: the schema Data block comes directly. A v54410+ client should be ready for either case: the schema block alone, or a TableColumns packet followed by the schema block.

| # | Field | Type | Role | Description |
|---|-------|------|------|-------------|
| 1 | external_table | String | universal | External table name. Empty = main table. |
| 2 | columns_description | String | universal | Textual column definitions, e.g., "id Int32, name String DEFAULT ''". Free-form text; parse as a string. |
Compressed body at v54481+

At negotiated version ≥ 54481 (COMPRESSED_LOGS_PROFILE_EVENTS_COLUMNS) the server writes both fields through the same optionally-compressed output path, so when the query has compression = true the whole TableColumns body (external_table + columns_description) is inside the compression frame; the client reads it through the matching decompressed stream. When the query has no compression, the body is on the wire uncompressed exactly as the table above shows. This matters for INSERT schema responses: a client that switches compression handling for Log and ProfileEvents but not TableColumns will misread the response when query compression is enabled.
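
Handling the optional TableColumns packet ahead of the schema block might be sketched as below. This is deliberately simplified: it assumes query compression is off (so the body is uncompressed), ignores other packets that a server may interleave, and uses hypothetical helper names.

```python
import io

DATA, TABLE_COLUMNS = 1, 11


def read_exact(buf, n):
    data = buf.read(n)
    if len(data) != n:
        raise EOFError("stream ended mid-field")
    return data


def read_varuint(buf):
    result, shift = 0, 0
    while True:
        byte = read_exact(buf, 1)[0]
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result
        shift += 7


def read_string(buf):
    return read_exact(buf, read_varuint(buf))


def read_insert_preamble(buf):
    """Return (external_table, columns_description) or None, leaving the
    stream positioned at the body of the schema Data packet."""
    code = read_varuint(buf)
    table_columns = None
    if code == TABLE_COLUMNS:
        table_columns = (read_string(buf), read_string(buf))
        code = read_varuint(buf)
    if code != DATA:
        raise ValueError(f"expected Data (1), got packet type {code}")
    return table_columns
```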

TimezoneUpdate (packet type 17)

Server → Client, gated by TIMEZONE_UPDATES (v54464). Sent in exactly one place: the initializer for the input table function (a query of the form INSERT INTO <table> SELECT ... FROM input('<structure>'), which streams rows from the client). Right after the server sends the input-schema Data block (see the INSERT phase), it emits TimezoneUpdate carrying the query context's current session_timezone so the client parses the rows it is about to send with the same timezone. The server does not emit this packet for arbitrary mid-query SET session_timezone changes, nor to tell the client how to format later result blocks.

| # | Field | Type | Role | Description |
|---|-------|------|------|-------------|
| 1 | timezone | String | universal | The new session-default timezone (e.g., "UTC", "Europe/Berlin"). |

The packet arrives once, immediately after the input-schema block and before the client starts sending row blocks. A decoder that ignores TimezoneUpdate MUST still consume the trailing String to keep the wire aligned.
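
Even a client that does nothing with the timezone has to read the String. A minimal sketch, with hypothetical helper names and the usual VarUInt-length String assumption:

```python
import io


def read_exact(buf, n):
    data = buf.read(n)
    if len(data) != n:
        raise EOFError("stream ended mid-field")
    return data


def read_varuint(buf):
    result, shift = 0, 0
    while True:
        byte = read_exact(buf, 1)[0]
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result
        shift += 7


def read_timezone_update(buf):
    """Consume the TimezoneUpdate body (after the packet type code)."""
    return read_exact(buf, read_varuint(buf)).decode("utf-8")
```

A caller that ignores the result can simply discard the return value; the important effect is that the stream now points at the next packet type code.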

SSH challenge-response authentication (packet types 11, 12, 18)

Gated by SSH_AUTHENTICATION (v54466), and opt-in only. A connection enters the SSH flow when ClientHello sends user = " SSH KEY AUTHENTICATION " + <real_user> (with the leading and trailing spaces) and password = "". The server reads the prefix, strips it to recover the real user, and switches to challenge-response.

| Packet | Code | Direction | Body |
|--------|------|-----------|------|
| SSHChallengeRequest | 11 | Client → Server | (no body) |
| SSHChallenge | 18 | Server → Client | String challenge: random bytes; one component of the string that gets signed (see below) |
| SSHChallengeResponse | 12 | Client → Server | String signature: SSH signature over the concatenation defined below, not over the raw challenge |

The flow runs in place of password authentication, and the challenge-response exchange happens before ServerHello — the server defers its Hello reply until authentication succeeds:

  1. The client sends ClientHello with the SSH marker prefix and an empty password.

  2. The client sends SSHChallengeRequest (packet 11). The server has not sent ServerHello yet — it processes authentication first and blocks here waiting for this packet.

  3. The server replies with SSHChallenge carrying random bytes (packet 18).

  4. The client builds the string to sign and signs that, not the raw challenge, then sends SSHChallengeResponse (packet 12) with the signature. The signed message is the byte-wise concatenation, with no separators, of four parts in this exact order:

    to_sign = decimal(protocol_version) + default_database + user + challenge
    
    | Part | Source |
    |------|--------|
    | decimal(protocol_version) | The client's protocol version as a decimal ASCII string (e.g. "54466"), that is, the version number as a string, not a VarUInt or fixed-width integer. The server validates using the same protocol version it received in ClientHello. |
    | default_database | The database field from ClientHello (empty string if none). |
    | user | The real user name with the " SSH KEY AUTHENTICATION " marker prefix stripped, the same name the server recovers after stripping the prefix. |
    | challenge | The raw challenge bytes from the SSHChallenge packet. |
  5. The server verifies the signature against the user's registered public key, reconstructing the same decimal(protocol_version) + default_database + user + challenge string. On success it sends ServerHello — the same reply as in the password flow — and the handshake continues normally (Addendum, etc.); on failure it returns an Exception and terminates the connection. A client that signs only the raw challenge bytes will fail authentication.
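
Building the ClientHello user field and the message to sign can be sketched as follows. The function names are hypothetical, the text fields are assumed to be UTF-8, and the signing step itself (which needs an SSH key library) is out of scope.

```python
SSH_MARKER = " SSH KEY AUTHENTICATION "


def ssh_hello_user(real_user):
    """Value for the ClientHello user field when opting into SSH auth."""
    return SSH_MARKER + real_user


def ssh_string_to_sign(protocol_version, default_database, user, challenge):
    """Concatenate the four parts, with no separators, in wire order."""
    if user.startswith(SSH_MARKER):
        user = user[len(SSH_MARKER):]  # sign the real user, not the marker
    return (
        str(protocol_version).encode("ascii")  # decimal text, not a VarUInt
        + default_database.encode("utf-8")
        + user.encode("utf-8")
        + challenge                            # raw bytes from SSHChallenge
    )
```

For protocol version 54466, database "default", user "alice", and a two-byte challenge 01 02, the message to sign is the bytes of "54466defaultalice" followed by 01 02.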

Note

This is the reverse of the password handshake, where ServerHello immediately follows ClientHello. Under SSH auth, ServerHello is withheld until after the signature is verified, so the SSH challenge-response is interleaved into the handshake before any ServerHello is seen.

External clients that don't use SSH auth never see packets 11, 12, or 18 — they stay off the wire unless the user explicitly opts in via the username prefix.

Packet type reference

Client → Server

CodeNameBody formatDescription
0HelloClientHelloHandshake initiation
1QueryQueryQuery execution request
2DataDataData block (INSERT data, external tables, end-of-data marker)
3Cancel(no body)Cancel running query
4PingPingLiveness check
5TablesStatusRequestnot specifiedTable status check
6KeepAlivenot specifiedConnection keepalive
7Scalarnot specifiedScalar data block
8IgnoredPartUUIDsnot specifiedParts to exclude from query
9ReadTaskResponsenot specifiedS3 cluster read response
10MergeTreeReadTaskResponsenot specifiedParallel read task response
11SSHChallengeRequestSSH authSSH auth challenge request
12SSHChallengeResponseSSH authSSH auth challenge response
13QueryPlannot specifiedQuery plan

Server → Client

CodeNameBody formatDescription
0HelloServerHelloHandshake response
1DataDataResult data block
2ExceptionExceptionError
3ProgressProgressQuery execution progress
4PongPongLiveness response
5EndOfStream(no body)Query complete
6ProfileInfoProfileInfoPost-execution profiling data
7TotalsTotalsGROUP BY WITH TOTALS row
8ExtremesExtremesMin/max values (2-row block)
9TablesStatusResponsenot specifiedTable status response
10LogLogQuery execution log lines
11TableColumnsTableColumnsColumn descriptions for defaults
12PartUUIDsnot specifiedUnique part IDs
13ReadTaskRequestnot specifiedCluster read task request
14ProfileEventsProfileEventsPerformance counters
15MergeTreeAllRangesAnnouncementnot specifiedParallel read initialization
16MergeTreeReadTaskRequestnot specifiedParallel read task assignment
17TimezoneUpdateTimezoneUpdateServer timezone update
18SSHChallengeSSH authSSH auth challenge

Configuration

This section covers the tunables that shape native protocol connections.

The defaults below reflect a recent server release; they may differ across versions and deployments.

Transport-layer settings

Socket options

| Option | Default | Side | Description |
|--------|---------|------|-------------|
| TCP_NODELAY | on | both | Nagle's algorithm disabled. Small packets are sent immediately. |
| SO_KEEPALIVE | on (client), OS default (server) | asymmetric | Kernel-level TCP keepalive probes. Client explicitly enables this when tcp_keep_alive_timeout > 0. Server inherits the OS default. |
| SO_RCVBUF / SO_SNDBUF | OS defaults | | Socket buffer sizes. Not tuned by the protocol. |

Timeouts

| Setting | Default | Unit | Side | Description |
|---------|---------|------|------|-------------|
| connect_timeout | 10 | seconds | client | Timeout for establishing the initial TCP connection. |
| handshake_timeout_ms | 10000 | milliseconds | client | Timeout for receiving ServerHello during the handshake. |
| send_timeout | 300 | seconds | both | If no bytes can be written within this interval, the connection throws. |
| receive_timeout | 300 | seconds | both | If no bytes can be read within this interval, the connection throws. |
| tcp_keep_alive_timeout | 290 | seconds | client | Idle duration before the OS sends the first TCP keepalive probe. |
| receive_data_timeout_ms | 2000 | milliseconds | client | Timeout for receiving the first Data packet from a replica. |
| connect_timeout_with_failover_ms | 1000 | milliseconds | client | Per-attempt connect timeout when iterating replicas. |
| connect_timeout_with_failover_secure_ms | 1000 | milliseconds | client | Per-attempt connect timeout when iterating replicas over TLS. |
| hedged_connection_timeout_ms | 50 | milliseconds | client | Per-attempt connect timeout for hedged requests. |
| poll_interval | 10 | seconds | server | Granularity of the server's idle-connection and shutdown check loop. |

The timeouts nest like this:

tcp_keep_alive_timeout (290s)
      < receive_timeout (300s)
      < idle_connection_timeout (3600s)
      < tcp_close_connection_after_queries_seconds (0 = unlimited by default)

OS keepalive fires first and may detect dead peers silently at the kernel level. The application receive timeout is the next line of defence. The idle timeout is the last resort that reaps long-unused connections.
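
A deployment that overrides these values may want to check that the ordering still holds. The sketch below is a hypothetical helper, not part of any ClickHouse tooling; its defaults are the values quoted above, and it does not model tcp_keep_alive_timeout = 0 (keepalive disabled).

```python
def check_timeout_nesting(tcp_keep_alive_timeout=290,
                          receive_timeout=300,
                          idle_connection_timeout=3600):
    """Return a list of human-readable problems; empty means nested."""
    problems = []
    if not tcp_keep_alive_timeout < receive_timeout:
        problems.append(
            "tcp_keep_alive_timeout should be below receive_timeout "
            "so kernel probes fire before the application timeout"
        )
    if not receive_timeout < idle_connection_timeout:
        problems.append(
            "receive_timeout should be below idle_connection_timeout"
        )
    return problems
```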

Connection limits

| Setting | Default | Unit | Side | Description |
|---------|---------|------|------|-------------|
| max_connections | 4096 | count | server | Maximum concurrent TCP connections. |
| idle_connection_timeout | 3600 | seconds | server | Maximum time an idle connection may remain open. |
| tcp_close_connection_after_queries_num | 0 (unlimited) | count | server | Maximum number of queries per connection before a forced close. |
| tcp_close_connection_after_queries_seconds | 0 (unlimited) | seconds | server | Maximum total connection lifetime regardless of activity. |

A connection that issues queries regularly can live indefinitely. Only idle connections are reaped after an hour, and there is no default maximum lifetime.

Application-layer settings

These settings travel per-query in the Query packet's settings list. They change what the server sends on the wire, or how it is framed.

Compression

| Setting | Default | Unit | Description |
|---------|---------|------|-------------|
| network_compression_method | "LZ4" | string | Compression codec used when the Query packet's compression flag is set. Values: "LZ4", "LZ4HC", "ZSTD", "NONE". |
| network_zstd_compression_level | 1 | 1–15 | ZSTD level when network_compression_method == "ZSTD". |

The compression flag in the Query packet (field 6) toggles compression on and off; these settings select which codec is used when it is on.

Log streaming

| Setting | Default | Unit | Description |
|---------|---------|------|-------------|
| send_logs_level | "fatal" | string | Minimum log level. Values: "none", "fatal", "error", "warning", "information", "debug", "trace". |
| send_logs_source_regexp | "" | string | Regex filter on the logger source. Empty = all sources pass. |

Setting send_logs_level to anything other than "none" makes the server emit Log packets during query execution.

Progress reporting

| Setting | Default | Unit | Description |
|---------|---------|------|-------------|
| interactive_delay | 100000 | microseconds | Target minimum interval between consecutive Progress packets. |

This is a target minimum, not a strict maximum: the server may send Progress packets less often when the query is not producing work fast enough.

Result envelope

| Setting | Default | Unit | Description |
|---------|---------|------|-------------|
| extremes | false | bool | When true, the server sends an Extremes packet with min/max values per column. |
| max_result_rows | 0 (unlimited) | count | Cap on rows transmitted. Behaviour controlled by result_overflow_mode. |
| max_result_bytes | 0 (unlimited) | uncompressed bytes | Cap on uncompressed byte volume. Behaviour controlled by result_overflow_mode. |
| result_overflow_mode | "throw" | string | "throw" ends the stream with Exception; "break" sends partial results followed by EndOfStream. |

Async INSERT

| Setting | Default | Unit | Description |
|---------|---------|------|-------------|
| async_insert | true | bool | When true, INSERT data is queued server-side and batched. |
| wait_for_async_insert | true | bool | When true (with async_insert on), the server holds the response until queued data is flushed. |
| wait_for_async_insert_timeout | 120 | seconds | Maximum time the server waits for a flush before returning. |

Distributed tracing

| Setting | Default | Unit | Description |
|---------|---------|------|-------------|
| opentelemetry_start_trace_probability | 0.0 | 0–1 probability | Server-side probability of attaching OpenTelemetry context to response telemetry. |

Settings out of scope

These settings are sometimes mistaken for protocol-level settings, but they control SQL execution, storage, or CPU use rather than wire behaviour. A protocol implementation does not need to handle them specially.

  • max_threads — parallelism within query execution.
  • max_memory_usage — per-query memory cap.
  • max_block_size, preferred_block_size_bytes — internal block sizing during query processing; wire blocks are independent of these.
  • compile_expressions — JIT compilation; CPU only.
  • async_insert_max_data_size — server-side queue buffer.
  • All input_format_* and output_format_* settings except the input_format_native_* / output_format_native_* family — the non-native ones select or tune other formats (for example over HTTP) and do not change the native protocol's Data blocks.

The *_native_* settings are the exception: they change the bytes inside native TCP Data blocks, so a protocol implementation must account for them. output_format_native_encode_types_in_binary_format switches the column type field from a textual string to a binary type encoding, output_format_native_write_json_as_string emits JSON columns as a String, and output_format_native_use_flattened_dynamic_and_json_serialization selects the FLATTENED Dynamic/JSON layout. Because these affect the block body rather than the packet envelope, they are specified in the Native Format spec — see column wire layout and versioned types.

Glossary

Cancel — a client-initiated packet (type 3) that aborts a running query. Not specified in detail on this page.

End-of-client-data marker — an empty Data packet (0 columns, 0 rows) the client sends to close an input stream. Its position differs by query kind:

  • Normal query (SELECT, etc.): sent after the Query packet and any external-table Data packets to signal "no more external data". The server then begins executing.
  • INSERT: the client does not send a pre-schema marker. The server sends the schema block first, the client streams its row Data blocks, and only then sends the empty Data packet to terminate the row stream. Sending an empty marker before the schema block would be read as an immediate end-of-rows and lose the data.

Feature — a wire-format change introduced in a specific protocol version. Active when the negotiated version is at or above the feature's version. See versioning and feature gates.

Inter-server — a role label for a field that is only meaningful in server-to-server distributed queries. External clients write a default value (usually empty string, 0, or false).

Negotiated version — min(client_version, server_version), computed during the handshake. Determines which features are active for the lifetime of the connection.

Packet — a wire message: a VarUInt packet type code followed by a body whose format depends on the type. See packet envelope.

Packet type code — the leading VarUInt of a packet that identifies its format. Values 0–18 are currently assigned. See the packet type reference.

Response stream — the sequence of packets the server emits during a query. Open-ended in length, terminated by exactly one EndOfStream (success) or Exception (failure). See the query phase.

Schema block — the header block (a Block with columns but 0 rows) that the server sends during the INSERT phase to announce the expected column shapes before the client sends data.

Settings list — a sequence of (key, flags, value) tuples in the Query body, terminated by an empty key. Carries per-query application-layer configuration. See Setting.

Stage — a VarUInt field in the Query packet (field 5) controlling how far the server executes the query. External clients typically send 2 (Complete); distributed queries and serialized query plans use the higher values. See Query field 5 for the full set of wire values.

Terminator — a packet that ends a stream. The Query response ends on EndOfStream (success) or Exception (failure). The client's input stream ends on the empty Data marker.