Native Format
The Native format is the columnar wire format ClickHouse uses to move tabular data. It shows up in several places:
- the body of
Data,Totals,Extremes,Log, andProfileEventspackets in the native TCP protocol (theTableColumnspacket is not a Native block — it carries two binary strings, so its layout belongs to the native protocol spec); - the output of
SELECT ... FORMAT Nativeover HTTP; - file exports written with
INTO OUTFILE ... FORMAT Native; - inter-server replication payloads.
This page describes the bytes inside a Block — the columnar payload — and the per-column type encodings that build it. Packet framing, connection state, and version negotiation belong to the native protocol specification.
All multi-byte integer fields are little-endian. Signed integers use two's complement.
For a user-facing introduction to the Native format (with curl examples), see the Native format page. This specification is the lower-level wire reference.
Overview
Everything that carries rows on the wire is a Block: a self-describing chunk of rows stored column by column. All values of column 1 come first, then all of column 2, and so on. A Block carries only the columns the query references, never the full table.
A column's data is laid out according to the family its type belongs to. The families, in increasing decoder complexity, are:
- Fixed-width types lay
dataout asbytes_per_value × num_rowsraw bytes, with no per-row framing. - Composite types (
Nullable,Array,Tuple,Map,Nested) have a recursive shape fully derivable from the type string, with no version prefix and no cross-block state. - Versioned / stateful types (
LowCardinality,JSON,Variant,Dynamic) begin each non-empty block with a serialization-version/state prefix. Over theNativewire this prefix and any dictionary are per block — the format carries no state across blocks (the writer creates fresh serialization state for every block and setslow_cardinality_max_dictionary_size = 0). Cross-block state is a MergeTree on-disk concern, not the Native wire layout.
Wire primitives
The Native format builds on four primitive encodings.
| Primitive | Size | Description |
|---|---|---|
| VarUInt | 1–10 B | LEB-128 variable-length unsigned integer |
| Fixed-width int | 1, 2, 4, 8, 16, 32 B | Little-endian, two's complement for signed |
| String | variable | VarUInt length prefix + raw bytes |
| Bool | 1 B | 0x00 = false, non-zero = true |
VarUInt
A variable-length unsigned integer using LEB-128 encoding. Each byte carries 7 data bits in positions 0–6 and 1 continuation bit in position 7. The continuation bit is 1 when more bytes follow and 0 on the final byte.
| Value range | Bytes |
|---|---|
| 0 – 127 | 1 |
| 128 – 16383 | 2 |
| 16384 – 2097151 | 3 |
| up to full UInt64 | up to 10 |
Encoding the value 300:
Decoding the bytes 0xAC 0x02:
Fixed-width integers
| Type | Bytes | Encoding |
|---|---|---|
| UInt8 | 1 | Raw byte |
| UInt16 | 2 | Little-endian |
| UInt32 | 4 | Little-endian |
| UInt64 | 8 | Little-endian |
| UInt128 | 16 | Little-endian |
| UInt256 | 32 | Little-endian |
| Int8 | 1 | Raw byte, two's complement |
| Int16 | 2 | Little-endian, two's complement |
| Int32 | 4 | Little-endian, two's complement |
| Int64 | 8 | Little-endian, two's complement |
| Int128 | 16 | Little-endian, two's complement |
| Int256 | 32 | Little-endian, two's complement |
| Float32 | 4 | IEEE 754 single-precision, little-endian |
| Float64 | 8 | IEEE 754 double-precision, little-endian |
For example, the UInt32 value 1 encodes as 01 00 00 00, and the Int32 value -1 as FF FF FF FF.
String
A length-prefixed byte sequence:
The byte sequence need not be valid UTF-8. An empty string encodes as a single 0x00 byte, and strings may contain any byte values, including embedded NUL. The string "ab" encodes as 02 61 62; to decode, read the VarUInt length (2), then read that many bytes.
Bool
A single byte. 0x00 is false; any non-zero value is true (canonically 0x01).
Block and column structure
Block wire layout
Whether the BlockInfo prefix is present depends on the channel, because the writer is parameterized by a revision:
-
On the native TCP protocol, the server writes blocks at the connection's negotiated revision (a large value —
DBMS_TCP_PROTOCOL_VERSIONis54485in this release).BlockInfois written whenever that revision is greater than zero, which is always the case for a real connection. Thehas_custom_serializationbyte in each column (see column wire layout) is written at revision54454and above. -
The
Nativeoutput format —SELECT ... FORMAT Nativeover HTTP,INTO OUTFILE ... FORMAT Native, and theNativeformat produced byclickhouse-client— serializes at revision0by default. At revision0theBlockInfoprefix and thehas_custom_serializationbyte are both omitted, so a block is justnum_columns,num_rows, and the columns.Over HTTP this revision is not fixed: a client may raise it with the
?client_protocol_version=<n>query parameter, and the server uses that value as the serialization revision for the response.With a high enough value the HTTP output includes the
BlockInfoprefix (written whenever the revision is greater than0) and thehas_custom_serializationbyte (written at revision54454and above), exactly as on the TCP path. Clients must therefore not assume that every HTTPFORMAT Nativepayload is revision0.
In other words, the byte examples in this section that begin with a BlockInfo prefix describe the TCP Data-packet payload. The same query taken through FORMAT Native produces the shorter form shown alongside them.
BlockInfo
BlockInfo is a sequence of fields, each preceded by a VarUInt field ID, terminated by a field ID of 0. The wire format is not self-describing: a field ID does not encode the length or type of its value, so a reader must already know the type of every field ID it might encounter. ClickHouse's own reader treats an unrecognized field ID as corruption and raises an exception (UNKNOWN_BLOCK_INFO_FIELD). Forward compatibility is handled instead by the protocol revision: the sender only writes a field if the negotiated revision is at least that field's minimum revision, so an older receiver never sees a field it does not know.
| Field ID | Field | Type | Min revision | Description |
|---|---|---|---|---|
| 1 | is_overflows | UInt8 | 0 | Overflow block from GROUP BY. 0 for non-overflow blocks. |
| 2 | bucket_number | Int32 | 0 | Aggregation bucket. -1 for non-bucketed blocks. |
| 3 | out_of_order_buckets | List of Int32 | 54480 | Buckets delayed during distributed aggregation. Encoded as a VarUInt count followed by that many Int32 values. |
| 0 | (terminator) | — | — | End of BlockInfo. Always required. |
Fields 1 and 2 have minimum revision 0, so they are present whenever a BlockInfo is written at all. Field 3 is written only at revision 54480 and above. Wire layout for the common case (revision below 54480):
Column wire layout
A Column appears num_columns times within a Block.
| # | Field | Type | Condition | Description |
|---|---|---|---|---|
| 1 | name | String | always | Column name |
| 2 | type | String or binary type encoding | always | ClickHouse type string (e.g., "UInt64", "Array(String)") by default; a binary type encoding when output_format_native_encode_types_in_binary_format = 1 (see note below) |
| 3 | has_custom_serialization | UInt8 | feature CUSTOM_SERIALIZATION (v54454) | 0 = default, 1 = custom (kind_stack follows) |
| 4 | kind_stack | bytes | when field 3 = 1 | One UInt8 enum byte (see below) describing the non-default serialization (sparse, etc.). For the COMBINATION value, followed by a VarUInt count plus that many additional kind bytes. For a Tuple (and other composites with element-level serialization info) the payload is recursive — see below. |
| 5 | data | bytes | always | Column values for all num_rows rows. Layout per type — see data types. For sparse columns, see below. |
A decoder dispatches on the type string. Type strings often carry parameters in parentheses; the decoder strips the (...) suffix to find the base type and then parses the parameters for size, scale, or inner-type decisions. Parsing a parameter list with nested types (a Tuple inside an Array, say) needs a depth-aware comma splitter that tracks parenthesis nesting rather than a naive split on ,.
The type field is a textual String only in the default mode. When the query setting output_format_native_encode_types_in_binary_format = 1 is set, this field is instead a binary type encoding — the same tag-based encoding documented in data type binary encoding — and flattened Dynamic type lists use the same binary encoding for their per-type names. A decoder that always reads field 2 as a length-prefixed string would treat the first binary type tag as a string length and desynchronize, so it must know which mode the stream uses.
kind_stack and sparse encoding
The kind_stack byte enumerates a non-default per-column serialization:
| Byte | Name | Meaning | Wire impact on data |
|---|---|---|---|
0x00 | DEFAULT | Default serialization | Identical to has_custom = 0 |
0x01 | SPARSE | Sparse serialization (v54465+) | Offset stream + non-default values; see below |
0x02 | DETACHED | Column wrapped in a ColumnBLOB by parallel block marshalling (v54478+) | Pre-marshalled blob: VarUInt size + that many bytes; see below |
0x03 | DETACHED_OVER_SPARSE | A sparse column wrapped in a ColumnBLOB | Same blob payload as DETACHED; see below |
0x04 | REPLICATED | Dictionary form for repeated values (v54482+) | Index stream + dense element values; see below |
0x05 | COMBINATION | Multi-kind stack | Followed by VarUInt count and count further kind bytes — see note below |
COMBINATION payload uses a different enum. The five rows above are compact one-byte codes. COMBINATION (0x05) is the general escape for any stack not covered by them: it is followed by a VarUInt count and then count one-byte entries. Those entries are not the compact codes from the table — they are the raw ISerialization::Kind values:
| Byte | Nested Kind |
|---|---|
0x00 | DEFAULT |
0x01 | SPARSE |
0x02 | DETACHED |
0x03 | REPLICATED |
The byte values differ from the compact codes: REPLICATED is 0x03 in this nested enum but 0x04 as a compact code, and there is no DETACHED_OVER_SPARSE entry — that combination appears as the two consecutive entries SPARSE, DETACHED. A decoder that keeps using the compact table for the nested bytes will mis-map 0x03/0x04 and desynchronize.
The count is the full stack length including the leading DEFAULT entry that begins every stack. The compact codes already cover every one- and two-entry stack, so a COMBINATION always has a count of at least three.
Recursive kind_stack for Tuple columns. The kind_stack payload above is the byte (or COMBINATION sequence) for one column's own serialization info. A Tuple carries a SerializationInfoTuple, which first writes the tuple's own kind-stack payload and then writes one full kind-stack payload for each element, in order; a decoder reads back the same recursive structure. So for Tuple(A, B, C) the field-4 bytes are [tuple_kind][A_kind][B_kind][C_kind], and each element payload is itself recursive if that element is again a composite. The has_custom_serialization byte (field 3) is set whenever the tuple's own info or any element's info is non-default, so a Tuple whose only special element is sparse, replicated, or detached still triggers the kind-stack payload. A decoder that reads only the single leading enum byte for a Tuple will stop too early and misread the remaining element-kind bytes as column data.
Sparse wire format. When kind_stack = 0x01, the column data is two streams written back-to-back in the single shared TCP stream:
- Offset stream — a sequence of
VarUInts. Each valuevis either:vwith the high bit at position 62 clear:(v & 0x3FFFFFFFFFFFFFFF)= the number of default positions before the next explicit non-default value. That non-default position iscursor + group_size, wherecursoris the running position; afterwardscursoradvances bygroup_size + 1.vwith bit 62 set (END_OF_GRANULE_FLAG): the value with the flag cleared = the number of trailing default positions after the last non-default. This marks the end of the offset stream for the block.
- Values stream —
countnon-default values densely encoded in the inner type, wherecountis the number of non-EOG VarUInts read above.
A decoder reconstructs a dense column of num_rows entries by filling every non-explicit position with the inner type's default value (0 for integers and floats, "" for String, 0 days for Date, and so on).
A sparse Nullable(T) column is a special case, because the default value of Nullable(T) is NULL. The sparse encoding drops the usual Nullable null-map stream entirely: the offset stream identifies the non-default — that is, non-NULL — positions, the values stream holds only those non-NULL values densely in T, and every non-explicit position reconstructs as NULL. A decoder must therefore not look for a null map in the values stream, and must not fill the gaps with a present 0; it fills them with NULL.
Replicated wire format. When kind_stack = 0x04, the column data is a dictionary: a list of distinct element values plus a per-row index into that list (the same lookup shape as LowCardinality). When the inner type is itself versioned — for example LowCardinality(T) — its state prefix is written first, ahead of the index stream: the replicated serialization delegates the prefix phase to the inner type before writing num_rows. Inners with an empty prefix (the leaf types and the plain composites) contribute no bytes here.
A decoder reconstructs a dense column by selecting elements[indexes[i]] for each output row i. Composite inner types recurse: the element list is materialized in the inner type, then indexed. Supported inner types include the leaf types, Nullable(T), Array(T), Tuple(...), Map(K, V), Nested(...) (each field expanded like an Array), and LowCardinality(T) (the shared dictionary is kept; only the per-element keys are indexed).
Detached wire format. DETACHED (0x02) and DETACHED_OVER_SPARSE (0x03) do appear over the wire — they are not purely internal. On the TCP path, when compression is enabled and the negotiated revision is at least DBMS_MIN_REVISON_WITH_PARALLEL_BLOCK_MARSHALLING (v54478), the column goes through three steps:
- Each eligible column (non-
const, non-Tuple, in a block with more than one row) is wrapped in aColumnBLOBthat holds the column already marshalled and compressed off the main thread. DETACHEDis appended to the wrapped column's kind stack.- The column
datais written as aVarUIntblob size followed by exactly that many blob bytes.
If the wrapped column was sparse, its stack is {DEFAULT, SPARSE, DETACHED}, which serializes as DETACHED_OVER_SPARSE. A client decoding such a column reads the blob length and bytes, then decompresses the blob to recover the inner column payload (see the ColumnBLOB note under compression).
Block variants
All Data-family packets share the same Block wire format. The variants differ only in their column and row counts:
| Variant | num_columns | num_rows | Purpose |
|---|---|---|---|
| Header block | N > 0 | 0 | Announces the result schema (column names + types). |
| Result block | N > 0 | M > 0 | Actual result rows. |
| Empty block | 0 | 0 | Sentinel — end-of-input on the client side; boundary marker on the server side. |
Byte-level examples
All examples in this section are taken from the TCP Data-packet path, so they include the BlockInfo prefix and the has_custom_serialization byte. Over FORMAT Native the same blocks are shorter — the equivalent short form is given where it helps.
An empty block (with BlockInfo), 8 bytes total:
A header block for SELECT 1 announces one column named "1" of type UInt8, with zero rows. At protocol ≥ 54454 the has_custom_serialization byte is included:
The result block for the same query, with one row:
Through FORMAT Native (revision 0), the same result block has no BlockInfo and no has_custom_serialization byte — SELECT 1 FORMAT Native is 11 bytes:
(A zero-row result, such as a header-only block, produces no bytes at all over FORMAT Native: the output format does not emit empty blocks.)
Data types
This section documents the wire encoding of the types the Native format can carry within a column's data, grouped into four families of increasing decoder complexity. Two types — AggregateFunction(func, ...) and QBit(T, N) — are valid Native column types but have function- or type-specific payloads that are out of scope here; they are called out below where they would otherwise be mistaken for aliases.
| Family | Section | Streams per column | Cross-block state |
|---|---|---|---|
| Fixed-width | Fixed-width types | One | None |
| Variable-length | Variable-length types | One | None |
| Composite (fixed shape) | Composite types | Multiple | None |
| Versioned / stateful | Versioned types | Multiple | None on the Native wire — per-block state prefix, fresh per block |
Fixed-width types
Each value occupies a constant number of bytes. A column of M rows occupies exactly bytes_per_row × M bytes on the wire, concatenated with no separators or padding.
| Type string | Bytes per value | Logical value | Wire encoding |
|---|---|---|---|
UInt8 | 1 | Unsigned 8-bit integer | Raw byte |
UInt16 | 2 | Unsigned 16-bit integer | Little-endian |
UInt32 | 4 | Unsigned 32-bit integer | Little-endian |
UInt64 | 8 | Unsigned 64-bit integer | Little-endian |
UInt128 | 16 | Unsigned 128-bit integer | Little-endian |
UInt256 | 32 | Unsigned 256-bit integer | Little-endian |
Int8 | 1 | Signed 8-bit, two's complement | Raw byte |
Int16 | 2 | Signed 16-bit, two's complement | Little-endian |
Int32 | 4 | Signed 32-bit, two's complement | Little-endian |
Int64 | 8 | Signed 64-bit, two's complement | Little-endian |
Int128 | 16 | Signed 128-bit, two's complement | Little-endian |
Int256 | 32 | Signed 256-bit, two's complement | Little-endian |
Float32 | 4 | IEEE 754 single-precision | Little-endian |
Float64 | 8 | IEEE 754 double-precision | Little-endian |
BFloat16 | 2 | High 16 bits of an IEEE 754 Float32 | Little-endian |
Bool | 1 | 0x00 = false, 0x01 = true | Raw byte |
Date | 2 | Days since 1970-01-01 | Little-endian UInt16 |
Date32 | 4 | Days since 1970-01-01 (signed; pre-1970 ok) | Little-endian Int32 |
DateTime | 4 | Unix timestamp in seconds | Little-endian UInt32 |
DateTime(tz) | 4 | Same as DateTime; timezone is metadata | Little-endian UInt32 |
DateTime64(s) | 8 | Ticks at scale s (10^-s seconds since epoch) | Little-endian Int64 |
DateTime64(s, tz) | 8 | Same as DateTime64(s); timezone is metadata | Little-endian Int64 |
Time | 4 | Signed clock duration in seconds | Little-endian Int32 |
Time64(s) | 8 | Signed clock duration in ticks at scale s | Little-endian Int64 |
Interval<Unit> | 8 | Signed count; the unit lives in the type string | Little-endian Int64 |
UUID | 16 | 128-bit identifier | Two byte-swapped LE UInt64 halves (see UUID) |
IPv4 | 4 | IPv4 address | Little-endian UInt32 |
IPv6 | 16 | IPv6 address | Network byte order, no swap |
Enum8 | 1 | Signed 8-bit (variant index) | Raw byte |
Enum16 | 2 | Signed 16-bit (variant index) | Little-endian |
Decimal(P, S) | 4 / 8 / 16 / 32 | value × 10^S as a signed integer; width depends on P (≤9 → 4 B, ≤18 → 8 B, ≤38 → 16 B, ≤76 → 32 B) | Little-endian signed integer |
Integer types
UInt8–UInt256 and Int8–Int256 are direct binary encodings of integer values. A decoder reads bytes_per_row × num_rows bytes and interprets them according to the type.
A UInt32 column holding [1, 256, 65536]:
An Int32 column holding [-1, 42]:
Float32 and Float64
Standard IEEE 754 binary floats: 4 bytes single-precision (binary32) and 8 bytes double-precision (binary64), each little-endian. NaN, ±Infinity, ±0.0, and subnormals all round-trip without normalization.
Float32 value 1.5 (0x3FC00000):
Float64 value 1.5 (0x3FF8000000000000):
BFloat16
The brain-floating-point format: the high 16 bits of an IEEE 754 Float32 — 1 sign bit, 8 exponent bits, 7 mantissa bits. Each value is 2 bytes, little-endian, holding the raw 16-bit pattern. To recover the numeric value, widen it back to Float32 by placing the pattern in the high half and zeroing the low half (bits << 16 reinterpreted as Float32); the widened value then shares Float32's text formatting.
BFloat16 value 1.5 (pattern 0x3FC0, the top half of Float32 0x3FC00000):
Bool
Wire-compatible with UInt8: 1 byte per row, 0x00 = false, 0x01 = true. The type string on the wire is literally Bool (not UInt8), so a decoder dispatching on the type string must recognize it separately.
A Bool column [true, false, true]:
Date and Date32
Both encode dates as integer day counts relative to the Unix epoch 1970-01-01. Neither carries a time component.
| Type | Bytes | Encoding | Range |
|---|---|---|---|
Date | 2 | Little-endian UInt16 | 1970-01-01 to 2149-06-06 |
Date32 | 4 | Little-endian Int32 | wide signed range, pre-1970 ok |
Date value 1970-01-02 (1 day):
Date32 value 1900-01-01 (-25567 days):
DateTime
Wire-compatible with UInt32: a Unix timestamp in seconds, 4 bytes little-endian. The type may appear as DateTime or DateTime('Timezone'); the timezone affects display only and is not part of the wire value. Two DateTime columns with different timezone parameters produce identical bytes for the same instant. A decoder strips the (...) parameter suffix and processes the column as UInt32.
DateTime('UTC') value 2024-03-15 14:30:00 UTC (timestamp 1710513000):
DateTime64(scale[, timezone])
8 bytes, little-endian Int64 representing ticks at scale 10^-scale seconds since the Unix epoch. The scale parameter (0–9) lives in the type string and sets the time unit:
| Scale | Tick size | Common name |
|---|---|---|
| 0 | 1 second | seconds |
| 3 | 1 millisecond | ms |
| 6 | 1 microsecond | µs |
| 9 | 1 nanosecond | ns |
The type appears as DateTime64(s) (implicit server-default timezone) or DateTime64(s, 'TimezoneName') (explicit timezone, display only). Negative values represent ticks before the epoch.
DateTime64(3, 'UTC') value 2024-01-15 12:30:45.123 UTC (1705321845123 ms):
DateTime64(0) value 2024-01-15 12:30:45 UTC (1705321845 s):
Time and Time64(scale)
A clock duration rather than a point in time. Time is a signed second count, 4 bytes little-endian Int32; Time64(scale) is a signed tick count at the given decimal scale (0–9), 8 bytes little-endian Int64 — the same wire shape as DateTime64.
The textual form is [-]HH:MM:SS[.fraction], but unlike DateTime the hour field is not wrapped to a 24-hour day: it is the total hour count and may exceed 23. The displayed magnitude is capped at 999:59:59 (3599999 seconds); a larger magnitude renders at the cap with a zeroed fraction (999:59:59.000). CAST clamps the stored value to this range as well, though arithmetic can produce out-of-range values that are clamped only on display. None of this affects the wire bytes, which are the plain signed integer.
Time value 45296 (12:34:56):
Time64(3) value 45296789 ticks (12:34:56.789):
Time and Time64 are experimental and require allow_experimental_time_time64_type = 1 on the server.
Interval
Interval<Unit> — IntervalSecond, IntervalMinute, IntervalHour, IntervalDay, IntervalWeek, IntervalMonth, IntervalQuarter, IntervalYear, IntervalNanosecond, and so on. Every unit shares one wire encoding: the count as a signed 8-byte little-endian Int64. The unit lives only in the type string — it changes neither the wire bytes nor the textual form, which is the bare integer. A single decoder path handles every unit.
IntervalDay value 5:
UUID
16 bytes per value. The wire encoding is not the canonical 16 big-endian bytes — each 8-byte half is byte-reversed independently.
The logical model is a 128-bit identifier in canonical text form xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, where the bytes are conventionally written big-endian. The wire model takes those 16 canonical bytes, splits them into two 8-byte halves, and writes each half little-endian:
- Wire bytes 0..7 = canonical bytes 0..7 reversed.
- Wire bytes 8..15 = canonical bytes 8..15 reversed.
UUID 550e8400-e29b-41d4-a716-446655440000:
The nil UUID (all zeros) appears identically in both representations.
IPv4 and IPv6
Two related but differently-encoded address types.
IPv4 is 4 bytes, encoded as a little-endian UInt32 holding the canonical 32-bit address (the value (a << 24) | (b << 16) | (c << 8) | d from a.b.c.d). The wire bytes are the network-order bytes reversed.
192.168.1.10 (canonical 32-bit value 0xC0A8010A):
IPv6 is 16 bytes, written verbatim in network byte order with no swap — the same byte order as inet_pton(AF_INET6, ...).
2001:db8::1:
The asymmetry is deliberate: IPv4 is stored as a u32 for arithmetic and compact range queries, while IPv6 keeps the network-order layout common to most networking APIs.
Enum8 and Enum16
Wire-compatible with Int8 and Int16 respectively: 1 or 2 bytes per row, two's complement little-endian for the 16-bit variant. The full variant mapping lives in the type string:
A decoder may strip the (...) parameter suffix and dispatch as Int8 / Int16 — the wire bytes are just the integer index. A client that surfaces the label parses the 'name' = value map out of the type string and keeps it alongside the column: the integer alone does not recover the label. Text-oriented output renders the label (active) rather than the index, single-quoted ('active') when the enum is nested inside a composite. Because the map is not recoverable from the integer column, it must be retained for nested enums such as Array(Enum8(...)) or Map(Enum16(...), V).
An Enum8('active' = 1, 'inactive' = 2) column [active, inactive, active]:
An Enum16(...) value 30000:
Decimal(P, S)
A signed integer scaled by a power of 10. The integer's byte width is implied by the precision P; the scale S is the negative exponent (the number of digits after the decimal point). Both live in the type string.
| Precision (P) | Backing integer | Bytes |
|---|---|---|
| 1 ≤ P ≤ 9 | Int32 | 4 |
| 10 ≤ P ≤ 18 | Int64 | 8 |
| 19 ≤ P ≤ 38 | Int128 | 16 |
| 39 ≤ P ≤ 76 | Int256 | 32 |
The wire encoding is the backing integer in little-endian two's complement, and the logical decimal value is wire_integer × 10^(-S).
ClickHouse always emits Decimal(P, S) regardless of how the type was declared. Decimal32(S), Decimal64(S), and so on all normalize to Decimal(P, S) on the wire (with P set to the natural maximum for the width: 9, 18, 38, 76). A decoder that recognizes only Decimal(P, S) covers every spelling the server emits.
Decimal(9, 4) value 123.4567 → backing integer 1234567:
Decimal(18, 1) value -1.5 → backing integer -15:
Decimal(38, 4) value 123.4567 (16 bytes total):
Nothing
The Nothing type carries no values. In practice it appears only as the inner type of Nullable(Nothing) — what the server returns for an expression like SELECT NULL whose only valid value is the absence of one. It is conceptually a unit type.
On the wire it occupies exactly one placeholder byte per row. The server emits the ASCII character '0' (0x30), but the deserializer ignores the bytes — the content is undefined and decoders must not rely on any specific value. The number of bytes written is num_rows × 1, so the column header's num_rows fully determines how much to consume.
The byte-per-row keeps the Block invariant intact: every column spans a length derivable from num_rows, so decoders scan forward without per-cell length prefixes. The surrounding Nullable always reports every position as NULL, so the placeholders are never inspected.
A Nullable(Nothing) column with 3 rows (all NULL):
The null-map prefix is the standard Nullable framing (see Nullable); the inner three bytes are the Nothing payload, which the decoder skips.
Variable-length types
Each value carries its own length on the wire.
String
Type string: String. A String column is a sequence of num_rows length-prefixed byte sequences:
There are no separators between rows beyond the length prefixes, and no row-level state. An empty string is a single 0x00 byte. ClickHouse String is byte-oriented rather than text-oriented: UTF-8 validity is not enforced, and a value may contain any bytes including embedded NUL. A decoder that targets a UTF-8 string type either validates on read or exposes raw bytes to the caller. The total bytes consumed by the column is Σ (varuint_size(len_i) + len_i) over all rows.
A column of 3 strings ["ab", "", "c"] (6 bytes total):
FixedString(N)
Type string: FixedString(N), where N is a positive integer (for example, FixedString(16)). The column is exactly N × num_rows raw bytes, with no length prefixes and no separators. A decoder parses N from the type string and consumes that many bytes per row.
When the SQL inserts a value shorter than N bytes (for example, CAST('abc' AS FixedString(5))), the server right-pads with NUL bytes (0x00) to the declared length. These padding bytes are part of the stored value and are sent on the wire as-is; trimming is a client-side concern. Like String, FixedString(N) is byte-array-like rather than text-like — typically used for fixed-width identifiers, address bytes, or hash digests.
Two FixedString(3) values ["abc", "de\0"] (6 bytes total):
The two string types compared:
| Property | String | FixedString(N) |
|---|---|---|
| Per-row length prefix | Yes (VarUInt) | No |
| Row size | Variable | Exactly N bytes |
| Total column bytes | Variable | N × num_rows |
| NUL-byte padding | n/a | Right-padded by server |
| UTF-8 expected | Typically (not enforced) | No (treat as raw bytes) |
| Type parameter | None | Required integer N |
Composite types
Composite types wrap one or more inner types and share a common wire model: multiple streams per column. A single logical column is encoded as two or more independently-read byte sequences, concatenated.
They share three structural properties:
- Fixed shape per schema. The structure is determined entirely by the type string at decode time.
Array(UInt32)always has the same stream layout, block to block. - No version prefix of their own. The composite wrapper itself adds no version byte; its framing (offsets, null-map, element streams) is stable across ClickHouse releases. This applies to the wrapper only — see the prefix-phase note below for inner versioned types.
- No cross-block state of their own. The wrapper's framing is fully self-describing per block; any cross-block-state concern comes from an inner versioned type, not the wrapper.
Composites are recursive — an inner type may itself be a composite.
Prefix phase before the data streams. Reading a column is two phases, in this order: a state-prefix phase and then the data-stream phase. A composite wrapper has no prefix bytes of its own, but it delegates the prefix phase to its inner serialization before writing any of its own data streams: SerializationArray runs its inner type's prefix phase before the array offsets are written, and Tuple, Map, Nested, and Nullable do the same through their element serializations (Nullable runs the inner prefix before its null map).
So when a composite wraps a versioned/stateful type (LowCardinality, Variant, Dynamic, JSON), that inner type's version/state prefix is emitted first, ahead of the wrapper's offsets and element payload. For example, Array(LowCardinality(String)) is laid out as [LowCardinality state prefix] → [array offsets] → [flattened LowCardinality element payload], not offsets-first.
A decoder that reads offsets before running the inner prefix phase will desynchronize on any composite containing LowCardinality, Variant, Dynamic, or JSON. When every inner type is a plain leaf or another non-versioned composite, the prefix phase emits no bytes and the offsets-first description below applies verbatim.
Nullable(T)
Type string: Nullable(InnerType). Examples: Nullable(UInt32), Nullable(String), Nullable(FixedString(16)), Nullable(DateTime('UTC')).
Like the other composites, Nullable delegates the prefix phase to its inner serialization before writing the null map: when the inner is versioned, the inner's state prefix is emitted first. So Nullable(Tuple(LowCardinality(String))) starts with the LowCardinality state prefix, not the null map. When the inner is a leaf or another non-versioned type, the prefix phase emits no bytes.
The wire layout is the inner prefix phase (empty unless the inner is versioned) followed by two concatenated streams, null-map first:
The null-map is exactly num_rows bytes, one per row:
| Byte value | Meaning |
|---|---|
0x00 | Value is present at this row. |
non-zero (canonical 0x01) | Value is NULL. The corresponding bytes in the values stream are a placeholder. |
The values stream contains the inner type's standard encoding for all num_rows rows, including the null positions. A decoder must still read the placeholder bytes at null positions to advance the stream, but must consult the null-map before interpreting any individual value. Senders may write any bytes at null positions, so decoders must not rely on a specific placeholder value.
Placeholder values by inner type family:
| Inner type family | Placeholder at null position |
|---|---|
| Fixed-width (UInt/Int/Float/DateTime/UUID/etc.) | Zero-initialized bytes of the type's width |
String | Empty string — single 0x00 byte |
FixedString(N) | N zero bytes |
Array(T) | Empty array — offsets advance by zero |
Tuple(T1, T2, ...) | Each element uses its own placeholder |
Nullable(T) may appear inside Array, Tuple, Map, and Nested — Array(Nullable(T)) and Tuple(Nullable(T1), T2) are common. Nullability does not compose with itself: Nullable(Nullable(T)) is rejected by the server.
A Nullable(UInt8) with three rows [5, NULL, 9] (6 bytes total):
A Nullable(String) with three rows ["hello", NULL, "world"] (15 bytes total):
Array(T)
Type string: Array(InnerType). Examples: Array(UInt32), Array(String), Array(Nullable(UInt32)), Array(Array(UInt8)).
The wire layout is the inner prefix phase (empty unless the inner type is versioned) followed by two concatenated streams, offsets first:
The offsets stream is exactly num_rows little-endian UInt64 values, each the cumulative end position in the values stream after that row's elements:
- Element start index for row
N=offsets[N - 1](or0whenN == 0). - Element end index (exclusive) for row
N=offsets[N]. - Row
N's element count =offsets[N] - offsets[N - 1].
offsets[num_rows - 1] is therefore the total element count across all rows, and the values stream holds that many inner values concatenated end-to-end.
Offsets are monotonic non-decreasing; equal consecutive offsets mean an empty row, and a decoder should reject non-monotonic offsets as corruption. An empty column (num_rows == 0) writes zero bytes — no offsets stream and no values stream. Inner types may be any type, including other composites: Array(Array(T)), Array(Tuple(...)), and Array(Nullable(T)) are all legal.
Array(UInt32) with rows [[10, 20, 30], [], [40, 50]] (44 bytes total):
Each offset is the cumulative end of a row's slice of the shared values stream; the start is the previous offset (or 0 for row 0). Equal consecutive offsets are an empty row:
Array(String) with rows [["a", "bb"], []] (20 bytes total):
Array(Array(UInt32)) with rows [[[1,2]], [], [[3], [4,5]]] nests the same shape:
- Outer offsets:
[1, 1, 3]— row 0 has 1 inner array, row 1 has 0, row 2 has 2. - The middle
Array(UInt32)decodes 3 rows with offsets[2, 3, 5]. - The innermost
UInt32decodes 5 values:[1, 2, 3, 4, 5].
That comes to 24 (outer offsets) + 24 (middle offsets) + 20 (values) = 68 bytes.
Tuple(T1, T2, ...)
Type string: Tuple(T1, T2, ..., Tn). Examples: Tuple(UInt32, String), Tuple(Int32), Tuple(Array(UInt32), String), Tuple(UInt8, Tuple(Int32, String)). ClickHouse also supports named tuples via Tuple(a UInt32, b String); names are metadata only and do not affect the wire format.
The wire layout is the elements' prefix phase (each versioned element contributes its state prefix, in declaration order; empty for non-versioned elements) followed by N concatenated streams, one per element type, in declaration order:
Each stream encodes exactly num_rows values. There is no length prefix, no offsets stream, and no separators between streams. An empty column (num_rows == 0) writes zero bytes per stream. Element types may be any type, including other composites — Tuple(Tuple(...), ...), Tuple(Array(...), ...), and Tuple(Nullable(T1), T2) are all legal.
The zero-element tuple Tuple() is also legal — it arises from expressions like SELECT tuple() or CAST(x AS Tuple()). Having no element streams, it instead serializes like Nothing: one placeholder byte (0x30, ASCII '0') per row, which the deserializer discards. The row count comes from the block header, exactly as for Nothing.
Tuple(UInt8, UInt8) with 3 rows (1,4), (2,5), (3,6):
The layout is not row-major: reading the raw bytes back yields [1, 2, 3] for element 0 and [4, 5, 6] for element 1.
Tuple(UInt32, String) with 2 rows (10, "a"), (20, "bb") (13 bytes total):
Map(K, V)
Type string: Map(KeyType, ValueType). Examples: Map(String, UInt32), Map(String, Array(UInt32)), Map(UInt8, Tuple(Int32, String)), Map(Array(String), Int8). The wire format places no restriction on either type — both K and V may be any supported type, including composites. (ClickHouse's SQL-level rules around accepted key types have varied across releases; consult the SQL documentation for the targeted server version.)
The wire layout is byte-identical to Array(Tuple(K, V)), so it begins with the inner prefix phase (empty unless K or V is versioned):
where total_pairs = offsets[num_rows - 1] (or 0 when num_rows == 0). The offsets stream has the same semantics as Array. Keys are positionally aligned with values: pair i is (keys[i], values[i]).
ClickHouse's in-memory representation of a Map column is an array of tuples; the type system surfaces it as a distinct type for SQL ergonomics (m['key'], mapKeys, mapValues). The wire format is a direct serialization of that storage, so Map and Array(Tuple(K, V)) are byte-for-byte interchangeable.
Offsets are monotonic non-decreasing, and both the keys and values streams contain exactly total_pairs values. An empty column writes zero bytes. Within a single row keys are typically unique, but this is a semantic rule, not a wire-enforced one: the wire format lets duplicate keys round-trip, and server-side semantics resolve duplicates only when a Map-aware function consumes the row.
Map(UInt8, UInt8) with 2 rows {1:10, 2:20}, {3:30} (22 bytes total):
Keys and values are stored as separate streams, not interleaved — pair i is reconstructed by reading keys[i] and values[i] together.
Map(String, UInt32) with 1 row {'a':1, 'b':2} (20 bytes total):
Nested(name1 T1, name2 T2, ...)
The on-wire representation of Nested depends on the server-side flatten_nested setting, which gives two distinct cases.
Case A: flatten_nested = 1 (server default). When the table was created under default settings, Nested is not a wire type. The server stores and presents the column as N parallel Array(T_i) columns with dotted names (outer.field1, outer.field2, and so on). For the format layer there is nothing new — every dotted column is a regular Array:
Case B: flatten_nested = 0. When the table was created with flatten_nested = 0, the column appears on the wire as a single column with type string Nested(name1 T1, name2 T2, ...), and its layout after the type string is byte-identical to Array(Tuple(T1, T2, ..., Tn)) — including the inner prefix phase, so any versioned field T_i emits its state prefix first, ahead of the offsets. The example below uses non-versioned fields, so the prefix phase is empty:
The only difference is the type-string text: Nested preserves the field names (a, b), which Array(Tuple) does not carry as named slots.
The Case B type string is a comma-separated list of (name, type) pairs. The first whitespace separates a name from its type; the type itself may contain further whitespace, commas, and parens, so parsing needs the same depth-aware splitter used for Tuple. The wire layout:
where total_elements = offsets[num_rows - 1] (or 0 when num_rows == 0). Offsets are monotonic non-decreasing, and every field stream holds exactly total_elements values. The server enforces at INSERT time that, within a single row, all fields carry the same number of elements. An empty column writes zero bytes.
Nested(a UInt8, b String) with 2 rows [(10,'x'),(20,'y')] and [(30,'z')] (25 bytes after the type string):
Type aliases
Several types are pure aliases: the server sends the alias name in the column header, but the bytes that follow are those of an underlying type. A decoder maps the alias to that type and reuses its codec — no new wire format is involved.
The geographic types alias to nested arrays and tuples:
| Type string | Underlying wire type |
|---|---|
Point | Tuple(Float64, Float64) |
Ring, LineString | Array(Point) |
Polygon, MultiLineString | Array(Ring) |
MultiPolygon | Array(Polygon) |
So a Point column is decoded exactly as Tuple(Float64, Float64) (rendering as (1,2)), a Ring as Array(Tuple(Float64, Float64)) ([(0,0),(1,1)]), and so on up the hierarchy.
Geometry is also an alias, but to a Variant rather than a nested array: its payload is the variant of the six geo types above. The column header carries just the type string Geometry — it does not spell out the variant — so a decoder must expand it itself. As with any Variant, the discriminators follow the canonical name-sorted order of the geo aliases: 0 = LineString, 1 = MultiLineString, 2 = MultiPolygon, 3 = Point, 4 = Polygon, 5 = Ring. Each selected value is then decoded through its geo alias above (NULL uses the Variant NULL discriminator 255).
SimpleAggregateFunction(func, T) is an alias for its value type T. It stores an already-finalized aggregate value, so its wire form and rendering are exactly those of T (SimpleAggregateFunction(sum, UInt64) is decoded as UInt64). Only the single-value-type form is an alias this way; the underlying type may itself be a composite.
Two related types are not aliases. They are valid Native column types — a client can receive an AggregateFunction column from a -State combinator or distributed aggregation, for instance — but each carries its own specialized payload that is outside the scope of this page:
AggregateFunction(func, ...)holds an intermediate aggregation state (not a finalized value); its binary layout is specific to the aggregate function and version.QBit(T, N)stores a vector with its bit planes transposed for vector-search workloads.
Versioned types
Versioned types carry an on-wire serialization-version prefix that declares which variant of the encoding follows. They may also use multiple streams (like the composites). On the Native wire the prefix and any dictionary are per block — these types maintain no cross-block state (see the per-block prefix note below); cross-block serialization state exists only in the MergeTree on-disk stream.
These types are considerably more involved than the fixed-shape composites, and a client targeting simple analytical queries can defer them.
Serialization version: concept
A serialization version is a per-type, per-column on-wire version number that declares which variant of a type's encoding the sender is using. It is the first thing in the column's state prefix, so the decoder reads it and dispatches to the right parser for the rest of the column.
It is distinct from the protocol version:
| Dimension | Protocol version | Serialization version (this section) |
|---|---|---|
| Scope | Connection-wide | Per-type, per-column |
| Negotiated | Yes, at handshake | No — sender writes, receiver reads |
| Controls | Which packet-level features are active | Which wire variant of one type |
| Mandatory to read | Yes | Yes, for each versioned column |
Most versioned types write the version as a little-endian UInt64 immediately before any other state-prefix data; a few use VarUInt or UInt8. A decoder reads the version first and rejects unknown values — a higher version implies a newer sender format the decoder does not understand, and mis-parsing it corrupts every subsequent byte.
The state prefix is emitted at the start of every block whose row count is greater than zero, immediately before that block's payload.
The Native writer and reader do not keep serialization state across blocks: NativeWriter creates a fresh serialize state and writes a state prefix for each non-empty column block it writes, and NativeReader creates a fresh deserialize state and reads it for each non-empty block it reads (both skip the prefix entirely when rows == 0).
Header blocks (rows = 0) and empty blocks therefore emit nothing, and a decoder must read the state prefix again at the start of each non-empty block. A decoder that reads the prefix only once and treats later blocks as payload-only will read the next block's prefix as data and desynchronize:
Serialization version reference
| Type | Field width | Value | Name | Meaning |
|---|---|---|---|---|
| Object (base for JSON) | UInt64 LE | 0 | V1 | Original encoding. Includes max_dynamic_paths parameter and a list of dynamic paths. |
1 | STRING | Native-format compatibility mode — Object transmitted as a single String column containing JSON text. | ||
2 | V2 | V1 layout minus the max_dynamic_paths parameter. | ||
3 | FLATTENED | Native-format compatibility mode — flattened path representation. | ||
4 | V3 | V2 plus a shared-data serialization version sub-field and a statistics flag. | ||
Object shared data (sub-stream used in Object V3) | VarUInt | 0 | MAP | Shared data encoded as Map(String, String). |
1 | MAP_WITH_BUCKETS | Same as MAP but split into N buckets for scan efficiency. | ||
2 | ADVANCED | Compact granule format with separate streams for paths / marks / metadata. | ||
| Dynamic | UInt64 LE | 1 | V1 | Original encoding. Includes max_dynamic_types and a list of runtime variant types. |
2 | V2 | V1 minus the max_dynamic_types parameter. | ||
3 | FLATTENED | Native-format compatibility mode. | ||
4 | V3 | V2 plus binary-encoded variant type names and empty-statistics support. | ||
| Variant discriminators mode | UInt64 LE | 0 | BASIC | Every row's discriminator is written literally. |
1 | COMPACT | If all rows in a granule share one discriminator, only a single value + granule marker is written. | ||
Variant granule format (when mode is COMPACT) | UInt8 | 0 | PLAIN | Granule has heterogeneous discriminators. |
1 | COMPACT | Granule has one discriminator for all rows. | ||
| LowCardinality key serialization | Int64 | 1 | sharedDictionariesWithAdditionalKeys | Only version currently defined. |
JSON-as-String fallback (when output_format_native_write_json_as_string is enabled) | UInt64 LE | 1 | JSONStringSerializationVersion | JSON column arrives as a String column preceded by this prefix. |
A few things worth noting about the table:
- The values are not contiguous.
Dynamicuses1,2,3,4withV3at4andFLATTENEDat3. A higher number is not necessarily newer. - Some values are native-format-only.
Object::STRING,Object::FLATTENED, andDynamic::FLATTENEDexist for native-protocol compatibility with clients that do not implement full Object/Dynamic. They do not appear in MergeTree on-disk storage. V3is primarily on-disk. Clients consuming the native TCP protocol typically seeFLATTENED(value3) rather thanV3(value4).
LowCardinality(T)
The simplest versioned type. It replaces a column of N inner values with a small dictionary of unique values plus N indices into that dictionary.
Type string: LowCardinality(InnerType). Examples: LowCardinality(String), LowCardinality(FixedString(4)), LowCardinality(Nullable(String)).
The state prefix (Int64 LE = 1) is the single defined version, sharedDictionariesWithAdditionalKeys; other values are reserved.
The per-block metadata UInt64 is a bitfield:
| Bit range | Meaning |
|---|---|
| 0..7 | Key type code: 0 = UInt8, 1 = UInt16, 2 = UInt32, 3 = UInt64. The smallest type that can index dict_size entries is chosen. |
8 (0x100) | NeedGlobalDictionaryBit — a single dictionary shared across blocks. Never set in the Native format: the Native writer uses low_cardinality_max_dictionary_size = 0, and the Native reader rejects this bit (native_format raises INCORRECT_DATA — "cannot use global dictionary"). It belongs to the MergeTree on-disk stream, not the wire. |
9 (0x200) | HasAdditionalKeysBit — set when the block carries additional dictionary keys (written before the indexes). Always set for a non-empty Native block. |
10 (0x400) | NeedUpdateDictionary — set when the block carries a dictionary update. Always set for a non-empty Native block, since each block ships its own self-contained dictionary. |
For a typical query response with a single data block per column, the metadata is 0x600 (HasAdditionalKeys + NeedUpdateDictionary).
The dict values are dict_size values encoded using the inner type T. The dictionary reserves leading slots for special values: a non-nullable column reserves one (dict[0] holds the inner type's default value, e.g. "" for String), and real distinct values start at dict[1].
For LowCardinality(Nullable(T)) the dict is still encoded as plain T (no null-map stream), but two slots are reserved: dict[0] is the NULL marker and dict[1] is the inner type's default value (e.g. "" for String); real distinct values start at dict[2]. A NULL row's key points at dict[0], and that slot is written on the wire as the inner type's default bytes.
The keys are indices into the dict; each index is 1 << key_type_code bytes (1, 2, 4, or 8), and value N is reconstructed as dict[keys[N]].
keys_count is the number of LowCardinality values at the current recursive level, not necessarily the block's row count. For a top-level LowCardinality column the two coincide. But when the LowCardinality sits under a composite, the count is the flattened value count that the composite passes down: for Array(LowCardinality(String)) with three rows holding five elements in total, keys_count is 5, not 3; for Map(K, LowCardinality(V)) it is the total pair count, and so on. A decoder must take keys_count from this field rather than assuming the block row count. When that flattened count is zero — for example a block whose arrays are all empty — the LowCardinality data phase writes nothing at all: only the state prefix (emitted in the composite prefix phase) is present, with no metadata, dictionary, or keys_count following.
The state prefix is read at the start of every block whose row count is greater than zero — header blocks (rows = 0) and empty blocks emit nothing. Within a block, keys_count equals the row count, dict_size equals the number of values in the dict stream, and each key fits in 1 << key_type_code bytes.
In the Native format each block ships a self-contained, block-local dictionary — there is no cross-block dictionary state. The Native writer sets low_cardinality_max_dictionary_size = 0, so SerializationLowCardinality never builds a shared dictionary: every non-empty block writes its keys as block-local additional keys with NeedGlobalDictionaryBit unset (metadata 0x600), and the Native reader rejects NeedGlobalDictionaryBit when native_format is true. A decoder must therefore reset the dictionary at each block and read the dict_size entries present in that block; carrying a dictionary over from a previous block would misread the next block's keys. (Persisting an LC dictionary across blocks is a MergeTree on-disk concern, not the Native wire layout.)
LowCardinality(String) with values ['a', 'b', 'a', 'c', 'b']:
Reconstructed: dict[1], dict[2], dict[1], dict[3], dict[2] = ["a", "b", "a", "c", "b"].
LowCardinality(Nullable(String)) with values ['a', NULL, '', 'b'] shows both reserved slots — dict[0] for NULL and dict[1] for the empty-string default:
Reconstructed: dict[2] = "a", dict[0] = NULL, dict[1] = "", dict[3] = "b", i.e. ["a", NULL, "", "b"]. Both dict[0] and dict[1] are empty bytes on the wire; the null-ness comes from the key pointing at slot 0, not from the bytes.
JSON (Tier 1: String fallback)
ClickHouse's JSON type has several wire encodings (see the serialization version reference). Tier 1 is the simplest: when the per-query setting output_format_native_write_json_as_string = 1 is set, the server flattens every JSON value to its serialized text and emits the column as a String with a state-prefix marker.
Type string: JSON.
The state prefix value is 1 for this String fallback. The other values denote different JSON/Object encodings: 0 = V1, 2 = V2 (the default over the native TCP protocol), 3 = FLATTENED, 4 = V3 (see the serialization version reference). A decoder that sees a value other than 1 here is not looking at the String fallback. The prefix is read at the start of every block with rows > 0, and the values stream is a standard String column for num_rows rows.
JSON value '{"a":1}' (one row):
The value is emitted as its compact JSON text — {"a":1}, with the integer left as an integer. The text is just a String value, so the client receives the JSON for opaque transit but does not recover the individual paths and their ClickHouse types; faithful per-path typing requires the Tier 2 encoding below.
Variant(T1, T2, ...)
A discriminated union: each row holds a value of exactly one of the variant types, or NULL. Every row carries a one-byte global discriminator selecting its type, and the per-type values are then stored densely, one contiguous run per variant type.
Type string: Variant(T1, T2, ...). The server canonicalizes the order (variant types are sorted by name), so the type string as received already lists the types in global-discriminator order: discriminator 0 selects the first listed type, 1 the second, and so on. 255 (NULL_DISCRIMINATOR) means the row is NULL. Variant elements are never Nullable — NULL is the discriminator's job. Examples: Variant(String, UInt64), Variant(Array(UInt8), String).
The state prefix carries a UInt64 LE discriminators mode: 0 = BASIC (every row's discriminator written literally), 1 = COMPACT (run-length granule encoding). The server uses BASIC over the native protocol by default (use_compact_variant_discriminators_serialization = false); only BASIC is specified here.
To reconstruct, walk the discriminators left to right while keeping a per-type running counter. Row r with discriminator d (≠ 255) takes the value at index counter[d] from variant type d's value run, then counter[d] is incremented. Rows with discriminator 255 are NULL and consume no value from any run, so the sum of the per-type counters equals the number of non-NULL rows.
The state prefix (the mode UInt64) is read at the start of every block with rows > 0; header and empty blocks emit nothing. Each non-NULL discriminator is less than the number of variant types, and variant type i is decoded for exactly count[i] rows.
Variant elements that are themselves stateful (LowCardinality, Variant, Dynamic, JSON) emit their own state prefix in the per-element state-prefix phase, after the mode UInt64. Leaf types and the plain composites (Array, Tuple, Map of leaf types) have empty state prefixes and compose freely.
Variant(String, UInt64) with values [42, 'hi', NULL] (canonical order sorts String before UInt64, so discriminator 0 = String, 1 = UInt64):
Reconstructed: row 0 = UInt64 run[0] = 42; row 1 = String run[0] = "hi"; row 2 = NULL.
The discriminator stream is the index; each non-NULL discriminator pulls the next value from its type's dense run, while 255 (NULL) consumes nothing. This same walk reconstructs Dynamic, which differs only in how NULL is encoded:
Dynamic
A column whose value type is discovered at runtime: each row holds a value of one of a runtime-determined set of types, or NULL. Unlike Variant, the type set is not in the column's type string — it is carried in the state prefix.
Type string: Dynamic or Dynamic(max_types=N). The max_types parameter bounds how many distinct types the column tracks but does not affect the wire format below.
Dynamic has four encodings — V1 = 1, V2 = 2, FLATTENED = 3, V3 = 4. Which one the server emits depends on the channel and on the query settings:
- Over
clickhouse-clientand HTTPFORMAT Nativethe writer's revision is0(unless raised withclient_protocol_version), so the default is V1. - Over the native TCP protocol at its negotiated revision the default is V2. The
Nativewriter leaves statistics disabled, so a defaultV2payload carries no per-variant statistics — after the type list come the nestedVariantprefix and data directly. (Per-variant statistics are a MergeTree on-disk concern, not part of the Native wire.) - The query setting
output_format_native_use_flattened_dynamic_and_json_serialization = 1overrides both and emits FLATTENED (version 3) regardless of revision.
This page specifies only the FLATTENED layout. The non-flat V1/V2/V3 binary layouts are the internal/on-disk representation (binary-encoded type lists, per-variant statistics) and are not specified here. A client that wants to decode Dynamic using this page must request FLATTENED by setting output_format_native_use_flattened_dynamic_and_json_serialization = 1; the layout below assumes that setting. Because the version byte heads the prefix, a decoder can detect the actual encoding it received and reject V1/V2/V3 if it only implements FLATTENED.
The FLATTENED (version 3) layout selected by that setting:
The discriminator width is the smallest unsigned integer that can index num_types types plus the NULL slot — UInt8 for num_types ≤ 255, then UInt16, UInt32, UInt64. NULL is the discriminator value num_types itself, which differs from Variant, where NULL is the fixed value 255. Reconstruction is the same dense walk as Variant: keep a per-type counter, and row r with discriminator d (≠ num_types) takes value counter[d] from type d's run.
The state prefix (version + type list) is read at the start of every block with rows > 0; header and empty blocks emit nothing.
Runtime types whose serialization is stateful (LowCardinality, Variant, Dynamic, JSON) carry nested state prefixes after the type-name list.
The runtime type list normally follows the Variant canonicalization — the regular variant slots are written in DataTypeVariant (type-name) order, so the wire order does not follow insertion order. It is not always globally sorted, however: types that overflowed into the shared variant (for example under Dynamic(max_types=N)) are appended after the regular slots in first-seen order, so the tail of the list can break type-name order. A decoder must therefore treat the transmitted type list as authoritative for discriminator assignment and must not re-sort it itself. For rows [42::UInt64, "hi", NULL] the two types are String and UInt64, and "String" sorts before "UInt64", so the discriminators are 0 = String, 1 = UInt64, 2 = NULL:
Reconstructed: row 0 = UInt64 run[0] = 42; row 1 = String run[0] = "hi"; row 2 = NULL. The per-type runs follow the same wire order as the type list (String before UInt64).
JSON (Tier 2: FLATTENED Object)
The richer JSON encoding: instead of flattening every value to text (Tier 1), the column is split into one sub-column per JSON path. It is selected by not requesting the Tier 1 fallback (output_format_native_write_json_as_string = 0) while the flattened-serialization flag is on (output_format_native_use_flattened_dynamic_and_json_serialization = 1); the server then emits serialization version 3.
There are two kinds of path:
- Typed paths are declared in the type string, for example
JSON(a UInt32, b String), and decoded in their declared type. A path name containing dots is backtick-quoted in the type string. - Dynamic paths are discovered at runtime and each decoded as a Dynamic column.
In FLATTENED mode there is no shared-data column (that overflow store belongs to the non-flat V2/V3 Object encodings). Every path is a full column of num_rows values.
Note the two-phase shape: all path state prefixes come first, then all path data. A dynamic path's Dynamic prefix (in the prefix phase) is therefore separated from its data (in the data phase). The state prefix is read at the start of every block with rows > 0, and every path column (typed or dynamic) holds exactly num_rows values. Row r's object is assembled by reading each path's value at index r; a dynamic path whose Dynamic discriminator is NULL for that row contributes no key.
JSON value {"a": 42, "b": "hi"} (one row, both paths dynamic). A JSON integer is inferred as Int64:
JSON non-flat (V2/V3)
The non-flattened Object encodings (V1/V2/V3) are used by MergeTree on-disk storage and are what the server emits over the wire when the flattened flag is off — V1 over clickhouse-client / HTTP FORMAT Native (revision 0), V2 over the native TCP protocol. They carry a shared-data column and are not specified on this page. Note that they do not carry per-path statistics over the Native wire: NativeWriter leaves statistics disabled, so the Object structure prefix has no statistics section and the bytes after it are the typed/dynamic/shared-data prefixes and data directly. Statistics appear only on the MergeTree on-disk paths that enable them. To decode a JSON column with this page, a client must select one of the documented tiers: set output_format_native_write_json_as_string = 1 for the String fallback, or output_format_native_use_flattened_dynamic_and_json_serialization = 1 (with output_format_native_write_json_as_string = 0) for the FLATTENED Object layout.
Compression frame
ClickHouse can compress the column data of a Native stream with an internal frame format. The frame layout below is transport-independent — the same frames appear on both the native TCP protocol and over HTTP — but how compression is requested, and what surrounds the frames, differs by transport.
- Native TCP protocol. Compression is opt-in per query via the
compressionflag in the Query packet. When active, the body of eachData,Totals,Extremes,Log, andProfileEventspacket — the bytes after thetable_namestring — is wrapped in the frame format. The packet envelope itself, the packet-type code and thetable_namestring, is not compressed; the server writes those to the raw stream. Everything theNativeWriteremits goes into the compressed stream, so theBlockInfoprefix is the first thing inside the frame, along with the dimensions and columns. A client must therefore decompress the frame before it can readBlockInfo. - HTTP.
SELECT ... FORMAT Native&compress=1wraps the wholeFORMAT Nativebyte stream in the same frames (the server uses the same internalCompressedWriteBuffer), and?decompress=1expects the same frames on aNativeinput body, decoding them through the matchingCompressedReadBuffer. There is no TCP packet type,table_name, or packet envelope on this path: the entire compressed payload is just framedNativeblocks (aBlockInfoprefix is present only if the negotiated revision is greater than0, exactly as in the uncompressed layout above). This internalcompress/decompressframing is distinct from HTTP transport compression (Content-Encoding: gzip/zstd, enabled byenable_http_compression), which wraps the response at the HTTP layer and is not the frame format below.
So a client that implemented only the uncompressed FORMAT Native layout must still add this frame layer to read a compressed HTTP Native response or to send a decompress=1 request body.
Frame format
The total framed size is 16 + compressed_size = 16 + 9 + body_size = 25 + body_size. Note the two spans: the checksum covers the 9-byte header plus the body, while compressed_size counts the header plus body but not the checksum itself:
Method byte values
| Byte | Method | Body encoding |
|---|---|---|
0x02 | NONE | Body is the raw bytes (no compression). The frame is still emitted; the receiver verifies the checksum. |
0x82 | LZ4 | Body is the LZ4 block format — not the LZ4 frame format. No magic number. |
0x90 | ZSTD | Body is a raw zstd single-frame stream (the standard zstd magic number is part of the body). |
Checksum
ClickHouse uses CityHash v1.0.2 (the historical variant), not modern Google CityHash; the two produce different outputs.
The checksum is computed over the 9 header bytes (method + compressed_size + uncompressed_size) plus the N body bytes — everything between the checksum and the end of the frame. The first 8 bytes of the 16-byte CityHash128 output are the low half (LE), the next 8 bytes the high half (LE). A decoder recomputes the CityHash128 over the received header and body and compares against the leading 16 bytes; a mismatch is corruption and the decoder fails.
Per-block boundaries
The compressed payload of a Block is a stream of one or more frames, not necessarily a single frame. The sender writes the serialized block through a CompressedWriteBuffer that emits a frame whenever its internal buffer fills (≈1 MB, DBMS_DEFAULT_BUFFER_SIZE) and a final frame when the block is flushed. So a small block is one frame; a large block is several consecutive frames.
The invariant runs one way only: because the sender flushes the compressed buffer at the end of each block, every block end coincides with a frame boundary — but the converse does not hold. An intermediate frame boundary, emitted when the buffer filled mid-block, falls in the middle of a block and is not a block boundary. A decoder must therefore use the block's own dimensions (num_columns/num_rows) to find where a block ends; it must not assume that each frame is one complete block.
A receiver streams the frames: read 16 + 9 bytes, read exactly compressed_size - 9 body bytes, decompress to exactly uncompressed_size bytes, and serve those bytes to the block decoder; when the decoder needs more than the current frame holds, pull the next frame. Because the sender flushes per block, after a block is fully decoded the frame buffer is empty and the next block begins at a fresh frame.
On the native TCP protocol, the packet envelope — the packet-type VarUInt and the table_name string — is written to the raw stream, outside the compressed payload; only the block body (BlockInfo + columns) is framed. The HTTP compress/decompress path has no such envelope: the whole stream is framed blocks.
Negotiation
On the native TCP protocol, compression is per-query, not per-connection. The Query packet's compression: bool field requests it for that single query. The server honours the request and emits compressed Data/Totals/Extremes/Log/ProfileEvents bodies for the lifetime of the query (Log/ProfileEvents only at v54481+). It also expects the client's outgoing Data blocks — external tables, the empty end-of-data marker, and INSERT rows — to be framed the same way. Subsequent queries on the same connection may differ.
Over HTTP there is no Query packet: the compress=1 query parameter selects framed output for that request, and decompress=1 declares that the request body is framed. The compress=1 output is written with the server's default codec (LZ4) rather than network_compression_method; the decompress=1 reader takes the codec from each frame's method byte, so any codec is accepted on input.
With compression on, the server may also route columns through the parallel block-marshalling / ColumnBLOB path (PARALLEL_BLOCK_MARSHALLING, v54478) for blocks with more than one row. An implementation that compresses INSERT data must be prepared to handle (or explicitly opt out of) that path to avoid a desynchronized stream.
Glossary
Block — the unit of data exchange in the Native format. A self-describing chunk of rows stored columnar. See block and column structure.
BlockInfo — the metadata header that precedes a Block on the TCP Data-packet path (written whenever the connection revision is greater than zero). A sequence of revision-gated, field-ID-tagged fields. Omitted by the Native output format, which serializes at revision 0. See BlockInfo.
Column body — the bytes of a Column that hold the actual values, after the column header (name, type, has_custom_serialization byte). Layout is type-specific. See column wire layout.
Composite type — a type built from one or more inner types, encoded as multiple streams per column. The wire format is stable and unversioned. See composite types.
Dictionary (LowCardinality) — the array of unique values that a LowCardinality(T) column references via integer indices. See LowCardinality.
Empty block — a Block with num_columns = 0 and num_rows = 0. Used as a sentinel: a client-side end-of-input marker and a server-side stream boundary marker. See block variants.
Header block — a Block with num_columns > 0 and num_rows = 0, sent by the server as the first Data packet of a query response. Announces the result schema. See block variants.
Inner type — the type a composite wraps. Array(UInt32) has inner type UInt32; Nullable(T)'s inner type is T.
Offsets stream — the cumulative-end-position UInt64 array that Array, Map, and Nested use to delimit per-row element boundaries. See Array.
Placeholder value — the bytes written at null positions in a Nullable(T) column's values stream. The decoder reads them to advance the stream but ignores their content. See Nullable.
Result block — a Block with num_rows > 0 carrying actual query result rows. See block variants.
Schema block — a synonym for header block, used when describing the INSERT phase, where the schema block tells the client the expected column shapes.
Serialization version — a per-type on-wire version number that versioned types use to declare which variant of the encoding follows. Distinct from the protocol version. See serialization version: concept.
State prefix — the bytes preceding the per-block payload of a versioned type. Carries the serialization version and (for LowCardinality) per-block dictionary metadata. Emitted at the start of every block with rows > 0; not retained across blocks.
Stream — a contiguous run of bytes within a column body, encoding one logical sub-component (a null-map, an offsets array, a values stream). Multi-stream types concatenate two or more streams per column.