Compression

The ClickHouse protocol supports compression of data blocks, with checksums. If you are not sure which mode to pick, use LZ4.

Tip: Learn more about the available column compression codecs, which you can specify when creating your tables or afterward.

Modes

| value | name | description |
|-------|------|-------------|
| 0x02 | None | No compression, only checksums |
| 0x82 | LZ4 | Extremely fast, good compression |
| 0x90 | ZSTD | Zstandard, pretty fast, best compression |
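As an illustration, the mode byte can be modeled as an enumeration so that an unknown value is rejected early. This is a minimal Python sketch; the names `CompressionMode` and `parse_mode` are hypothetical and do not come from any ClickHouse client.

```python
from enum import IntEnum


class CompressionMode(IntEnum):
    """Mode bytes from the table above (class and member names are illustrative)."""

    NONE = 0x02  # no compression, only checksums
    LZ4 = 0x82
    ZSTD = 0x90


def parse_mode(value: int) -> CompressionMode:
    """Map a raw mode byte to a CompressionMode, rejecting unknown values."""
    try:
        return CompressionMode(value)
    except ValueError:
        raise ValueError(f"unknown compression mode: {value:#04x}") from None
```

Rejecting unknown bytes at this point keeps a reader from handing a block to the wrong decompressor.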

Both LZ4 and ZSTD were created by the same author, but they make different tradeoffs. From Facebook's benchmarks:

| name | ratio | encoding | decoding |
|------|-------|----------|----------|
| zstd 1.4.5 -1 | 2.8 | 500 MB/s | 1660 MB/s |
| lz4 1.9.2 | 2.1 | 740 MB/s | 4530 MB/s |
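To make the tradeoff concrete, the figures above can be turned into a rough estimate for a given input size. The sketch below is back-of-the-envelope arithmetic only: it assumes the speeds are measured on uncompressed bytes, and the `estimate` helper is hypothetical.

```python
# Figures copied from the benchmark table above.
BENCHMARKS = {
    "zstd 1.4.5 -1": {"ratio": 2.8, "encode_mb_s": 500, "decode_mb_s": 1660},
    "lz4 1.9.2": {"ratio": 2.1, "encode_mb_s": 740, "decode_mb_s": 4530},
}


def estimate(name: str, input_mb: float) -> dict:
    """Rough estimate; assumes speeds refer to uncompressed megabytes."""
    b = BENCHMARKS[name]
    return {
        "compressed_mb": input_mb / b["ratio"],
        "encode_s": input_mb / b["encode_mb_s"],
        "decode_s": input_mb / b["decode_mb_s"],
    }


for name in BENCHMARKS:
    print(name, estimate(name, 1000))
```

For 1000 MB of input, this works out to roughly 357 MB after about 2.0 s of encoding for zstd, against roughly 476 MB after about 1.35 s for lz4. Real data and hardware will give different numbers.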

Block

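| field | type | description |
|-------|------|-------------|
| checksum | uint128 | Hash of (header + compressed data) |
| raw_size | uint32 | Size of header + compressed data (checksum excluded) |
| data_size | uint32 | Uncompressed data size |
| mode | byte | Compression mode |
| compressed_data | binary | Block of compressed data |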

[Figure: compression block diagram]

The header is (raw_size + data_size + mode). raw_size equals len(header + compressed_data), so it counts the header itself but not the checksum.
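The size bookkeeping can be sketched as follows. The field widths come from the types listed in the block table (uint128, uint32, uint32, byte); the `block_sizes` helper is hypothetical and only computes sizes, it does not lay out bytes.

```python
CHECKSUM_SIZE = 16       # uint128
HEADER_SIZE = 4 + 4 + 1  # raw_size (uint32) + data_size (uint32) + mode (byte)


def block_sizes(data: bytes, compressed_data: bytes) -> dict:
    """Compute the size fields for one block (illustrative helper)."""
    raw_size = HEADER_SIZE + len(compressed_data)  # len(header + compressed_data)
    return {
        "raw_size": raw_size,
        "data_size": len(data),             # size before compression
        "total": CHECKSUM_SIZE + raw_size,  # checksum + header + compressed data
    }
```

For example, 1000 bytes of input that compress to 100 bytes give a raw_size of 109 and a data_size of 1000, for 125 bytes in total once the checksum is counted.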

The checksum is hash(header + compressed_data), computed with ClickHouse CityHash.
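A reader would typically recompute the hash over the same bytes and compare it with the received checksum. The sketch below shows that flow with a pluggable hash function. Note that `placeholder_hash128` is a stand-in built on BLAKE2b from the Python standard library, not ClickHouse CityHash, so it will not match checksums produced by a real server; all function names here are hypothetical.

```python
import hashlib
from typing import Callable

HashFn = Callable[[bytes], bytes]


def placeholder_hash128(payload: bytes) -> bytes:
    """Stand-in 128-bit hash. NOT ClickHouse CityHash; for illustration only."""
    return hashlib.blake2b(payload, digest_size=16).digest()


def compute_checksum(header: bytes, compressed_data: bytes,
                     hash_fn: HashFn = placeholder_hash128) -> bytes:
    """checksum = hash(header + compressed_data)."""
    return hash_fn(header + compressed_data)


def verify_block(checksum: bytes, header: bytes, compressed_data: bytes,
                 hash_fn: HashFn = placeholder_hash128) -> bool:
    """Return True if the received checksum matches the recomputed one."""
    return compute_checksum(header, compressed_data, hash_fn) == checksum
```

Passing the hash as a parameter keeps the framing logic independent of the hash; a real client would supply an implementation of ClickHouse CityHash instead of the placeholder.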

None mode

In None mode, compressed_data is identical to the original data. This mode is still useful because the checksums provide an additional data integrity check, and the hashing overhead is negligible.
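A minimal sketch of None mode, assuming the 9-byte header implied by the field types (uint32 + uint32 + byte). The helpers `encode_none` and `decode_none` are hypothetical; they return and accept separate fields rather than wire bytes, and the checksum is left out.

```python
from typing import Tuple

MODE_NONE = 0x02
HEADER_SIZE = 9  # raw_size (4) + data_size (4) + mode (1); assumed from the field types


def encode_none(data: bytes) -> Tuple[int, int, int, bytes]:
    """Return (raw_size, data_size, mode, compressed_data) for None mode."""
    compressed_data = data  # None mode: the bytes are carried verbatim
    raw_size = HEADER_SIZE + len(compressed_data)
    return raw_size, len(data), MODE_NONE, compressed_data


def decode_none(data_size: int, compressed_data: bytes) -> bytes:
    """Recover the original data; in None mode the two sizes must agree."""
    if len(compressed_data) != data_size:
        raise ValueError("data_size does not match payload length")
    return compressed_data
```

Because the payload is unchanged, raw_size is always data_size plus the header size in this mode, which gives a reader one more consistency check alongside the checksum.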