Compression
The ClickHouse protocol supports compression of data blocks, with checksums.
Use LZ4 if you are not sure which mode to pick.
Modes
| value | name | description |
|-------|------|-------------|
| 0x02 | None | No compression, only checksums |
| 0x82 | LZ4 | Extremely fast, good compression |
| 0x90 | ZSTD | Zstandard, pretty fast, best compression |
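The mode byte can be modelled as a small enumeration. The following is a minimal sketch rather than the API of any particular client: the type `Mode` and the constant names are hypothetical, and only the numeric values come from the table above.

```go
package main

import "fmt"

// Mode is the one-byte compression mode stored in each block.
// The type and constant names are illustrative assumptions.
type Mode byte

// Values taken from the modes table.
const (
	ModeNone Mode = 0x02 // no compression, only checksums
	ModeLZ4  Mode = 0x82 // LZ4
	ModeZSTD Mode = 0x90 // Zstandard
)

// String returns the mode name, or the raw value for unknown modes.
func (m Mode) String() string {
	switch m {
	case ModeNone:
		return "None"
	case ModeLZ4:
		return "LZ4"
	case ModeZSTD:
		return "ZSTD"
	default:
		// byte(m) avoids calling String recursively.
		return fmt.Sprintf("Unknown(0x%02x)", byte(m))
	}
}

func main() {
	for _, m := range []Mode{ModeNone, ModeLZ4, ModeZSTD, 0x00} {
		fmt.Printf("0x%02x %s\n", byte(m), m)
	}
}
```

Rejecting unknown values explicitly, instead of treating them as uncompressed data, keeps a corrupted mode byte from being silently accepted.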
Both LZ4 and ZSTD were created by the same author, but they make different tradeoffs. From the Facebook benchmarks:
| name | ratio | encoding | decoding |
|------|-------|----------|----------|
| zstd 1.4.5 -1 | 2.8 | 500 MB/s | 1660 MB/s |
| lz4 1.9.2 | 2.1 | 740 MB/s | 4530 MB/s |
Block
| field | type | description |
|-------|------|-------------|
| checksum | uint128 | Hash of (header + compressed data) |
| raw_size | uint32 | Size of header + compressed data, excluding the checksum |
| data_size | uint32 | Uncompressed data size |
| mode | byte | Compression mode |
| compressed_data | binary | Block of compressed data |
The header is (raw_size + data_size + mode), 9 bytes in total. The raw_size field is len(header + compressed_data), so it counts the header itself but not the checksum.
The checksum is hash(header + compressed_data), computed with ClickHouse CityHash.
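The size arithmetic can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the names `Sizes`, `blockSizes` and `wireSize` are hypothetical, the byte widths are derived from the field types in the table, and neither serialization nor the CityHash computation is shown.

```go
package main

import "fmt"

// headerSize is raw_size (uint32) + data_size (uint32) + mode (byte).
const headerSize = 4 + 4 + 1

// checksumSize is the width of the uint128 checksum field.
const checksumSize = 16

// Sizes holds the two size fields of a block (hypothetical type).
type Sizes struct {
	RawSize  uint32 // len(header + compressed_data)
	DataSize uint32 // length of the original, uncompressed data
}

// blockSizes derives both size fields from the payload lengths.
func blockSizes(compressedLen, originalLen int) Sizes {
	return Sizes{
		RawSize:  uint32(headerSize + compressedLen),
		DataSize: uint32(originalLen),
	}
}

// wireSize is the total block length: checksum + header + compressed_data.
func wireSize(s Sizes) int {
	return checksumSize + int(s.RawSize)
}

func main() {
	// 1000 bytes of input compressed down to 400 bytes.
	s := blockSizes(400, 1000)
	fmt.Println(s.RawSize, s.DataSize, wireSize(s)) // 409 1000 425
}
```

A reader that already holds the checksum and header can use raw_size minus the header size to learn how many bytes of compressed_data remain, and data_size to preallocate the output buffer.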
None mode
In None mode, compressed_data is identical to the original data.
This mode is still useful: the checksums provide an additional data integrity check, and the hashing overhead is negligible.