Compression
The ClickHouse protocol supports compression of data blocks, with a checksum on each block.
Use LZ4 if you are not sure which mode to pick.
Modes
value | name | description |
---|---|---|
0x02 | None | No compression, only checksums |
0x82 | LZ4 | Extremely fast, good compression |
0x90 | ZSTD | Zstandard, pretty fast, best compression |
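As a minimal sketch, the mode bytes from the table can be modelled as an enum so that unknown values are rejected early. The names `Mode`, `parse_mode` and `DEFAULT_MODE` are illustrative and not part of any ClickHouse client API.

```python
from enum import IntEnum


class Mode(IntEnum):
    """Compression mode byte, values taken from the table above."""
    NONE = 0x02  # no compression, only checksums
    LZ4 = 0x82
    ZSTD = 0x90


# LZ4 is the suggested choice when unsure which mode to pick.
DEFAULT_MODE = Mode.LZ4


def parse_mode(value: int) -> Mode:
    """Map a raw mode byte to a Mode, raising ValueError for unknown bytes."""
    try:
        return Mode(value)
    except ValueError:
        raise ValueError(f"unknown compression mode: {value:#04x}") from None
```

For example, `parse_mode(0x90)` returns `Mode.ZSTD`, while `parse_mode(0x00)` raises `ValueError`.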
Both LZ4 and ZSTD were created by the same author, but they make different tradeoffs. From Facebook's benchmarks:
name | ratio | encoding | decoding |
---|---|---|---|
zstd 1.4.5 -1 | 2.8 | 500 MB/s | 1660 MB/s |
lz4 1.9.2 | 2.1 | 740 MB/s | 4530 MB/s |
Block
field | type | description |
---|---|---|
checksum | uint128 | Hash of (header + compressed data) |
mode | byte | Compression mode |
raw_size | uint32 | Size of header + compressed data, checksum excluded |
data_size | uint32 | Uncompressed data size |
compressed_data | binary | Block of compressed data |
The header is (mode + raw_size + data_size), 9 bytes in total, in that order on the wire. raw_size is len(header + compressed_data) and does not include the checksum.
The checksum is hash(header + compressed_data), using the ClickHouse variant of CityHash128.
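The layout can be sketched as a small parser. This is an illustration under stated assumptions, not a reference implementation: it assumes the 9-byte header is serialized as the mode byte followed by raw_size and data_size as little-endian uint32 values, and the names `Block` and `split_block` are hypothetical. Checksum verification is left to the caller, because CityHash is not in the Python standard library.

```python
import struct
from typing import NamedTuple

CHECKSUM_SIZE = 16  # uint128 checksum
HEADER_SIZE = 9     # mode (1 byte) + raw_size (4 bytes) + data_size (4 bytes)


class Block(NamedTuple):
    checksum: bytes         # 16 bytes as they appear on the wire
    mode: int               # compression mode byte
    data_size: int          # size of the data after decompression
    compressed_data: bytes
    hashed: bytes           # header + compressed_data, the checksum input


def split_block(buf: bytes) -> Block:
    """Split one block from the start of buf; trailing bytes are ignored."""
    if len(buf) < CHECKSUM_SIZE + HEADER_SIZE:
        raise ValueError("buffer shorter than checksum + header")
    # Assumed order: mode, raw_size, data_size (little-endian, no padding).
    mode, raw_size, data_size = struct.unpack_from("<BII", buf, CHECKSUM_SIZE)
    if raw_size < HEADER_SIZE:
        raise ValueError("raw_size is smaller than the header")
    end = CHECKSUM_SIZE + raw_size  # raw_size covers header + compressed_data
    if len(buf) < end:
        raise ValueError("truncated block")
    hashed = buf[CHECKSUM_SIZE:end]
    return Block(buf[:CHECKSUM_SIZE], mode, data_size,
                 hashed[HEADER_SIZE:], hashed)
```

A caller would compare `Block.checksum` against the hash of `Block.hashed` before decompressing `Block.compressed_data`.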
None mode
In None mode, compressed_data is identical to the original data.
None mode is still useful as an extra data integrity check, because every block carries a checksum and the hashing overhead is negligible.
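A None-mode block can be sketched as follows. The function name `encode_none_block` and the `hash128` parameter are hypothetical, and the header order (mode, raw_size, data_size, little-endian) is an assumption. `fake_hash128` uses MD5 purely as a 16-byte placeholder so the sketch runs without extra dependencies. It is not ClickHouse CityHash, and a real server would reject its output.

```python
import hashlib
import struct
from typing import Callable

HEADER_SIZE = 9   # mode (1 byte) + raw_size (4 bytes) + data_size (4 bytes)
MODE_NONE = 0x02


def encode_none_block(data: bytes, hash128: Callable[[bytes], bytes]) -> bytes:
    """Frame data as a None-mode block: checksum + header + data."""
    # With no compression, compressed_data is the original data, so
    # raw_size is simply the header size plus len(data).
    header = struct.pack("<BII", MODE_NONE, HEADER_SIZE + len(data), len(data))
    body = header + data
    checksum = hash128(body)  # hash of (header + compressed_data)
    if len(checksum) != 16:
        raise ValueError("hash128 must return exactly 16 bytes")
    return checksum + body


def fake_hash128(payload: bytes) -> bytes:
    """PLACEHOLDER ONLY: 16-byte MD5 digest, not ClickHouse CityHash."""
    return hashlib.md5(payload).digest()


block = encode_none_block(b"hello", fake_hash128)
```

Here `block` is 30 bytes long: 16 bytes of checksum, 9 bytes of header, and the 5 original bytes unchanged at the end.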