To query a compressed file, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then point it at the compressed file exactly as you would the uncompressed version:
clickhouse local -q "SELECT * FROM file('events.csv.gz') LIMIT 5"
┌──────────event_time─┬─user_id─┬─country─┬─device──┬─event_type─┬─revenue─┬─quantity─┐
1. │ 2026-06-18 11:41:38 │ 1 │ IN │ mobile │ view │ 497.89 │ 2 │
2. │ 2026-12-24 10:43:26 │ 2 │ CA │ tablet │ view │ 204.51 │ 3 │
3. │ 2026-07-22 13:00:39 │ 3 │ DE │ mobile │ purchase │ 64.19 │ 5 │
4. │ 2026-08-13 01:02:57 │ 4 │ FR │ desktop │ view │ 103.71 │ 2 │
5. │ 2026-01-20 00:59:20 │ 5 │ JP │ desktop │ purchase │ 345.78 │ 3 │
└─────────────────────┴─────────┴─────────┴─────────┴────────────┴─────────┴──────────┘
There is no gunzip step and no decompression flag. A .gz, .zst, .lz4, .br, or .xz suffix is detected from the file name and the file is decompressed in place as the data streams in, with no import step first.
Compression sits underneath the format. ClickHouse decompresses first, then reads the .csv (or .json, .parquet, ...) inside, so type inference works exactly as it does on a plain file. DESCRIBE proves it without a CREATE TABLE:
clickhouse local -q "DESCRIBE file('events.csv.gz') FORMAT PrettyCompact"
┌─name───────┬─type───────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
1. │ event_time │ Nullable(DateTime) │ │ │ │ │ │
2. │ user_id │ Nullable(Int64) │ │ │ │ │ │
3. │ country │ Nullable(String) │ │ │ │ │ │
4. │ device │ Nullable(String) │ │ │ │ │ │
5. │ event_type │ Nullable(String) │ │ │ │ │ │
6. │ revenue │ Nullable(Float64) │ │ │ │ │ │
7. │ quantity │ Nullable(Int64) │ │ │ │ │ │
└────────────┴────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
The header gave the names, the data gave the types, and the gzip layer was invisible. (The extra empty columns are CREATE TABLE metadata that a CSV does not carry.)
Swap the suffix and the command does not change. A zstd-compressed CSV reads identically:
clickhouse local -q "
SELECT event_type, count()
FROM file('events.csv.zst')
GROUP BY event_type
ORDER BY event_type
FORMAT PrettyCompact"
┌─event_type─┬─count()─┐
1. │ click │ 6 │
2. │ purchase │ 5 │
3. │ refund │ 3 │
4. │ view │ 6 │
└────────────┴─────────┘
.lz4, .br (brotli), and .xz behave the same. The codec is chosen by the file extension, never by a command-line switch.
Parquet compresses each column block internally (zstd, snappy, gzip). That codec is part of the file's footer, so a zstd-compressed Parquet file is read with no hint at all. There is no .zst suffix involved; the .parquet extension is enough:
clickhouse local -q "SELECT count(), round(sum(revenue), 2) FROM file('events.zstd.parquet') FORMAT PrettyCompact"
┌─count()─┬─round(sum(revenue), 2)─┐
1. │ 20 │ 4838.95 │
└─────────┴────────────────────────┘
When you write Parquet yourself, pick the column codec with one setting: SELECT ... INTO OUTFILE 'out.parquet' FORMAT Parquet SETTINGS output_format_parquet_compression_method = 'zstd'. See what is a Parquet file for how the columnar layout and per-column compression fit together.
Auto-detection is the happy path, but it leans entirely on the file name. A gzipped CSV saved as events.bin has nothing for ClickHouse to go on, so inference fails:
Code: 636. DB::Exception: The table structure cannot be extracted from a CSVWithNames format file.
Pass the compression method as the fourth argument to file(path, format, structure, compression) and it reads fine:
clickhouse local -q "
SELECT count() FROM file('events.bin', 'CSVWithNames',
'event_time DateTime, user_id Int64, country String, device String, event_type String, revenue Float64, quantity UInt8',
'gzip')"
The accepted values are gzip, zstd, lz4, brotli, xz, none, and auto (the default). This is the one case where you type the codec explicitly; everything else infers it.
Because the decompressed file is just a SQL source, the full dialect works on it (WHERE, GROUP BY, joins, window functions) with no import step:
clickhouse local -q "
SELECT country, count() AS events, round(sum(revenue), 2) AS revenue
FROM file('events.csv.gz')
WHERE event_type = 'purchase'
GROUP BY country
ORDER BY revenue DESC
FORMAT PrettyCompact"
┌─country─┬─events─┬─revenue─┐
1. │ AU │ 1 │ 394.06 │
2. │ FR │ 1 │ 393.05 │
3. │ JP │ 1 │ 345.78 │
4. │ DE │ 1 │ 64.19 │
5. │ IN │ 1 │ 28.68 │
└─────────┴────────┴─────────┘
ClickHouse covers this with one binary that reads CSV, JSON, Parquet, ORC, Arrow and more, each transparently compressed, and runs the same SQL against a server or the Cloud when the data grows.
Decompression costs CPU, but it is not the bottleneck. On a 3,000,000-row gzipped CSV (events_large.csv.gz, ~49 MiB on disk, ~173 MiB uncompressed), a filter-and-group-by across every row runs in:
clickhouse local -q "
SELECT country, count() AS events, round(sum(revenue),2) AS revenue, round(avg(quantity),3) AS avg_qty
FROM file('events_large.csv.gz')
WHERE event_type='purchase'
GROUP BY country ORDER BY revenue DESC"
~0.55 seconds, best of three with the file warm in the OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM; clickhouse local 26.6.1.117). That decodes the gzip from scratch on every run and parses the CSV text underneath it; there is no cached table. Decompression overlaps with parsing and aggregation across cores, so it disappears into the work you were doing anyway.
The query you just ran on a compressed file on your laptop is the same SQL you would run on a ClickHouse server or in ClickHouse Cloud. Swap file('events.csv.gz') for a table name, or an s3() function pointing at compressed objects in a bucket, and nothing else changes. You prototype against a compressed file on disk and ship the identical logic to production.
The complete, runnable example lives here. It has generate.sh (builds the gzipped CSV, the zstd CSV, the zstd Parquet, and the ~49 MiB perf file), run.sh (every command above), and expected_output.txt:
github.com/ClickHouse/examples → local-analytics/clickhouse-local-compressed
git clone https://github.com/ClickHouse/examples
cd examples/local-analytics/clickhouse-local-compressed
./generate.sh && ./run.sh