To query a Parquet file from the command line, use clickhouse local. It runs SQL directly on files, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then query the file directly:
clickhouse local -q "SELECT * FROM file('data.parquet') LIMIT 3"
   ┌──────────────event_time─┬─user_id─┬─country─┬─device──┬─event_type─┬─revenue─┬─quantity─┐
1. │ 2026-02-03 10:41:38.000 │  219394 │ IN      │ desktop │ refund     │  147.26 │        2 │
2. │ 2026-05-18 10:43:26.000 │  364848 │ IN      │ tablet  │ click      │  158.77 │        5 │
3. │ 2026-03-24 12:00:39.000 │  392522 │ GB      │ mobile  │ refund     │  274.44 │        1 │
   └─────────────────────────┴─────────┴─────────┴─────────┴────────────┴─────────┴──────────┘
The format is detected from the .parquet extension, and the schema is read in place from the file's embedded metadata, with no import step.
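If a file does not end in .parquet, the format can be named explicitly as the second argument to file(). The following is a minimal sketch under stated assumptions: events_export is a hypothetical Parquet file saved without an extension, and parquet_source is an illustrative helper, not part of ClickHouse.

```shell
# Hypothetical helper: build a file() source that names the format explicitly,
# so detection does not depend on the file extension.
parquet_source() {
  printf "file('%s', 'Parquet')" "$1"
}

# events_export is an assumed file name (Parquet data with no extension).
src="$(parquet_source events_export)"
echo "$src"

# Run the query only when clickhouse and the assumed file are both present.
if command -v clickhouse >/dev/null 2>&1 && [ -f events_export ]; then
  clickhouse local -q "SELECT count() FROM $src"
fi
```

The same second argument accepts other format names, so one helper pattern can cover CSV or JSON exports as well.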
Prefer Python? See How to read a Parquet file in Python (and query it with SQL) for the same queries against a pandas DataFrame.
Parquet files carry their own schema, so you never write CREATE TABLE. DESCRIBE prints the column names and the types ClickHouse inferred:
clickhouse local -q "DESCRIBE file('data.parquet') FORMAT PrettyCompact"
   ┌─name───────┬─type─────────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
1. │ event_time │ DateTime64(3, 'UTC') │              │                    │         │                  │                │
2. │ user_id    │ UInt32               │              │                    │         │                  │                │
3. │ country    │ String               │              │                    │         │                  │                │
4. │ device     │ String               │              │                    │         │                  │                │
5. │ event_type │ String               │              │                    │         │                  │                │
6. │ revenue    │ Float64              │              │                    │         │                  │                │
7. │ quantity   │ UInt8                │              │                    │         │                  │                │
   └────────────┴──────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
(The extra columns hold CREATE TABLE metadata such as defaults, codecs, and TTLs. A Parquet file doesn't carry any of that, so they come back empty.)
Timestamps come back as DateTime64, integers keep their width (UInt32, UInt8), and floating-point values land as Float64. For background on the format, see What is a Parquet file?
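To print only the name and type columns, DESCRIBE output can be narrowed with a setting. This sketch assumes the describe_compact_output setting is available in your ClickHouse version and that it can be passed as a command-line option; describe_sql is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper: build the DESCRIBE statement for any Parquet path.
describe_sql() {
  printf "DESCRIBE file('%s') FORMAT PrettyCompact" "$1"
}

echo "$(describe_sql data.parquet)"

# Assumption: describe_compact_output=1 limits DESCRIBE to column names and types.
# Run only when clickhouse and the demo file are both present.
if command -v clickhouse >/dev/null 2>&1 && [ -f data.parquet ]; then
  clickhouse local --describe_compact_output=1 -q "$(describe_sql data.parquet)"
fi
```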
Filter, aggregate, and group by
A file viewer lets you scroll through rows; a query engine lets you ask questions of them. Because the file is just a SQL source, you have the full ClickHouse dialect (WHERE, GROUP BY, aggregate functions, window functions, joins), not a fixed table UI:
clickhouse local -q "
    SELECT country,
           count() AS purchases,
           round(sum(revenue), 2) AS revenue,
           round(avg(quantity), 3) AS avg_qty
    FROM file('data.parquet')
    WHERE event_type = 'purchase'
    GROUP BY country
    ORDER BY revenue DESC
    LIMIT 5
    FORMAT PrettyCompact"
   ┌─country─┬─purchases─┬───revenue─┬─avg_qty─┐
1. │ BR      │      2086 │ 526569.59 │   3.015 │
2. │ FR      │      2027 │ 512719.52 │   2.975 │
3. │ IN      │      2044 │ 508346.74 │    3.05 │
4. │ NL      │      2032 │ 505167.61 │   3.036 │
5. │ JP      │      1998 │ 502163.23 │   3.009 │
   └─────────┴───────────┴───────────┴─────────┘
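Parquet is columnar, so a query reads only the columns it references. The query above touches four of the file's seven columns (country, event_type, revenue, quantity) and reads only those four off disk. The other three are never decoded, which is why this scales (more on that below).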
ClickHouse does all of this with one binary that also reads CSV, JSON, ORC, Arrow, and many other formats, talks to S3, MySQL, Postgres, and more, and runs the same SQL unchanged when you move from a file to a server to ClickHouse Cloud.
Parquet supports per-column compression (gzip, zstd, snappy) inside the file. You do not pass a flag or decompress anything first. clickhouse local reads the codec from the file metadata:
clickhouse local -q "SELECT count(), round(sum(revenue), 2) FROM file('data.zstd.parquet') FORMAT PrettyCompact"
   ┌─count()─┬─round(sum(revenue), 2)─┐
1. │  100000 │            24957233.46 │
   └─────────┴────────────────────────┘
The same command works on a gzip-compressed Parquet file, or on a Parquet file wrapped in an outer gzip layer (.parquet.gz). The extension and the embedded metadata tell ClickHouse what to do.
Small files are instant in anything. The difference shows up at scale. On a 977 MB Parquet file of 70 million rows (events_large.parquet, generated by the example folder below), the same filter-and-group-by query, now pointed at the large file, runs in:
clickhouse local --time -q "
SELECT country, count(), round(sum(revenue), 2), round(avg(quantity), 3)
FROM file('events_large.parquet')
WHERE event_type = 'purchase'
GROUP BY country ORDER BY 3 DESC
FORMAT Null"
0.44 seconds, best of three with the file warm in the OS page cache, measured on an Apple-silicon laptop (Apple M4 Pro, 14 cores, 24 GB RAM; clickhouse local 26.6.1.117). The query reads only the columns it needs and runs across all cores, so a near-gigabyte aggregation finishes before you can switch windows.
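A best-of-three measurement like this one can be reproduced with a short loop. This is a sketch, not the script used for the number above: min_of is a hypothetical helper, and the loop assumes that --time writes only the elapsed seconds to stderr, one line per run.

```shell
# Hypothetical helper: print the smallest number read from stdin (one per line).
min_of() {
  LC_ALL=C sort -n | head -n 1
}

query="SELECT country, count(), round(sum(revenue), 2), round(avg(quantity), 3)
FROM file('events_large.parquet')
WHERE event_type = 'purchase'
GROUP BY country ORDER BY 3 DESC
FORMAT Null"

# Run only when clickhouse and the large demo file are both present.
if command -v clickhouse >/dev/null 2>&1 && [ -f events_large.parquet ]; then
  for run in 1 2 3; do
    # Send stderr (the timing line) into the pipe and discard stdout.
    clickhouse local --time -q "$query" 2>&1 >/dev/null
  done | min_of
fi
```

The first run also warms the OS page cache, so the minimum reflects the warm-cache case described above.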
The query you just ran on a laptop file is the same SQL you would run on a ClickHouse server, or in ClickHouse Cloud. Nothing about SELECT ... WHERE ... GROUP BY changes. You swap file('data.parquet') for a table name (or an s3() function pointing at a bucket) and the rest stays put. There is no separate "local dialect" to unlearn: you prototype against a file on your machine and ship the identical logic to production.
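One way to see the swap is to keep the query in a helper and change only the source. The sketch below is illustrative: purchases_by_country is a hypothetical helper, and the bucket URL and the events table name are assumptions, not resources from this article.

```shell
# Hypothetical helper: the same aggregation, parameterized only by its source.
purchases_by_country() {
  printf "SELECT country, count() AS purchases, round(sum(revenue), 2) AS revenue FROM %s WHERE event_type = 'purchase' GROUP BY country ORDER BY revenue DESC LIMIT 5" "$1"
}

# Three sources, one query. The bucket URL and table name are assumed.
local_sql="$(purchases_by_country "file('data.parquet')")"
s3_sql="$(purchases_by_country "s3('https://example-bucket.s3.amazonaws.com/data.parquet')")"
table_sql="$(purchases_by_country "events")"

echo "$local_sql"
echo "$s3_sql"
echo "$table_sql"

# Only the local variant is executed here, and only if the demo file exists.
if command -v clickhouse >/dev/null 2>&1 && [ -f data.parquet ]; then
  clickhouse local -q "$local_sql FORMAT PrettyCompact"
fi
```

Everything after FROM's argument is identical across the three strings, which is the point: only the source expression moves between laptop and production.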
The complete, runnable example lives here. It has generate.sh (builds the demo file, a zstd-compressed copy, and the ~1 GB perf file), run.sh (every command above), and expected_output.txt:
github.com/ClickHouse/examples → local-analytics/clickhouse-local-parquet
git clone https://github.com/ClickHouse/examples
cd examples/local-analytics/clickhouse-local-parquet
./generate.sh && ./run.sh