To read an Avro file from the command line and query its rows, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then query the file directly:
clickhouse local -q "SELECT * FROM file('events.avro') LIMIT 10"
┌─event_date─┬──────────────event_time─┬─event_id─┬─country─┬─event_type─┬─revenue─┬─quantity─┐
1. │ 2026-01-01 │ 2026-01-01 00:00:00.000 │ 1 │ GB │ click │ 5 │ 1 │
2. │ 2026-01-02 │ 2026-01-01 00:02:17.000 │ 2 │ US │ view │ 6.01 │ 2 │
3. │ 2026-01-03 │ 2026-01-01 00:04:34.000 │ 3 │ DE │ purchase │ 7.02 │ 3 │
└────────────┴─────────────────────────┴──────────┴─────────┴────────────┴─────────┴──────────┘
The format is detected from the .avro extension. Avro carries its schema inside the file, so the file is read in place with no import step and no structure to declare.
Avro is self-describing: every file begins with a header that holds its writer schema as JSON. That is the key difference from a format like CSV, where types are guessed from the data. Here the types are read, not inferred, so there is nothing to declare. DESCRIBE prints them:
clickhouse local -q "DESCRIBE file('events.avro') FORMAT PrettyCompact"
┌─name───────┬─type──────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
1. │ event_date │ Date32 │ │ │ │ │ │
2. │ event_time │ DateTime64(3) │ │ │ │ │ │
3. │ event_id │ Int32 │ │ │ │ │ │
4. │ country │ String │ │ │ │ │ │
5. │ event_type │ String │ │ │ │ │ │
6. │ revenue │ Float64 │ │ │ │ │ │
7. │ quantity │ Int32 │ │ │ │ │ │
└────────────┴───────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
(The empty columns are CREATE TABLE metadata such as defaults, codecs and TTLs, which an Avro file doesn't carry, so they come back blank.)
Logical types: dates and timestamps stay typed
Avro's base types are deliberately small: int, long, float, double, bytes, string, boolean. Dates and timestamps ride on top as logical types, which are annotations layered over a base type. A date is an int tagged date; a millisecond timestamp is a long tagged timestamp-millis. clickhouse-local reads those annotations, so the columns come back as real Date32 and DateTime64(3) rather than bare integers:
clickhouse local -q "
SELECT event_date, toTypeName(event_date) AS date_type,
event_time, toTypeName(event_time) AS time_type
FROM file('events.avro') LIMIT 3 FORMAT PrettyCompact"
┌─event_date─┬─date_type─┬──────────────event_time─┬─time_type─────┐
1. │ 2026-01-01 │ Date32 │ 2026-01-01 00:00:00.000 │ DateTime64(3) │
2. │ 2026-01-02 │ Date32 │ 2026-01-01 00:02:17.000 │ DateTime64(3) │
3. │ 2026-01-03 │ Date32 │ 2026-01-01 00:04:34.000 │ DateTime64(3) │
└────────────┴───────────┴─────────────────────────┴───────────────┘
This matters because it is the common failure mode of weaker Avro readers: a timestamp that arrives as a raw 1767225600 instead of a date you can filter on. Here you can write WHERE event_date >= '2026-06-01' or toStartOfMonth(event_time) against the file with no casting. If a producer wrote a timestamp as a plain long with no logical-type tag, ClickHouse reads it as the integer it is, and a toDateTime64() in your query restores it. (What is an Avro file?)
Filter, aggregate, and group by
Because the file is a SQL source, the full ClickHouse dialect works on it (WHERE, GROUP BY, aggregate functions, window functions, joins), not a fixed table UI:
clickhouse local -q "
SELECT country,
count() AS events,
round(sum(revenue), 2) AS revenue,
round(avg(quantity), 2) AS avg_qty
FROM file('events.avro')
WHERE event_type = 'purchase'
GROUP BY country
ORDER BY revenue DESC
FORMAT PrettyCompact"
┌─country─┬─events─┬─revenue─┬─avg_qty─┐
1. │ GB │ 2 │ 34.24 │ 3 │
2. │ DE │ 2 │ 26.16 │ 4 │
3. │ IN │ 1 │ 15.1 │ 1 │
└─────────┴────────┴─────────┴─────────┘
Avro is row-oriented, so a GROUP BY reads the whole file rather than a few columns the way a columnar format would. That is fine for the everyday case. It is how Avro is built, and it is what makes it a good fit for streaming and append-heavy pipelines. clickhouse-local reads Avro with one binary that also reads CSV, JSON, ORC, Parquet and many other formats, and runs the same SQL unchanged when you move from a file to a server.
The conversion runs the other way too. SELECT from any source and write Avro with FORMAT Avro, and ClickHouse emits the matching logical types automatically (a Date becomes date, a DateTime64(3) becomes timestamp-millis):
clickhouse local -q "
SELECT toDate('2026-01-01') AS event_date,
toDateTime64('2026-01-01 00:00:00.000', 3) AS event_time,
'GB' AS country
INTO OUTFILE 'out.avro' TRUNCATE FORMAT Avro"
DESCRIBE file('out.avro') then shows event_date back as Date32 and event_time as DateTime64(3) — the annotations survive the round trip. That is exactly how the sample file in the Run it yourself folder is built.
Small files are instant in anything; the difference shows up at scale. On a 3,000,000-row, ~61 MB Avro file (events_large.avro, generated by the example folder below), a filter-and-group-by across every row runs in:
clickhouse local --time -q "
SELECT country, count(), round(sum(revenue), 2), round(avg(quantity), 3)
FROM file('events_large.avro')
WHERE event_type = 'purchase'
GROUP BY country ORDER BY 2 DESC
FORMAT Null"
~0.56 seconds, best of three with the file warm in the OS page cache, measured on an Apple-silicon laptop (Apple M4 Pro, 14 cores, 24 GB RAM; clickhouse local 26.6.1.117). That includes decoding the Avro container from scratch on every run — there is no cached table. The query runs across all cores, so a multi-million-row aggregation finishes before you can switch windows.
The query you just ran on a laptop file is the same SQL you would run on a ClickHouse server, or in ClickHouse Cloud. Nothing about SELECT ... WHERE ... GROUP BY changes. You swap file('events.avro') for a table name and the rest stays put. There is no separate "local dialect" to unlearn: you prototype against a file on your machine and ship the identical logic to production.
The complete, runnable example lives here. It has generate.sh (builds the demo Avro file and the ~61 MB perf file), run.sh (every command above), and expected_output.txt:
github.com/ClickHouse/examples → local-analytics/clickhouse-local-avro
git clone https://github.com/ClickHouse/examples
cd examples/local-analytics/clickhouse-local-avro
./generate.sh && ./run.sh