How to read a RowBinary file

Al Brown
Last updated: Jun 15, 2026

To read a RowBinary file, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then point it at the file and name the format:

clickhouse local -q "SELECT * FROM file('events.rowbinary', 'RowBinaryWithNamesAndTypes') LIMIT 10"
   ┌──────────event_time─┬─user_id─┬─country─┬─event_type─┬─revenue─┬─quantity─┐
1. │ 2026-01-01 00:00:00 │       1 │ GB      │ click      │       5 │        1 │
2. │ 2026-01-01 00:02:17 │       2 │ US      │ view       │    6.01 │        2 │
3. │ 2026-01-01 00:04:34 │       3 │ DE      │ purchase   │    7.02 │        3 │
   └─────────────────────┴─────────┴─────────┴────────────┴─────────┴──────────┘

RowBinaryWithNamesAndTypes embeds column names and types in a header, so the file reads itself with no schema declaration and no import step. Plain RowBinary carries only raw values and needs an explicit structure; both variants are covered below.

See the schema without declaring one

RowBinary is a ClickHouse-native format, so it shows up most often when something already in the ClickHouse world wrote it: an INSERT ... FORMAT RowBinary, a client library, or a SELECT ... FORMAT RowBinaryWithNamesAndTypes dump. The WithNamesAndTypes variant prefixes the data with a header listing every column name and type. That header is the schema, so DESCRIBE reads it back without a CREATE TABLE:

clickhouse local -q "DESCRIBE file('events.rowbinary', 'RowBinaryWithNamesAndTypes') FORMAT PrettyCompact"
   ┌─name───────┬─type─────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
1. │ event_time │ DateTime │              │                    │         │                  │                │
2. │ user_id    │ UInt32   │              │                    │         │                  │                │
3. │ country    │ String   │              │                    │         │                  │                │
4. │ event_type │ String   │              │                    │         │                  │                │
5. │ revenue    │ Float64  │              │                    │         │                  │                │
6. │ quantity   │ UInt8    │              │                    │         │                  │                │
   └────────────┴──────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

Names and exact types come straight from the embedded header. DateTime stays DateTime, UInt32 stays UInt32. Unlike text formats, nothing is inferred or guessed, and nothing comes back Nullable unless the writer declared it so.
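
If you're curious what that header looks like on disk, here is a minimal Python sketch of reading it by hand. It assumes the layout given in the ClickHouse format documentation: a LEB128-encoded column count, then that many length-prefixed column names, then that many length-prefixed type names. The helper names (read_varuint, read_string, read_header) are made up for illustration and are not part of any ClickHouse library, and the demo bytes are hand-built rather than taken from events.rowbinary.

```python
import io


def read_varuint(buf):
    """Read one unsigned LEB128 integer from a binary stream."""
    result, shift = 0, 0
    while True:
        chunk = buf.read(1)
        if not chunk:
            raise EOFError("stream ended inside a varint")
        result |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:  # high bit clear: last byte of the varint
            return result
        shift += 7


def read_string(buf):
    """Read a length-prefixed UTF-8 string."""
    length = read_varuint(buf)
    data = buf.read(length)
    if len(data) != length:
        raise EOFError("stream ended inside a string")
    return data.decode("utf-8")


def read_header(buf):
    """Return [(name, type), ...] from a WithNamesAndTypes-style header."""
    count = read_varuint(buf)
    names = [read_string(buf) for _ in range(count)]
    types = [read_string(buf) for _ in range(count)]
    return list(zip(names, types))


# Hand-built header for a hypothetical two-column file (assumption, not real data).
demo = bytes([2]) + b"\x07user_id" + b"\x07country" + b"\x06UInt32" + b"\x06String"
schema = read_header(io.BytesIO(demo))
print(schema)
```

Pointed at open('events.rowbinary', 'rb') instead of the demo bytes, read_header should return the same six name and type pairs that DESCRIBE printed, provided the file really was written as RowBinaryWithNamesAndTypes.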

Filter, aggregate, and group by

The file is a table, so the full SQL surface applies (WHERE, GROUP BY, aggregates, joins, window functions) with no load step in between:

clickhouse local -q "
SELECT country, count() AS events, round(sum(revenue),2) AS revenue, round(avg(quantity),2) AS avg_qty
FROM file('events.rowbinary', 'RowBinaryWithNamesAndTypes')
GROUP BY country
ORDER BY revenue DESC
FORMAT PrettyCompact"
   ┌─country─┬─events─┬─revenue─┬─avg_qty─┐
1. │ US      │      4 │    60.4 │     3.5 │
2. │ GB      │      4 │   56.36 │     2.5 │
3. │ AU      │      3 │   48.33 │       2 │
4. │ IN      │      3 │    45.3 │    2.67 │
5. │ FR      │      3 │   42.27 │    3.33 │
6. │ DE      │      3 │   39.24 │       4 │
   └─────────┴────────┴─────────┴─────────┘

Plain RowBinary has no header

This is the part that trips people up. Plain RowBinary is just the column values, packed end to end with nothing describing them. There is no header to read, so schema inference is impossible by design and the command fails fast rather than guessing:

clickhouse local -q "SELECT * FROM file('events_plain.rowbinary', 'RowBinary') LIMIT 1"
Code: 36. DB::Exception: RowBinary file format doesn't support schema inference.
You must specify the structure manually. (BAD_ARGUMENTS)

The fix is to pass the structure yourself, as the third argument to file(). The column order and types must match exactly what was written, byte for byte:

clickhouse local -q "
SELECT * FROM file('events_plain.rowbinary', 'RowBinary',
  'event_time DateTime, user_id UInt32, country String, event_type String, revenue Float64, quantity UInt8')
LIMIT 3 FORMAT PrettyCompact"
   ┌──────────event_time─┬─user_id─┬─country─┬─event_type─┬─revenue─┬─quantity─┐
1. │ 2026-01-01 00:00:00 │       1 │ GB      │ click      │       5 │        1 │
2. │ 2026-01-01 00:02:17 │       2 │ US      │ view       │    6.01 │        2 │
3. │ 2026-01-01 00:04:34 │       3 │ DE      │ purchase   │    7.02 │        3 │
   └─────────────────────┴─────────┴─────────┴────────────┴─────────┴──────────┘

Same rows, same result. The only difference is that you supplied the schema instead of the file.
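
To make "packed end to end" concrete, here is a rough Python sketch of how one row of this schema is laid out in plain RowBinary. It relies on the documented encodings: fixed-width integers and floats are little-endian, DateTime is a 32-bit Unix timestamp, and String is a LEB128 length followed by the raw bytes. encode_row and write_varuint are hypothetical helpers, and treating the timestamp as UTC is an assumption about how the example file was generated.

```python
import struct
from datetime import datetime, timezone


def write_varuint(n):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_row(event_time, user_id, country, event_type, revenue, quantity):
    """Pack one row: DateTime, UInt32, String, String, Float64, UInt8."""
    out = struct.pack("<I", int(event_time.timestamp()))  # DateTime as UInt32
    out += struct.pack("<I", user_id)                     # UInt32
    for text in (country, event_type):                    # two String columns
        raw = text.encode("utf-8")
        out += write_varuint(len(raw)) + raw
    out += struct.pack("<d", revenue)                     # Float64
    out += struct.pack("<B", quantity)                    # UInt8
    return out


# First row of the example output; UTC is assumed for the timestamp.
row = encode_row(datetime(2026, 1, 1, tzinfo=timezone.utc), 1, "GB", "click", 5.0, 1)
# 4 + 4 + (1 + 2) + (1 + 5) + 8 + 1 = 26 bytes, with no names, types, or separators
print(len(row), row.hex(" "))
```

Nothing in those bytes says where one column ends and the next begins, which is why the structure string has to list the columns in the order and with the types the writer used.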

Why the self-describing variant is worth the bytes

A plain RowBinary reader trusts whatever structure you type. Get a type wrong and there's no header to contradict you, so ClickHouse reinterprets the bytes and can hand back garbage, or fail later with an unrelated-looking read error, rather than telling you the type is wrong. The WithNamesAndTypes header closes that gap: if the structure you pass disagrees with the embedded one, the read is rejected up front:

clickhouse local -q "
SELECT * FROM file('events.rowbinary', 'RowBinaryWithNamesAndTypes',
  'event_time DateTime, user_id String, country String, event_type String, revenue Float64, quantity UInt8')
LIMIT 1"
Code: 117. DB::Exception: Type of 'user_id' must be String, not UInt32:
(while reading header) ... (INCORRECT_DATA)

The header caught the mismatch (user_id is UInt32, not String) before reading a single data row. If you control the writer, prefer RowBinaryWithNamesAndTypes. It costs a few bytes of header and buys you a file that reads itself and validates on the way in. Reach for plain RowBinary only for the leanest machine-to-machine pipes where both ends already agree on the schema.
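
For a feel of what goes wrong with plain RowBinary when there is no header to object, here is a small Python sketch that decodes the same bytes under the right structure and under a wrong one. It is a toy decoder that handles only UInt32 and String, assuming the documented encodings (little-endian integers, LEB128-length-prefixed strings); read_value and read_row are hypothetical names, and the two-column row is invented for the demo.

```python
import io
import struct


def read_varuint(buf):
    """Read one unsigned LEB128 integer from a binary stream."""
    result, shift = 0, 0
    while True:
        chunk = buf.read(1)
        if not chunk:
            raise EOFError("stream ended inside a varint")
        result |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            return result
        shift += 7


def read_value(buf, type_name):
    """Decode one value; only the two types used in this demo are supported."""
    if type_name == "UInt32":
        return struct.unpack("<I", buf.read(4))[0]
    if type_name == "String":
        length = read_varuint(buf)
        return buf.read(length).decode("utf-8", errors="replace")
    raise ValueError(f"unsupported type in this sketch: {type_name}")


def read_row(data, types):
    """Decode one row and also return whatever bytes were left unread."""
    buf = io.BytesIO(data)
    values = [read_value(buf, t) for t in types]
    return values, buf.read()


# One row written as (user_id UInt32 = 1, country String = 'GB').
data = struct.pack("<I", 1) + b"\x02GB"

right = read_row(data, ["UInt32", "String"])
wrong = read_row(data, ["String", "String"])  # user_id misdeclared as String
print(right)
print(wrong)
```

Under the wrong structure nothing fails: the first byte of the integer is taken as a string length, both columns come back as junk, and the leftover bytes would be misread as the start of the next row.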

How fast is it?

RowBinary is a binary format whose layout is fixed by the schema, so there is no text to parse and no types to infer per row. On a 3,000,000-row file (events_large.rowbinary, ~80 MB, generated by the example folder below), a full GROUP BY country with sum and avg over every row runs in:

clickhouse local -q "
SELECT country, count(), round(sum(revenue),2), round(avg(quantity),3)
FROM file('events_large.rowbinary','RowBinaryWithNamesAndTypes')
GROUP BY country ORDER BY 2 DESC
FORMAT Null"

~0.26 seconds, best of three with the file warm in the OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM; clickhouse local 26.6.1.117). RowBinary is row-oriented, so this reads every column off disk even when a query touches only a few. For repeated analytical scans, convert it once to a columnar format like Parquet and read that instead.
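
Why a row-oriented file can't skip columns is easier to see in code. This is a rough Python sketch, assuming the six-column layout used throughout this article: to sum only quantity, the reader still has to step over every other field in every row, and for the two String columns it has to read each length prefix to know how far to jump. make_row and sum_quantity are hypothetical helpers, not ClickHouse APIs, and the demo builder only handles strings shorter than 128 bytes.

```python
import struct


def read_varuint(data, pos):
    """Decode an unsigned LEB128 integer at data[pos]; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def make_row(ts, user_id, country, event_type, revenue, quantity):
    """Demo-only row builder; a single length byte is valid for strings < 128 bytes."""
    c, e = country.encode("utf-8"), event_type.encode("utf-8")
    return (struct.pack("<II", ts, user_id)
            + bytes([len(c)]) + c
            + bytes([len(e)]) + e
            + struct.pack("<dB", revenue, quantity))


def sum_quantity(data):
    """Sum the last column (UInt8) by walking every field of every row."""
    pos, total = 0, 0
    while pos < len(data):
        pos += 4 + 4                  # event_time, user_id: fixed width, skipped
        for _ in range(2):            # country, event_type: length must be read
            length, pos = read_varuint(data, pos)
            pos += length
        pos += 8                      # revenue
        total += data[pos]            # quantity, the only column we wanted
        pos += 1
    return total


data = make_row(0, 1, "GB", "click", 5.0, 1) + make_row(0, 2, "US", "view", 6.01, 2)
total = sum_quantity(data)
print(total)
```

A columnar format stores each column's values together, so the same sum could read the quantity values alone and leave the rest of the file untouched.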

The same SQL scales unchanged

The query you just ran on a laptop file is the same SQL you would run against a ClickHouse server or ClickHouse Cloud. Swap file('events.rowbinary', 'RowBinaryWithNamesAndTypes') for a table name and the SELECT ... GROUP BY stays put. RowBinary is in fact one of the wire formats that ClickHouse client libraries use to ship rows, so reading it locally and querying a server exercise the same machinery. There is no separate local dialect to unlearn.

Run it yourself

The complete, runnable example lives here. It has generate.sh (builds the self-describing file, a plain-RowBinary copy, and the 3M-row perf file), run.sh (every command above), and expected_output.txt:

github.com/ClickHouse/examples → local-analytics/clickhouse-local-rowbinary

git clone https://github.com/ClickHouse/examples
cd examples/local-analytics/clickhouse-local-rowbinary
./generate.sh && ./run.sh