How to read a ClickHouse Native format file

To read a ClickHouse Native file, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then query the file with SELECT:

clickhouse local -q "SELECT * FROM file('events.native') LIMIT 5"

   ┌──────────event_time─┬─user_id─┬─country─┬─event_type─┬─revenue─┬─quantity─┐
1. │ 2026-01-01 00:00:00 │       1 │ GB      │ click      │       5 │        1 │
2. │ 2026-01-01 01:00:07 │       2 │ US      │ purchase   │    6.01 │        2 │
3. │ 2026-01-01 02:00:14 │       3 │ DE      │ refund     │    7.02 │        3 │
4. │ 2026-01-01 03:00:21 │       4 │ FR      │ signup     │    8.02 │        4 │
5. │ 2026-01-01 04:00:28 │       5 │ IN      │ click      │    9.03 │        5 │
   └─────────────────────┴─────────┴─────────┴────────────┴─────────┴──────────┘

Native is ClickHouse's own binary columnar format. It stores every column's exact type alongside the data, so the file is read in place with no import step and no schema to declare.

The schema is in the file, exactly

This is what sets Native apart from text formats. A CSV or JSON file has values but no types, so any reader has to guess: is 00123 a number or a string, is 1.0 a float or a decimal? Native stores the precise type of every column in the file itself. DESCRIBE reads it back without a single guess:

clickhouse local -q "DESCRIBE file('events.native') FORMAT PrettyCompact"

   ┌─name───────┬─type───────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
1. │ event_time │ DateTime       │              │                    │         │                  │                │
2. │ user_id    │ UInt32         │              │                    │         │                  │                │
3. │ country    │ String         │              │                    │         │                  │                │
4. │ event_type │ String         │              │                    │         │                  │                │
5. │ revenue    │ Decimal(10, 2) │              │                    │         │                  │                │
6. │ quantity   │ UInt8          │              │                    │         │                  │                │
   └────────────┴────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

user_id is a UInt32, not a generic 64-bit integer. revenue is a Decimal(10, 2), not a lossy float. quantity is a one-byte UInt8. These are the original types, round-tripped intact.

Write the same 20 rows out as CSV and ask for its schema, and you see what inference costs you:

clickhouse local -q "DESCRIBE file('events.csv') FORMAT PrettyCompact"

   ┌─name───────┬─type───────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
1. │ event_time │ Nullable(DateTime) │              │                    │         │                  │                │
2. │ user_id    │ Nullable(Int64)    │              │                    │         │                  │                │
3. │ country    │ Nullable(String)   │              │                    │         │                  │                │
4. │ event_type │ Nullable(String)   │              │                    │         │                  │                │
5. │ revenue    │ Nullable(Float64)  │              │                    │         │                  │                │
6. │ quantity   │ Nullable(Int64)    │              │                    │         │                  │                │
   └────────────┴────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

Every column became Nullable, the integer columns widened to Int64, and the Decimal became a Float64. CSV did its honest best from text, but the precise types were gone the moment they were written as strings. Native never throws that information away.
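To see why text forces a guess, here is a minimal Python sketch of a reader that has only strings to work with. The `naive_infer` helper is hypothetical and much cruder than ClickHouse's real schema inference; it only illustrates that the original type cannot be recovered from the characters alone.

```python
def naive_infer(text):
    """Toy inference: try int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text

# "00123" might have been an ID string, "1.0" might have been a decimal.
for raw in ["00123", "1.0", "GB"]:
    value = naive_infer(raw)
    print(f"{raw!r} -> {value!r} ({type(value).__name__})")

# Prints:
# '00123' -> 123 (int)
# '1.0' -> 1.0 (float)
# 'GB' -> 'GB' (str)
```

The leading zeros vanish and the decimal turns into a float, and nothing in the text says whether either guess was right. A format that records the type next to the data never has to make that call.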

Aggregate directly on the file

Because the file is a SQL source, the full ClickHouse dialect works on it (WHERE, GROUP BY, aggregate functions, window functions, joins) with no load step in between:

clickhouse local -q "
SELECT country,
       count()              AS events,
       sum(revenue)         AS revenue,
       round(avg(quantity), 2) AS avg_qty
FROM file('events.native')
GROUP BY country
ORDER BY revenue DESC
FORMAT PrettyCompact"

   ┌─country─┬─events─┬─revenue─┬─avg_qty─┐
1. │ US      │      4 │    60.4 │     3.5 │
2. │ GB      │      4 │   56.36 │     2.5 │
3. │ AU      │      3 │   48.33 │       2 │
4. │ IN      │      3 │   45.29 │    2.67 │
5. │ FR      │      3 │   42.25 │    3.33 │
6. │ DE      │      3 │   39.24 │       4 │
   └─────────┴────────┴─────────┴─────────┘

Because revenue arrives as a Decimal, that sum is computed in exact decimal arithmetic, not floating point. The type fidelity is not cosmetic; it changes the answer.
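The difference between decimal and floating-point sums is easy to reproduce outside ClickHouse. The sketch below uses Python's `decimal.Decimal` as a stand-in for a Decimal(10, 2) column; the ten values are made up for illustration and are not rows from events.native.

```python
from decimal import Decimal

# Made-up values for illustration; they are not rows from events.native.
prices_as_text = ["0.10"] * 10

# Float path: each 0.10 is stored as the nearest binary fraction, so error accumulates.
float_total = sum(float(p) for p in prices_as_text)

# Decimal path: base-10 digits are kept exactly, scale of 2 included.
decimal_total = sum(Decimal(p) for p in prices_as_text)

print(float_total)    # 0.9999999999999999
print(decimal_total)  # 1.00
print(float_total == 1.0, decimal_total == Decimal("1.00"))  # False True
```

Ten values are enough to push the float sum off by one unit in the last place. Over millions of rows the drift can reach digits that matter on an invoice, which is the practical reason to keep revenue as a Decimal end to end.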

Native is the wire format, so it pipes

Native is the same encoding ClickHouse uses to move data between a client and a server, and to dump and reload tables. That means one process can emit Native and another can read it directly, with the schema travelling in the byte stream. No intermediate file, no column list:

clickhouse local -q "SELECT number AS id, (number * number)::UInt64 AS sq FROM numbers(3) FORMAT Native" \
  | clickhouse local --input-format Native -q "SELECT * FROM table FORMAT PrettyCompact"

   ┌─id─┬─sq─┐
1. │  0 │  0 │
2. │  1 │  1 │
3. │  2 │  4 │
   └────┴────┘

The second process never sees a CREATE TABLE or a type hint. It reads id UInt64, sq UInt64 straight out of the stream. This is why Native is the natural choice for staging a table dump (clickhouse-client --query "SELECT * FROM t FORMAT Native" > t.native) and reloading it elsewhere: the round-trip is byte-for-byte faithful.

Read a compressed Native file transparently

You do not unzip anything first. clickhouse local detects the .gz suffix and decompresses on the fly, so a .native.gz file is queried exactly like a .native:

clickhouse local -q "SELECT count(), sum(revenue) FROM file('events.native.gz') FORMAT PrettyCompact"

   ┌─count()─┬─sum(revenue)─┐
1. │      20 │       291.87 │
   └─────────┴──────────────┘

The same applies to .native.zst, .native.lz4, and other supported codecs. The compression is inferred from the file name.
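Suffix-based detection is a simple idea, and a rough Python sketch shows its shape. The `OPENERS` table and `open_by_suffix` helper are hypothetical and limited to three standard-library codecs for brevity; this is not how ClickHouse implements it, and the payload is a stand-in rather than a real Native file.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Hypothetical suffix-to-opener table, limited to three stdlib codecs.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}

def open_by_suffix(path):
    """Return a binary file object, decompressing if the suffix is known."""
    suffix = os.path.splitext(path)[1]
    return OPENERS.get(suffix, open)(path, "rb")

payload = b"stand-in bytes, not a real Native file"
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "demo.native")
    zipped = os.path.join(tmp, "demo.native.gz")
    with open(plain, "wb") as f:
        f.write(payload)
    with gzip.open(zipped, "wb") as f:
        f.write(payload)
    # The caller uses the same function for both files.
    with open_by_suffix(plain) as f:
        plain_bytes = f.read()
    with open_by_suffix(zipped) as f:
        zipped_bytes = f.read()

print(plain_bytes == zipped_bytes == payload)  # True
```

The caller never names a codec. The file name carries that information, which is why renaming a compressed file to drop its suffix would defeat this kind of detection.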

How fast is it?

Native is the lowest-overhead format ClickHouse reads: the bytes on disk are close to the in-memory column layout, so there is little to parse and nothing to infer. On a 3,000,000-row, ~78 MB file (events_large.native), a full GROUP BY country with sum and avg over every row runs in:

clickhouse local --time -q "
SELECT country, count(), sum(revenue), round(avg(quantity), 3)
FROM file('events_large.native')
GROUP BY country ORDER BY 3 DESC
FORMAT Null"

~0.17 seconds, best of three with the file warm in the OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM; clickhouse local 26.6.1.117). The columnar layout and the absence of any type inference are exactly why it is quick.

For long-term storage you would usually reach for a compressed columnar format like Parquet instead, which is smaller on disk and portable across many tools. Native earns its place when you want maximum read speed and a perfect type round-trip inside the ClickHouse world: client-to-server transfer, table dumps, and handing data between two clickhouse local invocations.

The same SQL scales unchanged

The query you just ran on a laptop file is the same SQL you would run against a ClickHouse server or ClickHouse Cloud. Nothing about SELECT ... GROUP BY changes. You swap file('events.native') for a table name and the rest stays put. Because Native is the server's own format, that move is exact: the output of SELECT ... FORMAT Native on the server is the same byte stream you read here.

Run it yourself

The complete, runnable example lives here. It has generate.sh (builds the demo file, a gzipped copy, a CSV twin for the type comparison, and the 3M-row perf file), run.sh (every command above), and expected_output.txt:

github.com/ClickHouse/examples → local-analytics/clickhouse-local-native

git clone https://github.com/ClickHouse/examples
cd examples/local-analytics/clickhouse-local-native
./generate.sh && ./run.sh
