How to read a MessagePack file

To read a MessagePack file from the command line, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then pass the format and the column types, and query the file directly:

1clickhouse local -q "SELECT * FROM file('events.msgpack', MsgPack,
2  'id UInt64, country String, event_type String, revenue Float64, quantity UInt8') LIMIT 5"

1┌─id─┬─country─┬─event_type─┬─revenue─┬─quantity─┐
21. │  1 │ GB      │ click      │       5 │        1 │
32. │  2 │ US      │ view       │    6.01 │        2 │
43. │  3 │ DE      │ purchase   │    7.02 │        3 │
54. │  4 │ FR      │ refund     │    8.03 │        4 │
65. │  5 │ IN      │ click      │    9.04 │        5 │
7   └────┴─────────┴────────────┴─────────┴──────────┘

MessagePack carries no column names or type information, so you supply the structure yourself in the file() call. The file is read in place with no import step.

The catch: MessagePack has no schema

Formats like Parquet, Avro or ORC embed their column names and types, so ClickHouse can read them with a bare file('data.parquet'). MessagePack does not. A .msgpack file is a flat stream of typed values with nothing that says which value is which column. (What is MessagePack?)

So a read without a structure fails, on purpose:

1clickhouse local -q "SELECT * FROM file('events.msgpack') LIMIT 3"

1Code: 636. DB::Exception: The table structure cannot be extracted from a MsgPack format file.
2You must specify setting input_format_msgpack_number_of_columns to extract table schema from MsgPack data.
3You can specify the structure manually. (CANNOT_EXTRACT_TABLE_STRUCTURE)

The fix is the three-argument form of file(): path, format, structure. You supply the column names and types in declaration order, and they must match how the file was written:

1clickhouse local -q "
2SELECT * FROM file('events.msgpack', MsgPack,
3  'id UInt64, country String, event_type String, revenue Float64, quantity UInt8')
4LIMIT 5
5FORMAT PrettyCompact"

That structure is the whole contract. Get the order or the types right and every value lands in the right column; get them wrong and you get garbage or a parse error, because there is no header to fall back on. If you only know the column count and not the names, set input_format_msgpack_number_of_columns and let ClickHouse name them c1, c2, ... instead.

Filter, aggregate, and group by

Once the structure is declared, the file behaves like any other SQL source. You have the full ClickHouse dialect (WHERE, GROUP BY, aggregate and window functions, joins) with no load step:

1clickhouse local -q "
2SELECT country,
3       count() AS purchases,
4       round(sum(revenue), 2) AS revenue,
5       round(avg(quantity), 3) AS avg_qty
6FROM file('events.msgpack', MsgPack,
7  'id UInt64, country String, event_type String, revenue Float64, quantity UInt8')
8WHERE event_type = 'purchase'
9GROUP BY country
10ORDER BY revenue DESC
11FORMAT PrettyCompact"

1┌─country─┬─purchases─┬─revenue─┬─avg_qty─┐
21. │ GB      │         2 │   34.24 │       3 │
32. │ DE      │         2 │   26.16 │       4 │
43. │ IN      │         1 │    15.1 │       1 │
5   └─────────┴───────────┴─────────┴─────────┘

Retyping that structure on every query gets old fast. Read the file once, write it to a self-describing format, and query that from then on. Converting to Parquet is a single command (SELECT ... INTO OUTFILE 'events.parquet' FORMAT Parquet), after which file('events.parquet') needs no structure at all. See how to query a Parquet file.

Compressed MessagePack works too

You do not unzip anything first. clickhouse local reads the .gz suffix and decompresses on the fly, so a .msgpack.gz file is queried exactly like a .msgpack (the structure argument is still required, since compression does not add a schema):

1clickhouse local -q "
2SELECT count(), round(sum(revenue), 2)
3FROM file('events.msgpack.gz', MsgPack,
4  'id UInt64, country String, event_type String, revenue Float64, quantity UInt8')
5FORMAT PrettyCompact"

1┌─count()─┬─round(sum(revenue), 2)─┐
21. │      20 │                  291.9 │
3   └─────────┴────────────────────────┘

The same applies to .msgpack.zst, .msgpack.lz4 and other supported codecs. The compression is inferred from the file name.

How fast is it?

MessagePack is a compact binary format, so there is no text to parse and no type to guess at read time. On a 3,000,000-row, ~80 MB file (events_large.msgpack), a filter-and-group-by over every row runs in:

1clickhouse local --time -q "
2SELECT country, count(), round(sum(revenue), 2), round(avg(quantity), 3)
3FROM file('events_large.msgpack', MsgPack,
4  'id UInt64, country String, event_type String, revenue Float64, quantity UInt8')
5WHERE event_type = 'purchase'
6GROUP BY country ORDER BY 2 DESC
7FORMAT Null"

~0.47 seconds, best of three with the file warm in the OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM; clickhouse local 26.6.1.117). The decode runs across all cores, so a multi-million-row aggregation finishes before you can switch windows.

The same SQL scales unchanged

The query you just ran on a laptop file is the same SQL you would run on a ClickHouse server or in ClickHouse Cloud. Nothing about SELECT ... WHERE ... GROUP BY changes; you swap the file() call for a table name and the rest stays put. You prototype against a MessagePack file on your machine and ship the identical logic to production.

Run it yourself

The complete, runnable example lives here. It has generate.sh (builds the demo file, a gzipped copy, and the 3M-row perf file), run.sh (every command above), and expected_output.txt:

github.com/ClickHouse/examples → local-analytics/clickhouse-local-messagepack

1git clone https://github.com/ClickHouse/examples
2cd examples/local-analytics/clickhouse-local-messagepack
3./generate.sh && ./run.sh

Prefer Python? → Read a MessagePack file in Python

The catch: MessagePack has no schema

Filter, aggregate, and group by

Compressed MessagePack works too

How fast is it?

The same SQL scales unchanged

Run it yourself

Subscribe to our newsletter

More like this

How to engineer cost-efficient open source observability with ClickHouse (ClickStack) - 2026 technical playbook

Build a dashboard in Python with ClickHouse and Streamlit

Instrumenting OpenAI with OpenTelemetry (OTel)

Real-time data visualization

How to read a MessagePack file

The catch: MessagePack has no schema #

Filter, aggregate, and group by #

Compressed MessagePack works too #

How fast is it? #

The same SQL scales unchanged #

Run it yourself #

Related #

Subscribe to our newsletter

More like this

How to engineer cost-efficient open source observability with ClickHouse (ClickStack) - 2026 technical playbook

Build a dashboard in Python with ClickHouse and Streamlit

Instrumenting OpenAI with OpenTelemetry (OTel)

Real-time data visualization

The catch: MessagePack has no schema

Filter, aggregate, and group by

Compressed MessagePack works too

How fast is it?

The same SQL scales unchanged

Run it yourself

Related