To convert MessagePack to CSV, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then supply the column structure and write the result to CSV:
clickhouse local -q "
SELECT * FROM file('orders.msgpack', MsgPack,
'order_date Date, order_id UInt64, country String, product String, revenue Float64, quantity UInt8')
INTO OUTFILE 'orders.csv' TRUNCATE FORMAT CSVWithNames"
"order_date","order_id","country","product","revenue","quantity"
"2026-01-01",1,"GB","widget",5,1
"2026-01-02",2,"US","gadget",6.01,2
"2026-01-03",3,"DE","gizmo",7.02,3
"2026-01-04",4,"FR","doohickey",8.03,4
"2026-01-05",5,"IN","widget",9.04,5
CSVWithNames writes a header row from your declared structure, and the types you supplied are carried through into the CSV. The file is read in place and streamed row by row, so the conversion works on files larger than RAM with no import step first.
Most formats clickhouse-local reads are self-describing. CSV has a header; Parquet, Avro and ORC embed their schema. MessagePack does not. It is a compact binary encoding of values with no column names and no top-level type information, so the reader cannot infer a table on its own. Ask it to, and it stops:
Code: 636. DB::Exception: The table structure cannot be extracted from a MsgPack format file.
Code: 36. DB::Exception: You must specify setting input_format_msgpack_number_of_columns
to extract table schema from MsgPack data. (BAD_ARGUMENTS)
The fix is the three-argument form of file(): path, the MsgPack format, and an explicit structure string. Values are read positionally, in the order they were written, so the column order in your structure must match the order in the file. Get a name or a type wrong and you will mislabel a column rather than see an error, so this is the one line to get right.
With the structure supplied, the conversion is a single statement. SELECT from the MessagePack file, INTO OUTFILE as CSV:
clickhouse local -q "
SELECT * FROM file('orders.msgpack', MsgPack,
'order_date Date, order_id UInt64, country String, product String, revenue Float64, quantity UInt8')
INTO OUTFILE 'orders.csv' TRUNCATE FORMAT CSVWithNames"
CSVWithNames writes a header row from your structure; plain CSV omits it. TRUNCATE overwrites an existing file instead of appending. The first rows of the result:
"order_date","order_id","country","product","revenue","quantity"
"2026-01-01",1,"GB","widget",5,1
"2026-01-02",2,"US","gadget",6.01,2
"2026-01-03",3,"DE","gizmo",7.02,3
"2026-01-04",4,"FR","doohickey",8.03,4
"2026-01-05",5,"IN","widget",9.04,5
The types you declared are carried through: a Date becomes 2026-01-01, a Float64 keeps its decimals, integers stay unquoted. Read the CSV straight back to confirm the round-trip:
clickhouse local -q "
SELECT country, count() AS orders, round(sum(revenue), 2) AS revenue
FROM file('orders.csv')
GROUP BY country ORDER BY revenue DESC"
US 4 60.4
GB 4 56.36
AU 3 48.33
IN 3 45.3
FR 3 42.27
DE 3 39.24
This is the information an upload-required converter site cannot give you, because the structure and the output shape are yours to control:
- Getting the structure right. If you do not know the column types, sample the file first. Read it once with all columns typed as
String and LIMIT 5 to eyeball the values, then tighten the types. The full read-side detail lives in how to read a MessagePack file.
- Nested values flatten. MessagePack can hold maps and arrays. CSV is flat, so a nested value declared as
String is written as its serialized text in one cell. There is no column-per-key expansion; that is a property of CSV, not of the tool. Pick a richer target if you need to keep structure.
- No header, or a different delimiter. Use
FORMAT CSV to drop the header row, or write TSV instead with FORMAT TSVWithNames for tab-separated output.
- Project or reshape on the way out. Because it is
SELECT, you can drop columns, rename them, filter with WHERE, or aggregate before writing. The CSV is the query result, not a blind dump.
- Larger than RAM. The conversion streams row by row, so file size is bounded by disk, not memory.
If you would rather stay in Python, chDB is the same ClickHouse engine in-process. The SQL is identical, including the explicit structure and the INTO OUTFILE:
import chdb
chdb.query("""
SELECT * FROM file('orders.msgpack', MsgPack,
'order_date Date, order_id UInt64, country String, product String, revenue Float64, quantity UInt8')
INTO OUTFILE 'orders.csv' TRUNCATE FORMAT CSVWithNames
""")
No pandas or pyarrow round-trip is needed to land the CSV. For the read-side workflow into a DataFrame, see how to read a MessagePack file in Python with chDB.
On a 3,000,000-row orders_large.msgpack, decoding every row and writing a ~132 MB CSV completes in:
run 1: real 0.62
run 2: real 0.69
run 3: real 0.58
~0.58 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The number includes decoding the MessagePack and formatting the CSV text from scratch; there is no cached table.
The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample .msgpack files (including the 3M-row file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.
github.com/ClickHouse/examples/tree/main/local-analytics/convert-messagepack-to-csv
The same SQL that converts one file on your laptop runs unchanged against a folder of files, a remote object store, and a ClickHouse server or ClickHouse Cloud when the data outgrows one machine, with no rewrite.