How to convert MessagePack to CSV

Al Brown
Last updated: Jun 15, 2026

To convert MessagePack to CSV, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then supply the column structure and write the result to CSV:

1clickhouse local -q "
2SELECT * FROM file('orders.msgpack', MsgPack,
3  'order_date Date, order_id UInt64, country String, product String, revenue Float64, quantity UInt8')
4INTO OUTFILE 'orders.csv' TRUNCATE FORMAT CSVWithNames"
1"order_date","order_id","country","product","revenue","quantity"
2"2026-01-01",1,"GB","widget",5,1
3"2026-01-02",2,"US","gadget",6.01,2
4"2026-01-03",3,"DE","gizmo",7.02,3
5"2026-01-04",4,"FR","doohickey",8.03,4
6"2026-01-05",5,"IN","widget",9.04,5

CSVWithNames writes a header row from your declared structure, and the types you supplied are carried through into the CSV. The file is read in place and streamed row by row, so the conversion works on files larger than RAM with no import step first.

The one gotcha: MessagePack has no schema #

Most formats clickhouse-local reads are self-describing. CSV has a header; Parquet, Avro and ORC embed their schema. MessagePack does not. It is a compact binary encoding of values with no column names and no top-level type information, so the reader cannot infer a table on its own. Ask it to, and it stops:

1Code: 636. DB::Exception: The table structure cannot be extracted from a MsgPack format file.
2Code: 36. DB::Exception: You must specify setting input_format_msgpack_number_of_columns
3to extract table schema from MsgPack data. (BAD_ARGUMENTS)

The fix is the three-argument form of file(): path, the MsgPack format, and an explicit structure string. Values are read positionally, in the order they were written, so the column order in your structure must match the order in the file. Get a name or a type wrong and you will mislabel a column rather than see an error, so this is the one line to get right.

Convert it #

With the structure supplied, the conversion is a single statement. SELECT from the MessagePack file, INTO OUTFILE as CSV:

1clickhouse local -q "
2SELECT * FROM file('orders.msgpack', MsgPack,
3  'order_date Date, order_id UInt64, country String, product String, revenue Float64, quantity UInt8')
4INTO OUTFILE 'orders.csv' TRUNCATE FORMAT CSVWithNames"

CSVWithNames writes a header row from your structure; plain CSV omits it. TRUNCATE overwrites an existing file instead of appending. The first rows of the result:

1"order_date","order_id","country","product","revenue","quantity"
2"2026-01-01",1,"GB","widget",5,1
3"2026-01-02",2,"US","gadget",6.01,2
4"2026-01-03",3,"DE","gizmo",7.02,3
5"2026-01-04",4,"FR","doohickey",8.03,4
6"2026-01-05",5,"IN","widget",9.04,5

The types you declared are carried through: a Date becomes 2026-01-01, a Float64 keeps its decimals, integers stay unquoted. Read the CSV straight back to confirm the round-trip:

1clickhouse local -q "
2SELECT country, count() AS orders, round(sum(revenue), 2) AS revenue
3FROM file('orders.csv')
4GROUP BY country ORDER BY revenue DESC"
1US	4	60.4
2GB	4	56.36
3AU	3	48.33
4IN	3	45.3
5FR	3	42.27
6DE	3	39.24

Options worth knowing #

This is the information an upload-required converter site cannot give you, because the structure and the output shape are yours to control:

  • Getting the structure right. If you do not know the column types, sample the file first. Read it once with all columns typed as String and LIMIT 5 to eyeball the values, then tighten the types. The full read-side detail lives in how to read a MessagePack file.
  • Nested values flatten. MessagePack can hold maps and arrays. CSV is flat, so a nested value declared as String is written as its serialized text in one cell. There is no column-per-key expansion; that is a property of CSV, not of the tool. Pick a richer target if you need to keep structure.
  • No header, or a different delimiter. Use FORMAT CSV to drop the header row, or write TSV instead with FORMAT TSVWithNames for tab-separated output.
  • Project or reshape on the way out. Because it is SELECT, you can drop columns, rename them, filter with WHERE, or aggregate before writing. The CSV is the query result, not a blind dump.
  • Larger than RAM. The conversion streams row by row, so file size is bounded by disk, not memory.

In Python with chDB #

If you would rather stay in Python, chDB is the same ClickHouse engine in-process. The SQL is identical, including the explicit structure and the INTO OUTFILE:

1import chdb
2
3chdb.query("""
4SELECT * FROM file('orders.msgpack', MsgPack,
5  'order_date Date, order_id UInt64, country String, product String, revenue Float64, quantity UInt8')
6INTO OUTFILE 'orders.csv' TRUNCATE FORMAT CSVWithNames
7""")

No pandas or pyarrow round-trip is needed to land the CSV. For the read-side workflow into a DataFrame, see how to read a MessagePack file in Python with chDB.

How fast is it? #

On a 3,000,000-row orders_large.msgpack, decoding every row and writing a ~132 MB CSV completes in:

1run 1: real 0.62
2run 2: real 0.69
3run 3: real 0.58

~0.58 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The number includes decoding the MessagePack and formatting the CSV text from scratch; there is no cached table.

Run it yourself #

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample .msgpack files (including the 3M-row file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-messagepack-to-csv

The same SQL that converts one file on your laptop runs unchanged against a folder of files, a remote object store, and a ClickHouse server or ClickHouse Cloud when the data outgrows one machine, with no rewrite.

Share this resource

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!
Loading form...