How to convert Parquet to CSV

Al Brown
Last updated: Jun 8, 2026

To convert Parquet to CSV, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then write the Parquet file out as CSV:

clickhouse local -q "SELECT * FROM file('events.parquet') INTO OUTFILE 'events.csv' FORMAT CSVWithNames"

"event_date","event_id","country","action","amount","attrs"
"2026-01-01",1,"GB","click",5,"{'os':'mac','plan':'free'}"
"2026-01-02",2,"US","view",6.01,"{'os':'win','plan':'pro'}"
"2026-01-03",3,"DE","signup",7.02,"{'os':'linux','plan':'free'}"

The schema is read from the Parquet footer, so there is nothing to declare. CSVWithNames writes a header row from those column names, and the typed Parquet values become text in the CSV, all in place with no upload or import step.

To see the schema ClickHouse read from the footer, run DESCRIBE:

clickhouse local -q "DESCRIBE file('events.parquet')"

event_date	Date32
event_id	UInt64
country	String
action	String
amount	Float64
attrs	Map(String, String)

Those types are what get written to the CSV as text. A Date32 becomes 2026-01-01, a Float64 becomes 6.01. CSV itself carries no types, so the column types live in the Parquet file and are lost the moment you write text. That's expected for CSV; if you need the types preserved, keep the Parquet or convert to a typed format instead.
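To make the type loss concrete, here is a minimal Python sketch using only the standard library csv module. It reads two of the rows shown above from an in-memory string (a stand-in for events.csv with the attrs column left out, not the real file) and shows that every field comes back as a string. The casts at the end are one illustrative way a consumer might restore types by hand, not something the conversion does for you.

```python
import csv
import io
from datetime import date

# In-memory stand-in for the first rows of events.csv (attrs column omitted).
SAMPLE = (
    '"event_date","event_id","country","action","amount"\n'
    '"2026-01-01",1,"GB","click",5\n'
    '"2026-01-02",2,"US","view",6.01\n'
)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Every field is a plain string: the Date32 / UInt64 / Float64 types are gone.
field_types = {type(value).__name__ for row in rows for value in row.values()}
print(field_types)

# Restoring types is the consumer's job (these casts are illustrative).
first = rows[0]
typed = (
    date.fromisoformat(first["event_date"]),
    int(first["event_id"]),
    float(first["amount"]),
)
print(typed)
```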

A quick round-trip check confirms every row made it across:

clickhouse local -q "SELECT count() FROM file('events.csv')"

20
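If you want a second opinion that does not go through ClickHouse, a rough sketch with Python's csv module can count records independently. The helper name count_data_rows is made up for this example, and the data is an in-memory stand-in. Counting records rather than physical lines matters because a quoted field may contain a newline, which a plain line count would get wrong.

```python
import csv
import io

def count_data_rows(handle, has_header=True):
    """Count CSV records, not physical lines, so quoted newlines are handled."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the CSVWithNames header row
    return sum(1 for _ in reader)

# Stand-in data; for a real file, use open('events.csv', newline='') instead.
sample = io.StringIO(
    '"event_id","action"\n'
    '1,"click"\n'
    '2,"multi\nline"\n'
)
record_count = count_data_rows(sample)
print(record_count)
```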

Options: header, delimiter, nested columns

This is where a scriptable converter beats an upload-and-download web tool. You control exactly what the CSV looks like.
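As one illustration of what scriptable means here, the sketch below plans a batch conversion of every .parquet file in a folder. The helper names build_command and plan_batch are hypothetical, the query is the same one used earlier on this page, and the code assumes paths contain no single quotes because they are placed inside SQL string literals without escaping. It prints the commands instead of running them.

```python
import shlex
from pathlib import Path

def build_command(src: Path, dst: Path) -> list[str]:
    """Return the argv for one Parquet-to-CSV conversion.

    Assumes paths contain no single quotes, since they are placed
    inside SQL string literals without escaping.
    """
    query = (
        f"SELECT * FROM file('{src}') "
        f"INTO OUTFILE '{dst}' FORMAT CSVWithNames"
    )
    return ["clickhouse", "local", "-q", query]

def plan_batch(folder: Path) -> list[list[str]]:
    """One command per .parquet file, writing a .csv next to it."""
    return [
        build_command(p, p.with_suffix(".csv"))
        for p in sorted(folder.glob("*.parquet"))
    ]

# Print the commands rather than running them; pass each list to
# subprocess.run(cmd, check=True) to execute.
for cmd in plan_batch(Path(".")):
    print(shlex.join(cmd))
```

Passing the query as a single argv element avoids a layer of shell quoting. Without TRUNCATE or APPEND, INTO OUTFILE fails if the target file already exists, so a rerun needs the old CSVs removed first.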

CSVWithNames writes the header row shown above. If you want a headerless CSV, use FORMAT CSV instead:

clickhouse local -q "SELECT * FROM file('events.parquet') INTO OUTFILE 'events_noheader.csv' FORMAT CSV"

CSV does not have to mean comma. Set format_csv_delimiter to write a semicolon-separated file, which is what many European locales and Excel installs expect:

clickhouse local -q "
SELECT event_date, country, amount FROM file('events.parquet')
INTO OUTFILE 'events_semi.csv' FORMAT CSVWithNames
SETTINGS format_csv_delimiter=';'"

"event_date";"country";"amount"
"2026-01-01";"GB";5
"2026-01-02";"US";6.01
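Whoever reads the semicolon-separated file has to be told about the delimiter too. A small sketch with Python's csv module, using an in-memory copy of the output above as a stand-in for events_semi.csv, shows the difference between a matching reader and one left at the default comma.

```python
import csv
import io

# Stand-in for events_semi.csv as shown above.
SEMI = (
    '"event_date";"country";"amount"\n'
    '"2026-01-01";"GB";5\n'
    '"2026-01-02";"US";6.01\n'
)

# With the matching delimiter, each line splits into three fields.
rows = list(csv.reader(io.StringIO(SEMI), delimiter=";"))
header, data = rows[0], rows[1:]
print(header)
print(data)

# With the default comma, each line comes back as a single field.
wrong = list(csv.reader(io.StringIO(SEMI)))
print(len(wrong[0]))
```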

Nested columns are the one thing to watch when going Parquet to CSV. Parquet supports nested types (here attrs is a Map(String, String)); CSV is flat. A nested column is serialized into a single text cell:

1	{'os':'mac','plan':'free'}
2	{'os':'win','plan':'pro'}

That string is readable but awkward to parse downstream. Usually you want the nested fields as their own CSV columns. Pull them out in the SELECT and the conversion stays one command:

clickhouse local -q "
SELECT event_date, country, attrs['os'] AS os, attrs['plan'] AS plan
FROM file('events.parquet')
INTO OUTFILE 'events_flat.csv' FORMAT CSVWithNames"

"event_date","country","os","plan"
"2026-01-01","GB","mac","free"
"2026-01-02","US","win","pro"
"2026-01-03","DE","linux","free"

Now the map is two flat columns. Because the file is queried as a SQL table, you can also filter, rename, reorder, or aggregate during the conversion instead of dumping the file verbatim.
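For comparison, here is roughly what a downstream consumer has to do when the map is not flattened and arrives as a single text cell. This is a sketch on an in-memory stand-in row. It relies on the assumption that, for simple string keys and values like these, the cell happens to be a valid Python dict literal; it is not a general parser for ClickHouse's Map text form.

```python
import ast
import csv
import io

# Stand-in for one row of the unflattened CSV shown earlier.
RAW = '''"event_id","attrs"
1,"{'os':'mac','plan':'free'}"
'''

row = next(csv.DictReader(io.StringIO(RAW)))

# The cell is just text. ast.literal_eval can parse it only because this
# sample looks like a Python dict literal (an assumption about the data).
attrs = ast.literal_eval(row["attrs"])
print(attrs["os"], attrs["plan"])
```

Every consumer would need its own version of that parsing step, which is the argument for extracting the fields in the SELECT.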

Reverse direction?

Going the other way, CSV back to Parquet, is the same pattern with the formats swapped. See convert CSV to Parquet for the column-typing and compression options worth setting when you write Parquet.

How fast is it?

On a 3,000,000-row events_large.parquet, writing the full table out to a ~211 MB CSV completes in:

clickhouse local -q "SELECT * FROM file('events_large.parquet') INTO OUTFILE 'events_large.csv' FORMAT CSVWithNames"

~0.21 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The work is decoding the columnar Parquet and serializing every value to text; it streams, so memory stays flat whether the file is 200 MB or 200 GB. A concurrent load on the machine can nudge the number; the point is that the conversion is I/O-bound, not a bottleneck.
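To reproduce a best-of-three measurement on your own machine, a minimal harness might look like the sketch below. It is not necessarily how the figure above was measured: it times the whole process including startup, the helper name best_of is made up for this example, and TRUNCATE is added so repeated runs can overwrite the previous output. If clickhouse or events_large.parquet is missing, it falls back to timing a no-op command so the harness itself can still be tried.

```python
import os
import shutil
import subprocess
import sys
import time

def best_of(cmd, runs=3):
    """Run cmd `runs` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)

QUERY = (
    "SELECT * FROM file('events_large.parquet') "
    "INTO OUTFILE 'events_large.csv' TRUNCATE FORMAT CSVWithNames"
)

if shutil.which("clickhouse") and os.path.exists("events_large.parquet"):
    # TRUNCATE lets repeated runs overwrite the previous output.
    print(f"{best_of(['clickhouse', 'local', '-q', QUERY]):.2f} s")
else:
    # Fallback so the harness can be tried without ClickHouse or the data file.
    print(f"{best_of([sys.executable, '-c', 'pass']):.2f} s (no-op command)")
```

Taking the minimum rather than the mean reduces the influence of other load on the machine.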

clickhouse-local runs the same SQL unchanged across dozens of formats and remote sources, and the same query scales from a file on your laptop to a ClickHouse server or ClickHouse Cloud when the data outgrows one machine, with no rewrite.

chDB: the same conversion from Python

chDB is ClickHouse as an in-process Python library. The conversion is the identical SQL, no server and no subprocess:

import chdb

chdb.query("""
SELECT * FROM file('data/events.parquet')
INTO OUTFILE 'data/events_chdb.csv' TRUNCATE FORMAT CSVWithNames
""")

print(str(chdb.query("SELECT count() FROM file('data/events_chdb.csv')", "CSV")).rstrip())

20

If you read Parquet into pandas already, chDB writes the CSV without the DataFrame round-trip, and the same flatten and delimiter options apply.
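As a sketch of how the flatten and delimiter options might be combined in chDB, the snippet below builds one query from the clauses already shown on this page. The function names and the output path data/events_flat_semi.csv are hypothetical, the exact combination of TRUNCATE with SETTINGS is assembled from the separate examples above rather than copied from a tested one, and chdb is imported lazily so the SQL can be inspected without chDB installed.

```python
def flatten_sql(src: str, dst: str, delimiter: str = ";") -> str:
    """Build the conversion SQL; paths and delimiter are assumed quote-free."""
    return (
        "SELECT event_date, country, attrs['os'] AS os, attrs['plan'] AS plan "
        f"FROM file('{src}') "
        f"INTO OUTFILE '{dst}' TRUNCATE FORMAT CSVWithNames "
        f"SETTINGS format_csv_delimiter='{delimiter}'"
    )

def convert(src: str, dst: str, delimiter: str = ";") -> None:
    """Run the conversion in-process with chDB."""
    import chdb  # imported lazily so flatten_sql works without chDB installed
    chdb.query(flatten_sql(src, dst, delimiter))

# Hypothetical output path, used only for illustration.
sql = flatten_sql("data/events.parquet", "data/events_flat_semi.csv")
print(sql)
```

Call convert("data/events.parquet", "data/events_flat_semi.csv") to run it once chDB is installed.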

Run it yourself

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample Parquet files (including the 3M-row file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-parquet-to-csv

Related: how to query a Parquet file, what is a Parquet file, and the reverse conversion CSV to Parquet.
