How to convert Parquet to TSV

Al Brown
Last updated: Jun 8, 2026

To convert Parquet to TSV, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then read the Parquet file and write it out as TSV:

1clickhouse local -q "SELECT * FROM file('events.parquet') INTO OUTFILE 'events.tsv' TRUNCATE FORMAT TSV"
12026-01-01	1	GB	click	5	[0,0,0]
22026-01-02	2	US	view	6.01	[1,1,1]
32026-01-03	3	DE	signup	7.02	[2,2,2]
42026-01-04	4	FR	purchase	8.03	[0,3,3]
52026-01-05	5	IN	click	9.04	[1,4,4]

The schema is read straight from the Parquet footer, so you declare no columns. The file is converted in place with no upload or import step, and because the read and write both stream, this works on files larger than RAM.

Parquet is self-describing: every column carries its type in the file. clickhouse-local reads that footer and uses it directly, so there is no schema guessing on the way in. Check what it sees with DESCRIBE:

1clickhouse local -q "DESCRIBE file('events.parquet')"
1event_date	Date32
2event_id	UInt64
3country	String
4action	String
5value	Float64
6tags	Array(UInt8)

Those types are written into the TSV as text. A Date32 becomes 2026-01-01, a Float64 becomes 6.01, and the Array(UInt8) column becomes [0,0,0] in a single tab-separated field. The tabs separate columns; values themselves never contain a raw tab (any tab inside a string is escaped as \t), so the column boundaries stay unambiguous.

Keep the column names with TSVWithNames #

Plain TSV writes data rows only. If you want a header row of column names, use TSVWithNames:

1clickhouse local -q "SELECT * FROM file('events.parquet') INTO OUTFILE 'events_named.tsv' TRUNCATE FORMAT TSVWithNames"
1event_date	event_id	country	action	value	tags
22026-01-01	1	GB	click	5	[0,0,0]
32026-01-02	2	US	view	6.01	[1,1,1]

This is the version most tools expect, and it lets you read the file back without re-declaring the columns.

The nested-column gotcha #

This is where a Parquet to TSV conversion differs from a flat one. Parquet supports nested types: arrays, maps, and structs. TSV is a flat, one-value-per-cell text format with no nested grammar of its own. ClickHouse handles this by serialising the nested value into a single field using its own array literal syntax, which is why tags showed up as [0,0,0].

That representation is not lossy for arrays. Read the TSV back and the array parses straight into an Array again.

1clickhouse local -q "DESCRIBE file('events_named.tsv', 'TSVWithNames')"
2clickhouse local -q "SELECT tags, length(tags) AS n FROM file('events_named.tsv', 'TSVWithNames') LIMIT 3"
1event_date	Nullable(Date)
2event_id	Nullable(Int64)
3country	Nullable(String)
4action	Nullable(String)
5value	Nullable(Float64)
6tags	Array(Nullable(Int64))
7
8[0,0,0]	3
9[1,1,1]	3
10[2,2,2]	3

The tags column came back as Array(Nullable(Int64)) and length() still reads 3. Two things shifted on the way back in, both expected from text: scalar columns are now Nullable (TSV inference allows empty cells), and the integer width widened from UInt8 to Int64 because TSV carries no width hint. The values are identical.

If the consumer of your TSV cannot parse a [...] field, flatten the nesting in SQL before you export. arrayJoin turns one row with an array into one row per element, giving a fully flat TSV:

1clickhouse local -q "SELECT event_id, country, arrayJoin(tags) AS tag FROM file('events.parquet') ORDER BY event_id LIMIT 5 FORMAT TSV"
11	GB	0
21	GB	0
31	GB	0
42	US	1
52	US	1

Now every value sits in its own cell with no brackets in sight. This is the part an upload-required online converter can't do: you decide the exact shape of the output before it lands.

Options worth knowing #

  • Filter and reshape on the way out. The conversion is a normal SELECT, so add a WHERE, pick columns, rename them, or compute new ones. You convert and trim in one pass instead of exporting everything and cleaning up later.
  • NULL representation. A SQL NULL is written as \N in TSV by default. Set format_tsv_null_representation if a downstream tool expects an empty cell or a literal string instead.
  • Date and number formats. Dates serialise as YYYY-MM-DD and floats use a . decimal point. If you need a different layout, cast in the SELECT (for example toString(formatDateTime(event_date, '%d/%m/%Y'))).
  • Compress in the same command. Give the output an extension like events.tsv.gz and ClickHouse compresses on the fly. The codec is inferred from the file name.
  • Variants. TSVWithNamesAndTypes adds a second header row carrying the ClickHouse type of each column, which makes a later read fully self-describing.

How fast is it? #

On a 3,000,000-row Parquet file (events_large.parquet, ~26 MB on disk), the full conversion to a ~123 MB TSV runs in:

1clickhouse local -q "SELECT * FROM file('events_large.parquet') INTO OUTFILE 'events_large.tsv' TRUNCATE FORMAT TSV"

~0.19 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The number is best-of-three and may wobble slightly under concurrent load. The work is bounded by reading the columnar Parquet and writing text, both streamed, so it scales linearly and never needs the whole file in memory.

Reverse direction #

To go the other way, see convert TSV to Parquet: the same file() to INTO OUTFILE pattern with the formats swapped.

Prefer Python? #

The same conversion runs in-process with chDB, the embedded ClickHouse engine for Python. Same SQL, written to a file from a Python script:

1import chdb
2
3# Convert Parquet -> TSV (with a header row)
4chdb.query(
5    "SELECT * FROM file('events.parquet') "
6    "INTO OUTFILE 'events_chdb.tsv' TRUNCATE FORMAT TSVWithNames"
7)
8
9# Read it back into a pandas DataFrame to confirm the round-trip
10df = chdb.query(
11    "SELECT event_id, country, tags FROM file('events_chdb.tsv', 'TSVWithNames') "
12    "ORDER BY event_id LIMIT 3",
13    "DataFrame",
14)
15print(df)
1   event_id country       tags
20         1      GB  [0, 0, 0]
31         2      US  [1, 1, 1]
42         3      DE  [2, 2, 2]

The tags array comes back as a real Python list inside the DataFrame. chDB is a fair alternative to reading Parquet with pandas or pyarrow and then writing CSV; you stay in one engine and one query.

Run it yourself #

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample Parquet (including the 3M-row file used for the timing above), run.sh with every command on this page, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-parquet-to-tsv

The same SQL scales unchanged from this file to a ClickHouse server to ClickHouse Cloud when the data outgrows your laptop. Related: convert Parquet to CSV, query a TSV file with SQL, what is a TSV file, and how to query a Parquet file.

Share this resource

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!
Loading form...