To convert a TSV file to Parquet, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then select from the TSV and write the result to a Parquet file with INTO OUTFILE ... FORMAT Parquet:
clickhouse local -q "SELECT * FROM file('events.tsv') INTO OUTFILE 'events.parquet' FORMAT Parquet"
2026-01-01 1 GB click 5 1
2026-01-02 2 US view 6.01 2
2026-01-03 3 DE purchase 7.02 3
2026-01-04 4 FR signup 8.03 4
2026-01-05 5 IN click 9.04 5
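If you want to follow along without the examples repo, the snippet below writes a stand-in events.tsv. It uses the column names from the DESCRIBE output further down and the five rows shown above, tab-separated with a header row. It is a hypothetical, smaller substitute for the sample that generate.sh creates (the real file is 668 bytes, so it has more rows), and write_sample is an illustrative name.

```python
import csv
import os

# Column names as reported by DESCRIBE; rows copied from the sample shown above.
# Hypothetical stand-in: the real events.tsv from generate.sh has more rows.
HEADER = ["event_date", "event_id", "country", "action", "amount", "sessions"]
ROWS = [
    ["2026-01-01", "1", "GB", "click", "5", "1"],
    ["2026-01-02", "2", "US", "view", "6.01", "2"],
    ["2026-01-03", "3", "DE", "purchase", "7.02", "3"],
    ["2026-01-04", "4", "FR", "signup", "8.03", "4"],
    ["2026-01-05", "5", "IN", "click", "9.04", "5"],
]


def write_sample(path="events.tsv"):
    """Write HEADER plus ROWS as tab-separated text, one row per line."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(HEADER)
        writer.writerows(ROWS)
    return path


# Only create the file if it is not already there, so a real sample is never overwritten.
if __name__ == "__main__" and not os.path.exists("events.tsv"):
    print(write_sample())
```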
clickhouse local reads the TSV header for column names and infers each column's type from the data, so it queries the file where it sits and writes the Parquet with no separate import step.
A TSV is just text: every value is a string until something infers a type. Parquet is columnar and typed, so the conversion has to assign a type to each column. clickhouse local does that for you from the data. Check what it inferred by running DESCRIBE on the Parquet file:
clickhouse local -q "DESCRIBE file('events.parquet')"
event_date Nullable(Date32)
event_id Nullable(Int64)
country Nullable(String)
action Nullable(String)
amount Nullable(Float64)
sessions Nullable(Int64)
The header gave the names; the data gave the types. Those types are written into the Parquet schema, so a reader gets real dates, integers, and floats back, not strings it has to re-parse. Read the file back to confirm the round-trip:
clickhouse local -q "SELECT * FROM file('events.parquet') ORDER BY event_id LIMIT 5"
2026-01-01 1 GB click 5 1
2026-01-02 2 US view 6.01 2
2026-01-03 3 DE purchase 7.02 3
2026-01-04 4 FR signup 8.03 4
2026-01-05 5 IN click 9.04 5
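To make the inference step concrete, here is a deliberately simplified sketch of per-column inference: try every string in a column as an ISO date, then an integer, then a float, and fall back to String. This is not ClickHouse's algorithm, which has many more rules and settings; the type names and the Nullable(...) wrapper are chosen only to mirror the DESCRIBE output above, and the rows are the five sample rows.

```python
from datetime import date


def _all_parse(parse, values):
    """True if parse() accepts every value without raising ValueError."""
    try:
        for v in values:
            parse(v)
    except ValueError:
        return False
    return True


def infer_type(values):
    """Toy inference: the first type whose parser accepts the whole column wins."""
    if _all_parse(date.fromisoformat, values):
        return "Date32"
    if _all_parse(int, values):
        return "Int64"
    if _all_parse(float, values):
        return "Float64"
    return "String"


def describe(header, rows):
    """Pair each column name with its inferred type, wrapped in Nullable(...)."""
    columns = zip(*rows)  # transpose rows into per-column tuples of strings
    return [(name, f"Nullable({infer_type(col)})") for name, col in zip(header, columns)]


header = ["event_date", "event_id", "country", "action", "amount", "sessions"]
rows = [
    ["2026-01-01", "1", "GB", "click", "5", "1"],
    ["2026-01-02", "2", "US", "view", "6.01", "2"],
    ["2026-01-03", "3", "DE", "purchase", "7.02", "3"],
    ["2026-01-04", "4", "FR", "signup", "8.03", "4"],
    ["2026-01-05", "5", "IN", "click", "9.04", "5"],
]

for name, type_name in describe(header, rows):
    print(name, type_name)
```

Note how amount lands on Float64: "5" alone would parse as an integer, but "6.01" does not, so the whole column has to be a float.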
Inference is convenient, but sometimes you want a specific type: an ID that should stay a fixed-width UInt32, a code that must remain a string rather than become a number. Pass the format and an explicit schema as the second and third arguments to file(), and those types flow into the Parquet:
clickhouse local -q "
SELECT * FROM file('events.tsv', 'TSVWithNames',
'event_date Date, event_id UInt32, country String, action String, amount Float64, sessions UInt8')
INTO OUTFILE 'events.typed.parquet' TRUNCATE FORMAT Parquet"
clickhouse local -q "DESCRIBE file('events.typed.parquet')"
event_date Date32
event_id UInt32
country String
action String
amount Float64
sessions UInt8
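The columns are no longer Nullable, and event_id is a UInt32 and sessions a UInt8 rather than the inferred 64-bit integers. event_date reads back as Date32 even though the schema said Date, because ClickHouse writes Date as Parquet's 32-bit DATE type, which reads back as Date32. Narrower, exact types make the Parquet file smaller and faster to scan.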
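Declaring narrower types only works if the data fits them. One way to check before committing to UInt32 or UInt8 is a single pass over the TSV in Python. This is a hypothetical helper (check_fits and UNSIGNED_MAX are illustrative names), assuming a file with a header row and no CSV-style quoting.

```python
import csv
import os

# Largest value each unsigned ClickHouse integer type can hold.
UNSIGNED_MAX = {"UInt8": 2**8 - 1, "UInt16": 2**16 - 1, "UInt32": 2**32 - 1, "UInt64": 2**64 - 1}


def check_fits(path, planned):
    """Return {column: first offending value} for planned columns whose data won't fit.

    `planned` maps a column name to an unsigned type name, e.g. {"sessions": "UInt8"}.
    Empty fields are reported too, since the explicit types are not Nullable.
    """
    bad = {}
    with open(path, newline="") as f:
        # QUOTE_NONE: treat the file as plain TSV, with no CSV-style quoting.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            for col, type_name in planned.items():
                if col in bad:
                    continue  # already failed; keep the first offending value
                value = row.get(col) or ""
                if not value.isdecimal() or int(value) > UNSIGNED_MAX[type_name]:
                    bad[col] = value
    return bad


if __name__ == "__main__" and os.path.exists("events.tsv"):
    # An empty dict means both columns fit the declared types.
    print(check_fits("events.tsv", {"event_id": "UInt32", "sessions": "UInt8"}))
```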
Online TSV-to-Parquet converters make you upload the file and accept whatever defaults they bake in. Doing it locally means you control the output, and your data never leaves the machine. Two settings matter most.
The first is the compression codec. ClickHouse writes Parquet with zstd by default, a good balance of size and speed, and you can switch it per file with output_format_parquet_compression_method. Here is the zstd run against the 3,000,000-row sample (events_large.tsv, ~111 MB of text), followed by the output sizes for all three common codecs:
clickhouse local -q "
SELECT * FROM file('events_large.tsv')
INTO OUTFILE 'events_large.zstd.parquet' TRUNCATE FORMAT Parquet
SETTINGS output_format_parquet_compression_method = 'zstd'"
none 45304450 bytes
lz4 32697093 bytes
zstd 22939660 bytes
(source events_large.tsv: 110671413 bytes)
zstd turned 111 MB of TSV text into a 23 MB Parquet file: about 4.8x smaller than the source and roughly half the size of uncompressed Parquet. Use lz4 if you want faster reads and can accept a file about 43% larger (33 MB vs 23 MB here); use none only if you will recompress downstream.
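To reproduce the whole codec comparison rather than a single run, a small Python driver can loop over the codecs and shell out to clickhouse local with the same query and setting shown above. It assumes the clickhouse binary is on your PATH and events_large.tsv is in the working directory; codec_query and compare_codecs are hypothetical names, and the events_large.<codec>.parquet naming is our choice.

```python
import os
import shutil
import subprocess

CODECS = ["none", "lz4", "zstd"]


def codec_query(src, codec):
    """Build the INTO OUTFILE query from above with the codec swapped in."""
    out = f"{os.path.splitext(src)[0]}.{codec}.parquet"
    sql = (
        f"SELECT * FROM file('{src}') "
        f"INTO OUTFILE '{out}' TRUNCATE FORMAT Parquet "
        f"SETTINGS output_format_parquet_compression_method = '{codec}'"
    )
    return out, sql


def compare_codecs(src="events_large.tsv"):
    """Write one Parquet file per codec and return {codec: size in bytes}."""
    sizes = {}
    for codec in CODECS:
        out, sql = codec_query(src, codec)
        subprocess.run(["clickhouse", "local", "-q", sql], check=True)
        sizes[codec] = os.path.getsize(out)
    return sizes


if __name__ == "__main__" and shutil.which("clickhouse") and os.path.exists("events_large.tsv"):
    for codec, size in compare_codecs().items():
        print(f"{codec} {size} bytes")
```

Passing the query as one argument list element avoids shell quoting, which is why the single-quoted SQL strings need no escaping here.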
The second is output_format_parquet_row_group_size, which controls how many rows go in each row group and so sets the granularity at which readers can skip data. The default of 1,000,000 rows suits most files; raise it for large full scans, lower it for selective point lookups.
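Because row groups are cut at a fixed row count, the number you get is a ceiling division. The sketch below assumes the row-count default of 1,000,000 and ignores any byte-based row-group limit your ClickHouse version may also apply; row_groups and row_group_setting are illustrative helpers.

```python
def row_groups(num_rows, row_group_size=1_000_000):
    """Row groups if only the row-count limit applies: ceil(num_rows / row_group_size)."""
    return -(-num_rows // row_group_size)  # integer ceiling division, no float rounding


def row_group_setting(rows_per_group):
    """SETTINGS clause to append to the INTO OUTFILE ... FORMAT Parquet query."""
    return f"SETTINGS output_format_parquet_row_group_size = {rows_per_group}"


print(row_groups(3_000_000))           # 3: the 3,000,000-row sample at the default
print(row_groups(3_000_000, 250_000))  # 12: finer skipping, more row-group entries in the footer
print(row_group_setting(250_000))      # SETTINGS output_format_parquet_row_group_size = 250000
```

The first result lines up with the num_row_groups: 3 that ParquetMetadata reports for the zstd file.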
You can confirm what landed in the file by reading the Parquet footer directly with the ParquetMetadata format:
clickhouse local -q "
SELECT num_rows, num_row_groups,
arrayDistinct(arrayMap(c -> c.compression, columns)) AS codecs
FROM file('events_large.zstd.parquet', ParquetMetadata)
FORMAT Vertical"
num_rows: 3000000
num_row_groups: 3
codecs: ['ZSTD']
One note on small inputs: in the first example, the 668-byte events.tsv produced a 2,875-byte Parquet file. Parquet carries schema, statistics, and footer metadata, a fixed overhead that dwarfs a tiny payload. It pays off at scale, which is exactly where the 111 MB file lands.
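You can see part of that fixed overhead with nothing but the Python standard library. A Parquet file starts with the 4-byte magic PAR1 and ends with the Thrift-encoded footer metadata, a 4-byte little-endian footer length, and PAR1 again, so the last 8 bytes tell you how big the footer is. This hypothetical footer_size helper reports that size; it does not decode the footer (ParquetMetadata does that), and page headers inside the column chunks add overhead it does not count.

```python
import os
import struct

MAGIC = b"PAR1"


def footer_size(path):
    """Return (total file bytes, footer bytes) for a plain (unencrypted) Parquet file.

    Footer bytes = Thrift-encoded FileMetaData + 4-byte length field + trailing magic.
    """
    total = os.path.getsize(path)
    if total < 12:  # leading magic + length field + trailing magic
        raise ValueError(f"{path}: too small to be Parquet")
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, os.SEEK_END)
        tail = f.read(8)
    if head != MAGIC or tail[4:] != MAGIC:
        raise ValueError(f"{path}: missing PAR1 magic bytes")
    (metadata_len,) = struct.unpack("<I", tail[:4])  # little-endian uint32
    return total, metadata_len + 8


if __name__ == "__main__" and os.path.exists("events.parquet"):
    total, footer = footer_size("events.parquet")
    print(f"footer: {footer} of {total} bytes ({footer / total:.0%})")
```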
Converting the 3,000,000-row, ~111 MB events_large.tsv to Parquet (parsing every line, inferring types, and writing the columnar file) takes ~0.30 seconds on an Apple M4 Pro laptop (14 cores, 24 GB RAM), best of three runs with a warm OS page cache:
clickhouse local -q "SELECT * FROM file('events_large.tsv') INTO OUTFILE 'events_large.parquet' TRUNCATE FORMAT Parquet"
That includes reading and parsing the TSV from scratch every run; there is no cached table.
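To repeat the best-of-three measurement on your own machine, a minimal Python harness is enough. It times the same command with time.perf_counter, so the figure includes process start-up and may not match exactly how the quoted number was taken. best_of is an illustrative helper, and the script assumes clickhouse is on your PATH and events_large.tsv exists.

```python
import os
import shutil
import subprocess
import time

# The conversion command from above, as an argument list (no shell quoting needed).
CMD = [
    "clickhouse", "local", "-q",
    "SELECT * FROM file('events_large.tsv') "
    "INTO OUTFILE 'events_large.parquet' TRUNCATE FORMAT Parquet",
]


def best_of(fn, runs=3):
    """Call fn() `runs` times and return the fastest wall-clock duration in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__" and shutil.which("clickhouse") and os.path.exists("events_large.tsv"):
    # Earlier runs warm the OS page cache; the minimum approximates a warm-cache run.
    seconds = best_of(lambda: subprocess.run(CMD, check=True))
    print(f"best of 3: {seconds:.2f} s")
```

Taking the minimum rather than the mean discards the cold first run and any background noise, which matches the "best of three with a warm OS page cache" framing.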
If you would rather stay in Python, chDB is the same ClickHouse engine in-process. The SQL is identical: the same SELECT ... INTO OUTFILE ... FORMAT Parquet.
import chdb
chdb.query("""
SELECT * FROM file('events.tsv')
INTO OUTFILE 'events_chdb.parquet' TRUNCATE
FORMAT Parquet
""")
# Confirm the inferred schema landed in the Parquet file
print(chdb.query("DESCRIBE file('events_chdb.parquet')", "TSV"))
event_date Nullable(Date32)
event_id Nullable(Int64)
country Nullable(String)
action Nullable(String)
amount Nullable(Float64)
sessions Nullable(Int64)
No server, no upload, and you can wire the conversion straight into a pandas or pyarrow pipeline.
The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample TSVs (including the ~111 MB file used for the timing and codec comparison), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.
github.com/ClickHouse/examples/tree/main/local-analytics/convert-tsv-to-parquet
Reverse direction? See how to convert Parquet to TSV.
The same SQL that converts a file here runs unchanged against a directory of files, a remote object store, and a ClickHouse server or ClickHouse Cloud when the data outgrows your laptop.