How to convert CSV to TSV

To convert a CSV file to TSV, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then read the CSV and write it out as TSVWithNames:

clickhouse local -q "SELECT * FROM file('orders.csv') INTO OUTFILE 'orders.tsv' TRUNCATE FORMAT TSVWithNames"

order_date	order_id	country	product	revenue	quantity
2026-01-01	1	GB	widget	5	1
2026-01-02	2	US	gadget	6.01	2
2026-01-03	3	DE	gizmo	7.02	3
2026-01-04	4	FR	doohickey	8.03	4

clickhouse-local reads the CSV header for column names, infers each column's type from the data, and streams the rows straight into a tab-separated file. The source CSV is read in place with no import step; the same command handles files larger than RAM.

Keep the header, or drop it

TSVWithNames writes the column names as the first line. If a downstream tool wants data rows only, switch the output format to TSV and the header is gone:

clickhouse local -q "SELECT * FROM file('orders.csv') INTO OUTFILE 'orders_noheader.tsv' TRUNCATE FORMAT TSV"

2026-01-01	1	GB	widget	5	1
2026-01-02	2	US	gadget	6.01	2
2026-01-03	3	DE	gizmo	7.02	3

The TRUNCATE keyword overwrites the output file if it already exists, so the command is safe to re-run. Drop it and clickhouse-local will refuse to clobber an existing file.

The types inferred from the CSV carry straight into the TSV. Read the result back and the schema is intact: column names from the kept header, types from the data.

clickhouse local -q "DESCRIBE file('orders.tsv', 'TSVWithNames')"

order_date	Nullable(Date)
order_id	Nullable(Int64)
country	Nullable(String)
product	Nullable(String)
revenue	Nullable(Float64)
quantity	Nullable(Int64)

The delimiter gotcha: commas and tabs

This is where converting on your own machine beats an online converter. CSV and TSV disagree about what is special. A comma is an ordinary character in TSV, so a CSV value like red, large needs no quoting once it lands in a tab-separated file. But a tab inside a value is the TSV field separator, so clickhouse-local escapes it to the two-character sequence \t to keep the row intact.

Take a CSV whose values contain both a comma and a literal tab:

clickhouse local -q "SELECT * FROM file('notes.csv') INTO OUTFILE 'notes.tsv' TRUNCATE FORMAT TSVWithNames"
od -c notes.tsv

0000000    i   d  \t   l   a   b   e   l  \t   n   o   t   e  \n   1  \t
0000020    r   e   d   ,       l   a   r   g   e  \t   l   i   n   e   1
0000040    \   t   l   i   n   e   2  \n   2  \t   b   l   u   e  \t   p
0000060    l   a   i   n  \n

The red, large value keeps its comma verbatim. The embedded tab in line1<tab>line2 is written as two literal characters, a backslash followed by t. Read the TSV back and the original tab is restored:

clickhouse local -q "SELECT * FROM file('notes.tsv') FORMAT Vertical"

Row 1:
──────
id:    1
label: red, large
note:  line1	line2

The round-trip is lossless. A naive find-and-replace of commas with tabs would have corrupted both values; clickhouse-local handles the escaping for you in both directions.
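To see why a blind comma-to-tab replacement fails, here is a minimal Python sketch using only the standard library. The sample row is modelled on the notes.csv example above, and escape_tsv is a hypothetical helper that illustrates the escaping rule described in this section (backslash, tab, and newline written as two-character sequences). It is an illustration of the idea, not ClickHouse's implementation.

```python
import csv
import io

# One CSV row shaped like the notes.csv example: a quoted comma and a literal tab.
csv_text = 'id,label,note\n1,"red, large",line1\tline2\n'

# Naive approach: swap every comma for a tab and split the data row on tabs.
naive = csv_text.replace(",", "\t")
naive_fields = naive.splitlines()[1].split("\t")
print(len(naive_fields))  # more than the 3 expected fields: the row is corrupted


def escape_tsv(value: str) -> str:
    """Hypothetical helper: escape the characters that are special in TSV."""
    return (
        value.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
        .replace("\t", "\\t")
        .replace("\n", "\\n")
    )


# Parse the CSV with a real CSV parser, then escape each field before joining with tabs.
rows = list(csv.reader(io.StringIO(csv_text)))
tsv_lines = ["\t".join(escape_tsv(field) for field in row) for row in rows]
print(tsv_lines[1])
```

The naive version splits the quoted label at its comma and again at the embedded tab, while the parsed-and-escaped version keeps exactly three fields per row.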

chDB: the same conversion in Python

If you live in Python, chDB is the same ClickHouse engine in-process. The SQL is identical: SELECT from the CSV, write INTO OUTFILE as TSVWithNames.

import chdb

chdb.query(
    "SELECT * FROM file('orders.csv') "
    "INTO OUTFILE 'orders.tsv' TRUNCATE FORMAT TSVWithNames"
)

No pandas round-trip, no to_csv(sep='\t') quoting surprises. The same engine that wrote the file from the CLI writes it here.
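As one illustration of the kind of quoting surprise meant here, the sketch below uses Python's standard csv module as a stand-in for a CSV-style writer configured with a tab separator (pandas itself is not needed to run it, and whether a given library behaves identically is an assumption to check). With default minimal quoting, a field containing a tab is wrapped in double quotes rather than escaped as \t, so a consumer that simply splits on tabs sees an extra field.

```python
import csv
import io

# The data row from the notes.csv example, already parsed into fields.
row = ["1", "red, large", "line1\tline2"]

buf = io.StringIO()
# CSV-style writer with a tab delimiter and the default (minimal) quoting rules.
csv.writer(buf, delimiter="\t", lineterminator="\n").writerow(row)
line = buf.getvalue()
print(repr(line))

# A reader that treats every tab as a field separator now miscounts the fields.
split_fields = line.rstrip("\n").split("\t")
print(len(split_fields))
```

The comma needs no special treatment, but the embedded tab is quoted instead of escaped, which is a different convention from the TSVWithNames output shown earlier.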

How fast is it?

Converting a ~3,000,000-row, ~126 MB CSV (orders_large.csv) to TSV:

clickhouse local -q "SELECT * FROM file('orders_large.csv') INTO OUTFILE 'orders_large.tsv' TRUNCATE FORMAT TSVWithNames"

~0.32 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). That includes parsing the CSV text and re-serialising every row as TSV. Because the rows stream through, the source file never has to fit in memory; the same command converts a file far larger than RAM.

run 1: real 0.32
run 2: real 0.32
run 3: real 0.32
rows written: 3000000
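A best-of-three measurement like this can be reproduced with a small harness. The sketch below is one possible way to do it, not the script used for the numbers above: time_command is a hypothetical helper, and the command it times here is a trivial Python stand-in so the example runs on any machine.

```python
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Run argv `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in command so the sketch runs anywhere; swap in the real conversion to measure it.
timings = time_command([sys.executable, "-c", "pass"])
print(f"best of {len(timings)}: {min(timings):.2f}s")
```

To time the conversion itself, pass the command from this section as a list, for example ["clickhouse", "local", "-q", "SELECT * FROM file('orders_large.csv') INTO OUTFILE 'orders_large.tsv' TRUNCATE FORMAT TSVWithNames"], assuming clickhouse is on your PATH. Results will vary with hardware and with whether the file is already in the OS page cache.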

Reverse direction?

Going the other way is the mirror image: SELECT from the TSV, write FORMAT CSVWithNames. See convert TSV to CSV. Once the data is a TSV you can also query it in place; see how to query a TSV file.

Run it yourself

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample CSVs (including the ~126 MB file used for the timing above), run.sh and run.py with every command on this page, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-csv-to-tsv

The same SQL that converts a file on your laptop runs unchanged against a ClickHouse server or ClickHouse Cloud when the data outgrows your machine, with no rewrite. If you want to run queries against the CSV before converting it, start with how to run SQL on a CSV file.
