How to convert CSV to TSV

Al Brown
Last updated: Jun 8, 2026

To convert a CSV file to TSV, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then read the CSV and write it out as TSVWithNames:

clickhouse local -q "SELECT * FROM file('orders.csv') INTO OUTFILE 'orders.tsv' TRUNCATE FORMAT TSVWithNames"
order_date	order_id	country	product	revenue	quantity
2026-01-01	1	GB	widget	5	1
2026-01-02	2	US	gadget	6.01	2
2026-01-03	3	DE	gizmo	7.02	3
2026-01-04	4	FR	doohickey	8.03	4

clickhouse-local reads the CSV header for column names, infers each column's type from the data, and streams the rows straight into a tab-separated file. The source CSV is read in place with no import step; the same command handles files larger than RAM.

Keep the header, or drop it

TSVWithNames writes the column names as the first line. If a downstream tool wants data rows only, switch the output format to TSV and the header is gone:

clickhouse local -q "SELECT * FROM file('orders.csv') INTO OUTFILE 'orders_noheader.tsv' TRUNCATE FORMAT TSV"
2026-01-01	1	GB	widget	5	1
2026-01-02	2	US	gadget	6.01	2
2026-01-03	3	DE	gizmo	7.02	3

The TRUNCATE keyword overwrites the output file if it already exists, so the command is safe to re-run. Drop it and clickhouse-local will refuse to clobber an existing file.

The types inferred from the CSV carry straight into the TSV. Read the result back and the schema is intact: column names from the kept header, types from the data.

clickhouse local -q "DESCRIBE file('orders.tsv', 'TSVWithNames')"
order_date	Nullable(Date)
order_id	Nullable(Int64)
country	Nullable(String)
product	Nullable(String)
revenue	Nullable(Float64)
quantity	Nullable(Int64)

The delimiter gotcha: commas and tabs

This is where converting on your own machine beats an online converter. CSV and TSV disagree about what is special. A comma is an ordinary character in TSV, so a CSV value like red, large needs no quoting once it lands in a tab-separated file. But a tab inside a value is the TSV field separator, so clickhouse-local escapes it to the two-character sequence \t to keep the row intact.

Take a CSV whose values contain both a comma and a literal tab:

clickhouse local -q "SELECT * FROM file('notes.csv') INTO OUTFILE 'notes.tsv' TRUNCATE FORMAT TSVWithNames"
od -c notes.tsv
0000000    i   d  \t   l   a   b   e   l  \t   n   o   t   e  \n   1  \t
0000020    r   e   d   ,       l   a   r   g   e  \t   l   i   n   e   1
0000040    \   t   l   i   n   e   2  \n   2  \t   b   l   u   e  \t   p
0000060    l   a   i   n  \n

The red, large value keeps its comma verbatim. The embedded tab in line1<tab>line2 is written as two literal characters, a backslash followed by t, which od shows as \ and t in separate columns. Read the TSV back and the original tab is restored:

clickhouse local -q "SELECT * FROM file('notes.tsv') FORMAT Vertical"
Row 1:
──────
id:    1
label: red, large
note:  line1	line2

The round-trip is lossless. A naive find-and-replace of commas with tabs would have corrupted both values; clickhouse-local handles the escaping for you in both directions.
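To make the difference concrete, here is a minimal pure-Python sketch of the escaping rule described above, next to the naive comma-to-tab swap. It is an illustration, not clickhouse-local's implementation: the escape table covers only backslash, tab and newline, the helper names are made up for this example, and the raw CSV line is an assumed rendering of the first row of notes.csv.

```python
# Sketch of TSV-style escaping: backslash, tab and newline inside a value
# become two-character sequences so the row structure stays intact.
# (Assumption: only these three characters are handled here.)
ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n"}
UNESCAPES = {"\\": "\\", "t": "\t", "n": "\n"}


def escape_field(value: str) -> str:
    """Escape one value for writing between tab separators."""
    return "".join(ESCAPES.get(ch, ch) for ch in value)


def unescape_field(value: str) -> str:
    """Reverse escape_field; unknown escapes are left untouched."""
    out = []
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            out.append(UNESCAPES.get(nxt, "\\" + nxt))
        else:
            out.append(ch)
    return "".join(out)


# Row 1 of the example: a comma in one value, a real tab in another.
row = ["1", "red, large", "line1\tline2"]

# Escaped write, then read back: three fields in, three fields out.
tsv_line = "\t".join(escape_field(v) for v in row)
restored = [unescape_field(f) for f in tsv_line.split("\t")]

# Naive approach: swap every comma in the raw CSV line for a tab.
csv_line = '1,"red, large",line1\tline2'  # hypothetical raw CSV line
naive_fields = csv_line.replace(",", "\t").split("\t")

print(repr(tsv_line))
print(restored == row)
print(len(naive_fields))
```

The escaped line splits back into the original three values, while the naive swap yields five fragments: the quoted comma and the raw tab each split a value in two.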

chDB: the same conversion in Python

If you live in Python, chDB is the same ClickHouse engine in-process. The SQL is identical: SELECT from the CSV, write INTO OUTFILE as TSVWithNames.

import chdb

chdb.query(
    "SELECT * FROM file('orders.csv') "
    "INTO OUTFILE 'orders.tsv' TRUNCATE FORMAT TSVWithNames"
)

No pandas round-trip, no to_csv(sep='\t') quoting surprises. The same engine that wrote the file from the CLI writes it here.
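As a rough illustration of the kind of quoting surprise meant here, the sketch below uses Python's standard csv module as a stand-in for pandas. pandas documents csv.QUOTE_MINIMAL as the default quoting for to_csv, so its output is assumed to be similar, though that is not verified here. The row is the first row of the notes example.

```python
import csv
import io

# Row 1 of the notes example: one value holds a comma, another a real tab.
row = ["1", "red, large", "line1\tline2"]

# Write the row with a tab delimiter and the csv module's default
# QUOTE_MINIMAL policy.
buf = io.StringIO()
csv.writer(buf, delimiter="\t", lineterminator="\n").writerow(row)
quoted_line = buf.getvalue()

# A consumer that simply splits on tabs now sees the wrong field count.
split_fields = quoted_line.rstrip("\n").split("\t")

print(repr(quoted_line))
print(len(split_fields))
```

The value containing the tab is wrapped in double quotes with the raw tab left inside. A reader that understands CSV-style quoting copes, but a tool that just splits on tabs sees four fields instead of three.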

How fast is it?

Converting a ~3,000,000-row, ~126 MB CSV (orders_large.csv) to TSV:

clickhouse local -q "SELECT * FROM file('orders_large.csv') INTO OUTFILE 'orders_large.tsv' TRUNCATE FORMAT TSVWithNames"

The conversion takes ~0.32 seconds (best of three with a warm OS page cache) on an Apple M4 Pro laptop (14 cores, 24 GB RAM). That includes parsing the CSV text and re-serialising every row as TSV. Because the rows stream through, the source file never has to fit in memory; the same command converts a file far larger than RAM.

run 1: real 0.32
run 2: real 0.32
run 3: real 0.32
rows written: 3000000
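One way to reproduce a best-of-three measurement is a small Python harness like the sketch below. It is not the script behind the numbers above, and the function name best_of is made up for this example. It measures wall-clock time around the whole process, so process startup is included.

```python
import subprocess
import sys
import time


def best_of(cmd: list[str], runs: int = 3) -> float:
    """Run cmd `runs` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command exits with a non-zero status.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in command so the sketch runs anywhere; swap in the clickhouse
# invocation to time the real conversion.
print(f"{best_of([sys.executable, '-c', 'pass']):.2f}s")
```

To time the conversion itself, pass the command as a list, for example best_of(["clickhouse", "local", "-q", query]) with query set to the SQL string shown above. This assumes clickhouse is on your PATH and orders_large.csv is in the working directory.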

Reverse direction?

Going the other way is the mirror image: SELECT from the TSV, write FORMAT CSVWithNames. See convert TSV to CSV. Once the data is a TSV you can also query it in place; see how to query a TSV file.
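The roles of the two characters flip in the reverse direction, which the following pure-Python sketch illustrates. It is not how clickhouse-local does it: the helper name is made up for this example, and the quoting policy is that of Python's csv module, which may differ from what ClickHouse's CSV output emits.

```python
import csv
import io
import re

# Two-character TSV escapes and the characters they stand for
# (assumption: only these three are handled in this sketch).
UNESCAPES = {"t": "\t", "n": "\n", "\\": "\\"}


def unescape_field(value: str) -> str:
    """Turn escaped sequences such as backslash-t back into real characters."""
    return re.sub(r"\\(.)", lambda m: UNESCAPES.get(m.group(1), m.group(0)), value)


# Row 1 of notes.tsv as it sits on disk: the tab inside the note is escaped.
tsv_line = "1\tred, large\tline1\\tline2"
fields = [unescape_field(f) for f in tsv_line.split("\t")]

# Write the row as CSV: now the comma is special and the tab is ordinary.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(fields)
csv_line = buf.getvalue()

print(repr(csv_line))
```

Here the value with the comma is the one that gets quoted, while the restored tab passes through as an ordinary character.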

Run it yourself

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample CSVs (including the ~126 MB file used for the timing above), run.sh and run.py with every command on this page, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-csv-to-tsv

The same SQL that converts a file on your laptop runs unchanged against a ClickHouse server or ClickHouse Cloud when the data outgrows the laptop, with no rewrite. If you want to run queries against the CSV before converting it, start with how to run SQL on a CSV file.

