How to read a TSV file

To read a TSV file and query it with SQL, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then query the file directly:

1clickhouse local -q "SELECT * FROM file('orders.tsv') LIMIT 10"

1┌─order_date─┬─order_id─┬─country─┬─product───┬─revenue─┬─quantity─┐
21. │ 2026-01-01 │        1 │ GB      │ widget    │       5 │        1 │
32. │ 2026-01-02 │        2 │ US      │ gadget    │    6.01 │        2 │
43. │ 2026-01-03 │        3 │ DE      │ gizmo     │    7.02 │        3 │
54. │ 2026-01-04 │        4 │ FR      │ doohickey │    8.03 │        4 │
65. │ 2026-01-05 │        5 │ IN      │ widget    │    9.04 │        5 │
7   └────────────┴──────────┴─────────┴───────────┴─────────┴──────────┘

A .tsv extension is read as TSVWithNames: the first row gives the column names, and each column's type is inferred from the data. The file is read in place with no import step.

Prefer Python? See How to read a TSV file in Python and query it with SQL for the same queries against a pandas DataFrame.

See the schema without declaring one

A TSV carries no types of its own, so ClickHouse reads the header for names and samples the rows for types. DESCRIBE shows you what it picked, without any CREATE TABLE:

1clickhouse local -q "DESCRIBE file('orders.tsv') FORMAT PrettyCompact"

1┌─name───────┬─type──────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
21. │ order_date │ Nullable(Date)    │              │                    │         │                  │                │
32. │ order_id   │ Nullable(Int64)   │              │                    │         │                  │                │
43. │ country    │ Nullable(String)  │              │                    │         │                  │                │
54. │ product    │ Nullable(String)  │              │                    │         │                  │                │
65. │ revenue    │ Nullable(Float64) │              │                    │         │                  │                │
76. │ quantity   │ Nullable(Int64)   │              │                    │         │                  │                │
8   └────────────┴───────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

The header gave the names; the data gave the types. The trailing columns stay empty: they hold CREATE TABLE metadata (defaults, codecs, TTLs) that a flat text file doesn't carry. Columns come back Nullable because inference allows for empty values.

Aggregate directly on the TSV

Because the file is a table, the full SQL surface works on it (WHERE, GROUP BY, JOIN, window functions) with no load step in between:

1clickhouse local -q "
2SELECT
3  country,
4  count()                 AS orders,
5  round(sum(revenue), 2)  AS revenue,
6  round(avg(quantity), 2) AS avg_qty
7FROM file('orders.tsv')
8GROUP BY country
9ORDER BY revenue DESC"

1US	4	60.4	3.5
2GB	4	56.36	2.5
3AU	3	48.33	2
4IN	3	45.3	2.67
5FR	3	42.27	3.33
6DE	3	39.24	4

This is the everyday workflow: open a terminal, run one query against the file, read the answer.

TSV vs CSV: the delimiter is the only difference

A TSV is structurally a CSV with a tab (\t) instead of a comma between fields. ClickHouse treats them as separate formats, TabSeparated and CSV, and picks one from the file extension. The tab delimiter is the reason TSV is popular for messy text: commas, quotes and free-form strings appear inside data all the time, but a literal tab almost never does, so a TSV usually needs no quoting at all. If your file uses a header row, the WithNames variant (TSVWithNames) reads it; if it doesn't, use the plain TSV format.

A TSV with no header

Many TSVs ship without a header row. Read those with the plain TSV format and name the columns yourself as the third argument to file():

1clickhouse local -q "
2SELECT * FROM file('orders_nohdr.tsv', 'TSV',
3  'order_date Date, order_id UInt32, country String, product String, revenue Float64, quantity UInt8')
4ORDER BY revenue DESC LIMIT 3"

12026-01-20	20	US	doohickey	24.19	5
22026-01-19	19	GB	gizmo	23.18	4
32026-01-18	18	AU	gadget	22.17	3

Without an explicit schema, a headerless TSV falls back to positional column names (c1, c2, ...) while still inferring each column's type from the data, so naming the columns up front keeps the query readable.

Pin the types when inference samples wrong

Type inference reads a sample of rows, not the whole file. For a flat text format with no embedded schema, two files with the same columns can infer differently depending on what the sample happens to contain. On the 3,000,000-row file in the example folder, revenue infers as String rather than Float64:

1clickhouse local -q "DESCRIBE file('orders_large.tsv') FORMAT PrettyCompact"

1┌─name───────┬─type─────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
21. │ order_date │ Nullable(Date)   │              │                    │         │                  │                │
32. │ order_id   │ Nullable(Int64)  │              │                    │         │                  │                │
43. │ country    │ Nullable(String) │              │                    │         │                  │                │
54. │ product    │ Nullable(String) │              │                    │         │                  │                │
65. │ revenue    │ Nullable(String) │              │                    │         │                  │                │
76. │ quantity   │ Nullable(Int64)  │              │                    │         │                  │                │
8   └────────────┴──────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

Summing a String column fails. The fix is the same explicit-schema argument: pass TSVWithNames and the types you want, and inference is skipped entirely. This also reads faster, because the parser knows the target type for every field up front.

1clickhouse local -q "
2SELECT country, count() AS orders, round(sum(revenue), 2) AS revenue, round(avg(quantity), 3) AS avg_qty
3FROM file('orders_large.tsv', 'TSVWithNames',
4  'order_date Date, order_id UInt32, country String, product String, revenue Float64, quantity UInt8')
5GROUP BY country ORDER BY revenue DESC LIMIT 5"

1IN	300860	75118523.46	3.003
2CA	300273	75078200.28	3.001
3FR	300039	75024792.42	3.001
4US	300116	74988571.55	3.001
5BR	299891	74986241.81	3.001

For data you query repeatedly, the durable answer is to stop re-parsing text: pin the schema once and write the result to a typed format. Convert the TSV to Parquet and every later query reads typed, columnar data with no inference at all.

Gzipped TSVs work too

You don't unzip anything first. clickhouse local detects the .gz extension and decompresses on the fly, so a .tsv.gz file is queried exactly like a .tsv:

1clickhouse local -q "SELECT country, count() FROM file('orders.tsv.gz') GROUP BY country ORDER BY country"

1AU	3
2DE	3
3FR	3
4GB	4
5IN	3
6US	4

The same applies to .tsv.zst, .tsv.xz, and other supported codecs. The compression is inferred from the file name.

How fast is it?

On the 3,000,000-row, ~110 MB orders_large.tsv, the GROUP BY country with sum and avg over every row (with the schema pinned, as above) completes in ~0.19 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM; clickhouse local 26.6.1.117). That includes parsing the tab-separated text from scratch on every run; there is no cached table.

clickhouse local runs the same SQL unchanged across dozens of formats and remote sources, and against a ClickHouse server or ClickHouse Cloud when the data outgrows your laptop, with no rewrite.

Run it yourself

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample TSVs (including the ~110 MB file used for the timing above), run.sh with every command on this page, and expected_output.txt.

1git clone https://github.com/ClickHouse/examples
2cd examples/local-analytics/clickhouse-local-tsv
3./generate.sh && ./run.sh

github.com/ClickHouse/examples → local-analytics/clickhouse-local-tsv

Prefer Python? → Read a TSV file in Python with chDB

See the schema without declaring one

Aggregate directly on the TSV

TSV vs CSV: the delimiter is the only difference

A TSV with no header

Pin the types when inference samples wrong

Gzipped TSVs work too

How fast is it?

Run it yourself

Subscribe to our newsletter

More like this

How to engineer cost-efficient open source observability with ClickHouse (ClickStack) - 2026 technical playbook

Build a dashboard in Python with ClickHouse and Streamlit

Instrumenting OpenAI with OpenTelemetry (OTel)

Real-time data visualization

How to read a TSV file

See the schema without declaring one #

Aggregate directly on the TSV #

TSV vs CSV: the delimiter is the only difference #

A TSV with no header #

Pin the types when inference samples wrong #

Gzipped TSVs work too #

How fast is it? #

Run it yourself #

Related #

Subscribe to our newsletter

More like this

How to engineer cost-efficient open source observability with ClickHouse (ClickStack) - 2026 technical playbook

Build a dashboard in Python with ClickHouse and Streamlit

Instrumenting OpenAI with OpenTelemetry (OTel)

Real-time data visualization

See the schema without declaring one

Aggregate directly on the TSV

TSV vs CSV: the delimiter is the only difference

A TSV with no header

Pin the types when inference samples wrong

Gzipped TSVs work too

How fast is it?

Run it yourself

Related