To read a TSV file and query it with SQL, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then query the file directly:
clickhouse local -q "SELECT * FROM file('orders.tsv') LIMIT 10"
┌─order_date─┬─order_id─┬─country─┬─product───┬─revenue─┬─quantity─┐
1. │ 2026-01-01 │ 1 │ GB │ widget │ 5 │ 1 │
2. │ 2026-01-02 │ 2 │ US │ gadget │ 6.01 │ 2 │
3. │ 2026-01-03 │ 3 │ DE │ gizmo │ 7.02 │ 3 │
4. │ 2026-01-04 │ 4 │ FR │ doohickey │ 8.03 │ 4 │
5. │ 2026-01-05 │ 5 │ IN │ widget │ 9.04 │ 5 │
└────────────┴──────────┴─────────┴───────────┴─────────┴──────────┘
A .tsv extension is read as TSVWithNames: the first row gives the column names, and each column's type is inferred from the data. The file is read in place with no import step.
Prefer Python? See How to read a TSV file in Python and query it with SQL for the same queries against a pandas DataFrame.
A TSV carries no types of its own, so ClickHouse reads the header for names and samples the rows for types. DESCRIBE shows you what it picked, without any CREATE TABLE:
clickhouse local -q "DESCRIBE file('orders.tsv') FORMAT PrettyCompact"
┌─name───────┬─type──────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
1. │ order_date │ Nullable(Date) │ │ │ │ │ │
2. │ order_id │ Nullable(Int64) │ │ │ │ │ │
3. │ country │ Nullable(String) │ │ │ │ │ │
4. │ product │ Nullable(String) │ │ │ │ │ │
5. │ revenue │ Nullable(Float64) │ │ │ │ │ │
6. │ quantity │ Nullable(Int64) │ │ │ │ │ │
└────────────┴───────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
The header gave the names; the data gave the types. The trailing columns stay empty: they hold CREATE TABLE metadata (defaults, codecs, TTLs) that a flat text file doesn't carry. Columns come back Nullable because inference allows for empty values.
Because the file is a table, the full SQL surface works on it (WHERE, GROUP BY, JOIN, window functions) with no load step in between:
clickhouse local -q "
SELECT
country,
count() AS orders,
round(sum(revenue), 2) AS revenue,
round(avg(quantity), 2) AS avg_qty
FROM file('orders.tsv')
GROUP BY country
ORDER BY revenue DESC"
US 4 60.4 3.5
GB 4 56.36 2.5
AU 3 48.33 2
IN 3 45.3 2.67
FR 3 42.27 3.33
DE 3 39.24 4
This is the everyday workflow: open a terminal, run one query against the file, read the answer.
A TSV is structurally a CSV with a tab (\t) instead of a comma between fields. ClickHouse treats them as separate formats, TabSeparated and CSV, and picks one from the file extension. The tab delimiter is the reason TSV is popular for messy text: commas, quotes and free-form strings appear inside data all the time, but a literal tab almost never does, so a TSV usually needs no quoting at all. If your file uses a header row, the WithNames variant (TSVWithNames) reads it; if it doesn't, use the plain TSV format.
Many TSVs ship without a header row. Read those with the plain TSV format and name the columns yourself as the third argument to file():
clickhouse local -q "
SELECT * FROM file('orders_nohdr.tsv', 'TSV',
'order_date Date, order_id UInt32, country String, product String, revenue Float64, quantity UInt8')
ORDER BY revenue DESC LIMIT 3"
2026-01-20 20 US doohickey 24.19 5
2026-01-19 19 GB gizmo 23.18 4
2026-01-18 18 AU gadget 22.17 3
Without an explicit schema, a headerless TSV falls back to positional column names (c1, c2, ...) while still inferring each column's type from the data, so naming the columns up front keeps the query readable.
Type inference reads a sample of rows, not the whole file. For a flat text format with no embedded schema, two files with the same columns can infer differently depending on what the sample happens to contain. On the 3,000,000-row file in the example folder, revenue infers as String rather than Float64:
clickhouse local -q "DESCRIBE file('orders_large.tsv') FORMAT PrettyCompact"
┌─name───────┬─type─────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
1. │ order_date │ Nullable(Date) │ │ │ │ │ │
2. │ order_id │ Nullable(Int64) │ │ │ │ │ │
3. │ country │ Nullable(String) │ │ │ │ │ │
4. │ product │ Nullable(String) │ │ │ │ │ │
5. │ revenue │ Nullable(String) │ │ │ │ │ │
6. │ quantity │ Nullable(Int64) │ │ │ │ │ │
└────────────┴──────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
Summing a String column fails. The fix is the same explicit-schema argument: pass TSVWithNames and the types you want, and inference is skipped entirely. This also reads faster, because the parser knows the target type for every field up front.
clickhouse local -q "
SELECT country, count() AS orders, round(sum(revenue), 2) AS revenue, round(avg(quantity), 3) AS avg_qty
FROM file('orders_large.tsv', 'TSVWithNames',
'order_date Date, order_id UInt32, country String, product String, revenue Float64, quantity UInt8')
GROUP BY country ORDER BY revenue DESC LIMIT 5"
IN 300860 75118523.46 3.003
CA 300273 75078200.28 3.001
FR 300039 75024792.42 3.001
US 300116 74988571.55 3.001
BR 299891 74986241.81 3.001
For data you query repeatedly, the durable answer is to stop re-parsing text: pin the schema once and write the result to a typed format. Convert the TSV to Parquet and every later query reads typed, columnar data with no inference at all.
You don't unzip anything first. clickhouse local detects the .gz extension and decompresses on the fly, so a .tsv.gz file is queried exactly like a .tsv:
clickhouse local -q "SELECT country, count() FROM file('orders.tsv.gz') GROUP BY country ORDER BY country"
AU 3
DE 3
FR 3
GB 4
IN 3
US 4
The same applies to .tsv.zst, .tsv.xz, and other supported codecs. The compression is inferred from the file name.
On the 3,000,000-row, ~110 MB orders_large.tsv, the GROUP BY country with sum and avg over every row (with the schema pinned, as above) completes in ~0.19 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM; clickhouse local 26.6.1.117). That includes parsing the tab-separated text from scratch on every run; there is no cached table.
clickhouse local runs the same SQL unchanged across dozens of formats and remote sources, and against a ClickHouse server or ClickHouse Cloud when the data outgrows your laptop, with no rewrite.
The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample TSVs (including the ~110 MB file used for the timing above), run.sh with every command on this page, and expected_output.txt.
git clone https://github.com/ClickHouse/examples
cd examples/local-analytics/clickhouse-local-tsv
./generate.sh && ./run.sh