What is a TSV file?

Al Brown
Last updated: Jun 8, 2026

A TSV file (tab-separated values) is a plain-text file that stores a table, one row per line, with columns separated by a single tab character (\t). It optionally starts with a header row of column names. It is the tab-delimited cousin of CSV, and you can read one in place with one command:

clickhouse local -q "SELECT * FROM file('data.tsv', TSVWithNames)"

At a glance

| Property | TSV |
| --- | --- |
| Layout | Row-oriented plain text, one record per line |
| Delimiter | A single tab character (\t) |
| Header | Optional; first line holds column names |
| Quoting | None by default; special characters are backslash-escaped |
| Schema / types | Not stored; inferred or declared by the reader |
| Compression | None built in (the file can be gzipped, etc.) |
| Common extensions | .tsv, .tab, sometimes .txt |

Internal structure

A TSV has no container and no footer. It is exactly what you see in a text editor: lines of text, each line a row, each row split into fields by tab characters. The (optional) first line names the columns. There is no embedded schema, no type information and no statistics. A reader has to either infer the column types from the values or be told them.

Here is the start of a real TSV, with the tab characters rendered as -> so they are visible:

id->country->event_type->revenue
0->GB->view->282.08
1->AU->click->292.07
2->IN->purchase->206.03
3->US->view->261.97

The first line is the header. Every following line is four fields joined by tabs. That is the entire format.
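To make that concrete, here is a minimal Python sketch of what a reader does with the sample above: split on newlines, split each line on tabs, then guess a type for each column. The helper names `parse_tsv` and `infer_type` are illustrative, not part of any library, and the sketch ignores backslash escapes and empty values.

```python
SAMPLE = (
    "id\tcountry\tevent_type\trevenue\n"
    "0\tGB\tview\t282.08\n"
    "1\tAU\tclick\t292.07\n"
    "2\tIN\tpurchase\t206.03\n"
    "3\tUS\tview\t261.97\n"
)


def parse_tsv(text):
    """Split TSV text into a header list and a list of row lists."""
    lines = [line for line in text.split("\n") if line]
    header = lines[0].split("\t")
    rows = [line.split("\t") for line in lines[1:]]
    return header, rows


def infer_type(values):
    """Return the narrowest of int, float, str that accepts every value."""
    for candidate in (int, float):
        try:
            for value in values:
                candidate(value)
        except ValueError:
            continue  # this type rejected a value, try the next one
        return candidate.__name__
    return "str"


header, rows = parse_tsv(SAMPLE)
# Transpose the rows into columns, then infer one type per column.
types = {name: infer_type(column) for name, column in zip(header, zip(*rows))}
print(types)
```

A real reader handles more cases (nulls, dates, escapes), but the shape is the same: the types come from the values, not from the file.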

TSV vs CSV

TSV and CSV are the same idea with a different separator: TSV splits on a tab, CSV splits on a comma. The choice of separator drives the one real difference between them, which is quoting.

Commas are common inside real data: addresses, product names, free text. So CSV needs an escaping convention. A field containing a comma (or a quote, or a newline) gets wrapped in double quotes, and any quote inside it is doubled. Tabs are rare inside data, so plain TSV usually doesn't need quoting at all; instead it backslash-escapes the handful of characters that would break parsing (a literal tab, newline or backslash becomes \t, \n or \\ respectively).

That difference is easy to see. Take the same rows as a CSV, but add a label column that holds both a comma and a quote:

"id","country","label"
0,"GB","VIP, ""gold"" tier #0"
1,"AU","VIP, ""gold"" tier #1"
2,"IN","VIP, ""gold"" tier #2"

The whole label field is wrapped in quotes because it contains a comma, and the inner "gold" becomes ""gold"". In a TSV, none of that machinery fires for a comma, because the comma isn't the delimiter. This is why TSV is popular for data that is full of commas (and why a stray un-escaped comma is a classic CSV parsing bug). The trade-off is the mirror image: TSV breaks if your data contains literal tabs unless they're escaped.
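The two conventions can be put side by side in a short Python sketch. The CSV line comes from the standard library's `csv` module; the TSV escaping is a hand-written helper (`tsv_escape`, a hypothetical name) that covers only the three escapes described above.

```python
import csv
import io

# The three characters plain TSV backslash-escapes, as described above.
TSV_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n"}


def tsv_escape(field):
    """Backslash-escape the characters that would break TSV parsing."""
    return "".join(TSV_ESCAPES.get(ch, ch) for ch in field)


def to_tsv_line(fields):
    """Join fields with tabs after escaping each one."""
    return "\t".join(tsv_escape(str(field)) for field in fields)


def to_csv_line(fields):
    """Encode one row as CSV, quoting every non-numeric field."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC).writerow(fields)
    return buffer.getvalue().rstrip("\r\n")


row = [0, "GB", 'VIP, "gold" tier #0']
print(to_csv_line(row))  # the label is wrapped in quotes and "gold" is doubled
print(to_tsv_line(row))  # the label is written as-is
print(to_tsv_line([1, "AU", "tab\there"]))  # a literal tab is escaped
```

Only the last row changes on the TSV side: the comma and quotes pass through untouched, while the embedded tab is written as a backslash followed by `t`.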

Read one yourself

You don't need a database server or an import step to work with a TSV. clickhouse local reads one in place from the command line. It's part of ClickHouse; install it with clickhousectl, the ClickHouse CLI for local and cloud:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

Because a TSV carries no schema, the reader has to work out the column types. TSVWithNames tells ClickHouse the first line is a header; from there it infers types from the values. DESCRIBE shows what it inferred:

clickhouse local -q "DESCRIBE file('events.tsv', TSVWithNames)"

   ┌─name───────┬─type──────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
1. │ id         │ Nullable(Int64)   │              │                    │         │                  │                │
2. │ country    │ Nullable(String)  │              │                    │         │                  │                │
3. │ event_type │ Nullable(String)  │              │                    │         │                  │                │
4. │ revenue    │ Nullable(Float64) │              │                    │         │                  │                │
   └────────────┴───────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

id came back as an integer, revenue as a float, the rest as strings, all inferred from the text. (If you don't have a header, use the TSV format and pass the structure yourself, or let ClickHouse name the columns c1, c2, …) Now query the file as if it were a table:

clickhouse local -q "
SELECT country, count() AS events, round(sum(revenue), 2) AS revenue
FROM file('events.tsv', TSVWithNames)
GROUP BY country
ORDER BY revenue DESC"

   ┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┓
   ┃ country ┃ events ┃ revenue ┃
   ┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━┩
1. │ GB      │      3 │  883.53 │
   ├─────────┼────────┼─────────┤
2. │ AU      │      3 │  812.51 │
   ├─────────┼────────┼─────────┤
3. │ US      │      2 │  761.22 │
   ├─────────┼────────┼─────────┤
4. │ IN      │      2 │  618.46 │
   ├─────────┼────────┼─────────┤
5. │ DE      │      2 │  245.98 │
   └─────────┴────────┴─────────┘

And to confirm the quoting behaviour, read the contrasting CSV. ClickHouse unwraps the quotes and collapses ""gold"" back to "gold":

clickhouse local -q "SELECT label FROM file('events.csv', CSVWithNames) WHERE id = 7"

VIP, "gold" tier #7

The comma and the quote survive the round-trip because CSV's quoting rules carried them. A TSV stores the same value with no quoting because its tab delimiter doesn't collide with the content.
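The same round-trip can be checked outside ClickHouse. The sketch below decodes one CSV line with Python's `csv` module and the matching TSV line with a hand-written unescape helper (`tsv_unescape`, an illustrative name), then confirms both yield the same label. It assumes the TSV uses only the three backslash escapes described earlier.

```python
import csv
import re

# Map each two-character escape sequence back to the character it stands for.
_UNESCAPES = {"\\\\": "\\", "\\t": "\t", "\\n": "\n"}


def tsv_unescape(field):
    """Reverse the backslash escapes for backslash, tab and newline in one pass."""
    return re.sub(r"\\[\\tn]", lambda m: _UNESCAPES[m.group(0)], field)


def parse_tsv_line(line):
    """Split on tabs first, then unescape each field."""
    return [tsv_unescape(field) for field in line.split("\t")]


def parse_csv_line(line):
    """Let the csv module unwrap quotes and collapse doubled quotes."""
    return next(csv.reader([line]))


csv_line = '0,"GB","VIP, ""gold"" tier #0"'
tsv_line = '0\tGB\tVIP, "gold" tier #0'

print(parse_csv_line(csv_line)[2])
print(parse_tsv_line(tsv_line)[2])
```

Splitting before unescaping matters: an escaped tab inside a field is still the two characters backslash and `t` at split time, so it cannot be mistaken for a delimiter.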

Because TSV is plain text with no statistics or column layout, an engine reading one scans and parses the whole file. That's fine at laptop scale: on a 3,000,000-row, 72 MB TSV, this GROUP BY over all rows runs in 0.059s (best of three, warm cache, on an Apple M4 Pro, 14 cores, 24 GB RAM, with other agents running). For repeated analytical queries over larger data, a columnar format like Parquet is the better store; Parquet vs CSV covers why.

How it compares

pandas loads a TSV with read_csv(sep='\t'), which is a reasonable single-machine option. The advantage of ClickHouse SQL is that the same query runs unchanged against a file on your laptop, a ClickHouse server, or ClickHouse Cloud.

To go from reading a TSV to filtering, casting and joining one, see how to query a TSV file. Working in Python? See read a TSV file in Python.

Run it yourself

The data generator and every command above live in local-analytics/what-is-a-tsv-file in the ClickHouse examples repo. Run ./generate.sh then ./run.sh.
