How to convert TSV to JSON

Al Brown
Last updated: Jun 15, 2026

To convert a TSV file to JSON, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then write the result to a JSON file with INTO OUTFILE:

1clickhouse local -q "SELECT * FROM file('events.tsv') INTO OUTFILE 'events.jsonl' FORMAT JSONEachRow"
1{"event_date":"2026-01-01","event_id":1,"country":"GB","action":"click","value":5,"quantity":1}
2{"event_date":"2026-01-02","event_id":2,"country":"US","action":"view","value":6.01,"quantity":2}
3{"event_date":"2026-01-03","event_id":3,"country":"DE","action":"purchase","value":7.02,"quantity":3}

The TSV header becomes the JSON keys and column types are inferred from the data, so value lands as 6.01 (a number) not "6.01" (a string). The file is read in place with no upload or import step, and because the read is streamed it handles files larger than RAM.

The header drives the keys, the data drives the types #

A TSV is just text: tab-separated values with a header row. clickhouse-local reads that header for the key names and infers each column's type from the values, so numbers come out as JSON numbers and strings as JSON strings. Check what was inferred with DESCRIBE:

1clickhouse local -q "DESCRIBE file('events.tsv')"
1event_date	Nullable(Date)
2event_id	Nullable(Int64)
3country	Nullable(String)
4action	Nullable(String)
5value	Nullable(Float64)
6quantity	Nullable(Int64)

This is the information gain over a generic upload-a-file converter site: types are carried through, so value lands as 6.01 (a number) and not "6.01" (a string), and nothing leaves your machine. If inference guesses wrong (a leading-zero ID that should stay a string, say), pass an explicit schema as the third argument to file(), exactly as you would when querying the TSV directly.

JSONEachRow vs a single JSON array #

This is the one decision that trips people up. There are two shapes commonly called "JSON", and they are not interchangeable.

JSONEachRow writes one object per line. It streams, so a billion-row file converts in constant memory, and downstream tools read it line by line:

1clickhouse local -q "SELECT * FROM file('events.tsv') INTO OUTFILE 'events.jsonl' FORMAT JSONEachRow"

A single JSON array wraps every row in one [ ... ] document. That is what JSON.parse() in a browser wants, but it forces the whole result into one value. Get it by keeping JSONEachRow and turning on output_format_json_array_of_rows:

1clickhouse local -q "SELECT * FROM file('events.tsv') INTO OUTFILE 'events.json' FORMAT JSONEachRow" \
2  --output_format_json_array_of_rows=1
1[
2{"event_date":"2026-01-01","event_id":1,"country":"GB","action":"click","value":5,"quantity":1},
3{"event_date":"2026-01-02","event_id":2,"country":"US","action":"view","value":6.01,"quantity":2},
4{"event_date":"2026-01-03","event_id":3,"country":"DE","action":"purchase","value":7.02,"quantity":3}
5]

Pick the array when something downstream must load the document whole; pick JSONEachRow for everything else, especially large files and streaming pipelines. (ClickHouse also has a plain JSON format, but it wraps the rows in a meta/data/statistics envelope meant for query results, not for a clean data file, so it is rarely what you want for a conversion.)

Transform while you convert #

Because the TSV is a table, the full SQL surface is available during the conversion. Project columns, filter rows, rename, derive — the JSON only contains what you select:

1clickhouse local -q "
2SELECT event_date, country, upper(action) AS action_upper, value
3FROM file('events.tsv')
4WHERE action = 'purchase'
5ORDER BY value DESC
6LIMIT 3
7INTO OUTFILE 'purchases.jsonl' FORMAT JSONEachRow"
1{"event_date":"2026-01-19","country":"GB","action_upper":"PURCHASE","value":23.18}
2{"event_date":"2026-01-15","country":"DE","action_upper":"PURCHASE","value":19.14}
3{"event_date":"2026-01-11","country":"IN","action_upper":"PURCHASE","value":15.1}

A flat TSV becomes nested JSON just as easily. Build an object with map() (or a tuple / named tuple) and it serializes as a nested JSON value:

1clickhouse local -q "
2SELECT event_id, map('country', country, 'action', action) AS attrs, value
3FROM file('events.tsv')
4LIMIT 3
5INTO OUTFILE 'nested.jsonl' FORMAT JSONEachRow"
1{"event_id":1,"attrs":{"country":"GB","action":"click"},"value":5}
2{"event_id":2,"attrs":{"country":"US","action":"view"},"value":6.01}
3{"event_id":3,"attrs":{"country":"DE","action":"purchase"},"value":7.02}

Compress on the way out #

Add a .gz (or .zst, .lz4, .xz) suffix to the output name and the file is compressed as it is written. No extra flag, and the same suffix is decompressed transparently on read:

1clickhouse local -q "SELECT * FROM file('events.tsv') INTO OUTFILE 'events.jsonl.gz' FORMAT JSONEachRow"
2clickhouse local -q "SELECT count() FROM file('events.jsonl.gz', 'JSONEachRow')"
120

How fast is it? #

On a 3,000,000-row, ~105 MB TSV (events_large.tsv), converting the whole file to JSONEachRow completes in:

1clickhouse local -q "SELECT * FROM file('events_large.tsv') INTO OUTFILE 'events_large.jsonl' FORMAT JSONEachRow"

~0.27 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). That parses the TSV text and serializes ~312 MB of JSON from scratch each run, with no cached table. Numbers may shift slightly under concurrent load; the point is that the conversion is I/O-bound, not a bottleneck. An online converter would have you upload the file first.

chDB Python equivalent #

chDB is the same engine embedded in Python, so the conversion is the same SQL with the result written to a file:

1import chdb
2
3# JSONEachRow (NDJSON): one object per line.
4with open("events.jsonl", "w") as f:
5    f.write(str(chdb.query("SELECT * FROM file('events.tsv')", "JSONEachRow")))
6
7# A single JSON array instead.
8with open("events.json", "w") as f:
9    f.write(str(chdb.query(
10        "SELECT * FROM file('events.tsv') SETTINGS output_format_json_array_of_rows = 1",
11        "JSONEachRow",
12    )))
1{"event_date":"2026-01-01","event_id":1,"country":"GB","action":"click","value":5,"quantity":1}
2{"event_date":"2026-01-02","event_id":2,"country":"US","action":"view","value":6.01,"quantity":2}
3{"event_date":"2026-01-03","event_id":3,"country":"DE","action":"purchase","value":7.02,"quantity":3}

Reverse direction? See how to convert JSON to TSV.

clickhouse-local runs the same SELECT ... INTO OUTFILE ... FORMAT unchanged across dozens of formats, against remote files, and against a ClickHouse server or ClickHouse Cloud when the data outgrows your laptop, with no rewrite. Once the data is JSON, you can run SQL on the JSONL output the same way.

Run it yourself #

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample TSVs (including the ~105 MB file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-tsv-to-json

Share this resource

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!
Loading form...