How to convert CSV to JSONL

To convert a CSV file to JSONL, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then write the CSV to JSONL in one command:

clickhouse local -q "SELECT * FROM file('events.csv') INTO OUTFILE 'events.jsonl' FORMAT JSONEachRow"

{"event_date":"2026-01-01","event_id":1,"country":"GB","action":"click","amount":5,"quantity":1}
{"event_date":"2026-01-02","event_id":2,"country":"US","action":"view","amount":6.01,"quantity":2}
{"event_date":"2026-01-03","event_id":3,"country":"DE","action":"purchase","amount":7.02,"quantity":3}

ClickHouse reads the CSV header for column names, infers a real type per column, and streams each row out as a JSON object. The file is read in place with no import step, and because conversion streams, memory use stays flat regardless of file size.

What JSONL is, and why JSONEachRow

JSONL (also written NDJSON or .ndjson) is one JSON object per line. No outer array, no commas between records, so a reader can process it line by line and a writer can append to it. That line-per-record shape is exactly what ClickHouse's JSONEachRow format produces, which is why the conversion is a single command.
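To make the line-per-record shape concrete, here is a minimal Python sketch that uses only the standard library. The file name and the three records are made up for illustration, and nothing in it is specific to ClickHouse; it only shows why appending and line-by-line reading are cheap with this layout.

```python
import json
import os
import tempfile

# Hypothetical sample records, for illustration only.
records = [
    {"event_id": 1, "country": "GB"},
    {"event_id": 2, "country": "US"},
]

path = os.path.join(tempfile.mkdtemp(), "sample.jsonl")

# Writing: one json.dumps() call per record, one record per line.
with open(path, "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Appending: no outer array to reopen, so a new record is just a new line.
with open(path, "a", encoding="utf-8") as f:
    f.write(json.dumps({"event_id": 3, "country": "DE"}) + "\n")

# Reading: every line is a complete JSON document and parses on its own.
with open(path, encoding="utf-8") as f:
    countries = [json.loads(line)["country"] for line in f if line.strip()]

print(countries)
```

With a JSON array, the append step would instead have to rewrite the closing bracket, and a reader would typically parse the whole document before seeing the first record.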

Types are inferred from the CSV and carried into the JSON

A CSV is all text. The win here is that clickhouse local infers a real type per column and emits the matching JSON type, so numbers come out as JSON numbers and only strings get quoted. Check what was inferred with DESCRIBE:

clickhouse local -q "DESCRIBE file('events.csv')"

event_date	Nullable(Date)
event_id	Nullable(Int64)
country	Nullable(String)
action	Nullable(String)
amount	Nullable(Float64)
quantity	Nullable(Int64)

In the output above, event_id, amount, and quantity are bare JSON numbers; country and action are quoted strings. event_date is quoted as well, because JSON has no date type. A web converter that requires an upload and treats every CSV cell as a string would quote all six. Getting the types right at conversion time means the JSONL is correct for whatever reads it next, with no post-processing.
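The difference between the two behaviours can be sketched in a few lines of standard-library Python. The CSV text below is reconstructed from the first two rows of the sample output, so its exact formatting is an assumption, and coerce is a deliberately naive per-cell stand-in, not ClickHouse's per-column inference.

```python
import csv
import io
import json

# Assumed CSV text, reconstructed from the sample JSONL output above.
CSV_TEXT = (
    "event_date,event_id,country,action,amount,quantity\n"
    "2026-01-01,1,GB,click,5,1\n"
    "2026-01-02,2,US,view,6.01,2\n"
)

def coerce(cell):
    """Naive per-cell typing: try int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(cell)
        except ValueError:
            pass
    return cell

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
compact = {"separators": (",", ":")}  # no spaces, like the output above

# Every cell stays a string: all six values end up quoted.
untyped = [json.dumps(row, **compact) for row in rows]

# Cells are typed first: numbers come out bare, strings stay quoted.
typed = [json.dumps({k: coerce(v) for k, v in row.items()}, **compact) for row in rows]

print(untyped[0])
print(typed[0])
```

A per-cell guess like this can type the same column differently from row to row, which is one reason inferring a single type per column is the safer design.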

The default format for file('events.csv') is CSVWithNames: the first row holds the column names, and types come from the values. If inference guesses wrong (a ZIP code or an ID that should stay a string), pass the format and an explicit schema as the second and third arguments to file(), exactly as in how to run SQL on a CSV file.
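As a rough sketch of that three-argument form, the snippet below builds the query in Python so it can be passed to chDB. The column types follow the DESCRIBE output above without Nullable, and switching event_id to String is only an assumed example of an ID kept as text. STRUCTURE, QUERY, and run_with_explicit_schema are illustrative names, and the relative path events.csv may need adjusting to where your file lives.

```python
# Third argument to file(): comma-separated "name Type" pairs.
# event_id is declared as String here purely for illustration (assumption).
STRUCTURE = ", ".join([
    "event_date Date",
    "event_id String",
    "country String",
    "action String",
    "amount Float64",
    "quantity Int64",
])

# Second argument is the input format, third is the structure.
QUERY = f"SELECT * FROM file('events.csv', 'CSVWithNames', '{STRUCTURE}') LIMIT 2"

def run_with_explicit_schema():
    """Hypothetical helper: run QUERY through chDB, asking for JSONEachRow output."""
    import chdb  # imported lazily so the query text can be inspected without chDB
    return chdb.query(QUERY, "JSONEachRow")

print(QUERY)
```

The same query text can be handed to clickhouse local -q. With event_id declared as String, it should come out as a quoted value rather than a bare number.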

Read the JSONL straight back

JSONEachRow is self-describing, so reading the result needs no schema. The file is a table:

clickhouse local -q "
SELECT country, count() AS events, round(sum(amount),2) AS amount
FROM file('events.jsonl')
GROUP BY country ORDER BY amount DESC"

US	4	60.4
GB	4	56.36
AU	3	48.33
IN	3	45.3
FR	3	42.27
DE	3	39.24

For more on querying the result, see how to run SQL on a JSONL file.

Options worth knowing

This is where a one-line command beats a web converter: you control the output.

Keep 64-bit integers JS-safe. An integer above 2^53 can lose precision when JavaScript's JSON.parse reads it as a number, because JavaScript numbers are 64-bit floats. If the JSONL will be consumed by JavaScript, quote the 64-bit integers so they arrive as exact strings:

clickhouse local -q "
SELECT * FROM file('events.csv') LIMIT 2
FORMAT JSONEachRow
SETTINGS output_format_json_quote_64bit_integers=1"

{"event_date":"2026-01-01","event_id":"1","country":"GB","action":"click","amount":5,"quantity":"1"}
{"event_date":"2026-01-02","event_id":"2","country":"US","action":"view","amount":6.01,"quantity":"2"}

event_id and quantity are now quoted; amount (a float) is untouched.
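The precision problem is easy to reproduce without JavaScript, because Python's float is the same IEEE 754 double that JavaScript uses for every number. The sketch below is only an illustration of the effect, using a made-up ID just past 2^53.

```python
import json

big = 2**53 + 1  # hypothetical ID, one past 2^53

# Converting to a double is what happens when JavaScript parses a bare number.
as_double = float(big)
print(int(as_double))         # 9007199254740992
print(int(as_double) == big)  # False: the last digit was rounded away

# Quoted, the digits travel as text and can be parsed back exactly.
line = json.dumps({"event_id": str(big)})
restored = int(json.loads(line)["event_id"])
print(restored == big)        # True
```

On the JavaScript side, a quoted value like this can be turned into an exact integer with BigInt, or simply kept as a string if it is only used as an identifier.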

Convert and compress in one step. Name the output .jsonl.gz and the file is gzipped as it is written. Reading it back is transparent too — no decompress step:

clickhouse local -q "SELECT * FROM file('events.csv') INTO OUTFILE 'events.jsonl.gz' FORMAT JSONEachRow"
clickhouse local -q "SELECT count() FROM file('events.jsonl.gz')"

.zst, .lz4, and .xz work the same way; the codec is taken from the file name.
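A gzipped JSONL file stays easy to consume outside ClickHouse too. The sketch below is a standard-library Python illustration: it first writes a small stand-in file with two made-up records (in practice this would be the converter's .jsonl.gz output), then reads it back line by line without unpacking it to disk.

```python
import gzip
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.jsonl.gz")

# Stand-in for the converter's output: two hypothetical records, gzipped.
with gzip.open(path, "wt", encoding="utf-8") as f:
    f.write('{"country":"GB","amount":5}\n')
    f.write('{"country":"US","amount":6.01}\n')

# Text mode ("rt") decompresses on the fly and yields one record per line.
total = 0.0
with gzip.open(path, "rt", encoding="utf-8") as f:
    for line in f:
        total += json.loads(line)["amount"]

print(round(total, 2))
```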

Select or reshape on the way out. Because the conversion is a SELECT, you can project columns, filter rows, rename fields, or compute new ones in the same command rather than fixing the JSONL afterwards.
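As one possible illustration, the chDB sketch below keeps only purchase rows, renames country, and adds a computed column. The output name purchases.jsonl, the aliases country_code and total, and the helper convert_purchases are made up for this example; the column names come from the sample data above, and the path events.csv is an assumption about where the file lives.

```python
QUERY = """
    SELECT
        event_date,
        country AS country_code,       -- rename a field
        amount * quantity AS total     -- compute a new one
    FROM file('events.csv')
    WHERE action = 'purchase'          -- filter rows
    INTO OUTFILE 'purchases.jsonl' TRUNCATE
    FORMAT JSONEachRow
"""

def convert_purchases():
    """Hypothetical helper: run the reshaping conversion through chDB."""
    import chdb  # imported lazily so the query text can be inspected without chDB
    return chdb.query(QUERY)

print(QUERY)
```

The same query text works with clickhouse local -q. TRUNCATE overwrites purchases.jsonl on repeated runs, as in the chDB example later on this page.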

Reverse direction

To go back, swap the formats: read the JSONL and write CSVWithNames. See how to convert JSONL to CSV for the round-trip, including how a column that holds nested JSON flattens when it lands in a flat CSV.

clickhouse local -q "SELECT * FROM file('events.jsonl') INTO OUTFILE 'events.csv' FORMAT CSVWithNames"

How fast is it?

On a 3,000,000-row, ~120 MB CSV (events_large.csv), converting the whole file to JSONL — parsing every CSV cell and re-encoding it as JSON — completes in ~0.34 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The output is a 309 MB .jsonl file. Conversion is streamed, so memory use stays flat regardless of file size.

In Python with chDB

If you work in Python, chDB is the same engine in-process — same SQL, same JSONEachRow format, same one-command conversion:

import chdb

# JSONEachRow = one JSON object per line (JSONL)
chdb.query("""
    SELECT * FROM file('data/events.csv')
    INTO OUTFILE 'data/events.jsonl' TRUNCATE
    FORMAT JSONEachRow
""")

That writes events.jsonl directly, no pandas round-trip required. See also how to read a CSV file in Python with chDB.

Run it yourself

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample CSVs (including the ~120 MB file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-csv-to-jsonl

The same SQL that converts one file on your laptop runs unchanged against a ClickHouse server or ClickHouse Cloud when the data outgrows it. If you query the data more than once, consider converting to Parquet instead, which is columnar, typed, and far faster to scan.
