How to convert CSV to NDJSON

Al Brown
Last updated: Jun 15, 2026

To convert a CSV file to NDJSON, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then write the rows out as JSONEachRow:

clickhouse local -q "SELECT * FROM file('events.csv') INTO OUTFILE 'events.ndjson' FORMAT JSONEachRow"
{"event_date":"2026-01-01","event_id":1,"country":"GB","action":"click","value":5,"count":1}
{"event_date":"2026-01-02","event_id":2,"country":"US","action":"view","value":6.01,"count":2}
{"event_date":"2026-01-03","event_id":3,"country":"DE","action":"signup","value":7.02,"count":3}
{"event_date":"2026-01-04","event_id":4,"country":"FR","action":"purchase","value":8.03,"count":4}
{"event_date":"2026-01-05","event_id":5,"country":"IN","action":"click","value":9.04,"count":5}

The CSV is read in place with no import step: clickhouse local reads the header for column names, infers each column's type from the data, and emits one JSON object per line. Numbers stay numbers in the output; dates come out as quoted ISO strings.
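To make the inference step concrete, here is a toy Python sketch of per-column type inference over a CSV header and rows. It is not ClickHouse's algorithm, which covers many more types, nullability, and tuning settings. The function name `infer_column_type` and the "try integer, then float, then ISO date, else string" order are assumptions chosen for illustration.

```python
import csv
import io
from datetime import date


def infer_column_type(values):
    """Toy inference: return the first type that every non-empty value parses as."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "String"  # nothing to go on; fall back to text
    # Try narrower types first: integer, then float, then ISO date
    candidates = (("Int64", int), ("Float64", float), ("Date", date.fromisoformat))
    for type_name, parse in candidates:
        try:
            for v in non_empty:
                parse(v)
        except ValueError:
            continue  # at least one value failed; try the next type
        return type_name
    return "String"


sample = """event_date,event_id,country,action,value,count
2026-01-01,1,GB,click,5,1
2026-01-02,2,US,view,6.01,2
"""

reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)
# Header supplies the column names; the data decides each column's type
schema = {col: infer_column_type([r[col] for r in rows]) for col in reader.fieldnames}
print(schema)
```

On this two-row sample the sketch reports `Date`, `Int64`, `String`, `String`, `Float64`, `Int64`. That matches the base types ClickHouse reports for the same columns, minus the `Nullable` wrapper.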

NDJSON, JSONL, and the .ndjson extension

NDJSON (newline-delimited JSON) and JSONL (JSON Lines) are the same format: one self-contained JSON object per line, no enclosing array, no commas between records. ClickHouse writes both with the JSONEachRow format. The only thing that changes is the file extension you choose. Use .ndjson here, or .jsonl if that's what your downstream tool expects:

clickhouse local -q "SELECT * FROM file('events.csv') INTO OUTFILE 'events.jsonl' FORMAT JSONEachRow"

This is different from ClickHouse's JSON format, which writes one JSON document: a top-level object that wraps all rows in a data array alongside metadata such as meta and rows. NDJSON is the line-oriented variant that streaming tools, log pipelines, and jq expect, because each line stands alone.
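The line-oriented property is easy to see with Python's standard `json` module. This sketch writes the same records in two ways. First it writes NDJSON, with one `json.dumps` call per line. Then it writes a plain JSON array, which is simpler than ClickHouse's JSON format because it carries no metadata. The `rows` list is made up for illustration. Each NDJSON line parses on its own, so a reader can handle one record at a time without loading the whole file.

```python
import json

rows = [
    {"event_date": "2026-01-01", "event_id": 1, "country": "GB", "value": 5},
    {"event_date": "2026-01-02", "event_id": 2, "country": "US", "value": 6.01},
]

# NDJSON / JSONL: one complete JSON object per line, no enclosing array, no commas between records
ndjson_text = "".join(json.dumps(row, separators=(",", ":")) + "\n" for row in rows)

# A single JSON document: the whole array has to be parsed in one go
array_text = json.dumps(rows)

# Each NDJSON line stands alone, so records can be parsed one at a time
parsed = [json.loads(line) for line in ndjson_text.splitlines()]
print(ndjson_text, end="")
```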

Types carry across, no manual mapping

The reason to convert with a SQL engine rather than a text-munging script is type fidelity. The CSV header gives column names; the data gives types. Check what was inferred with DESCRIBE:

clickhouse local -q "DESCRIBE file('events.csv')"
event_date	Nullable(Date)
event_id	Nullable(Int64)
country	Nullable(String)
action	Nullable(String)
value	Nullable(Float64)
count	Nullable(Int64)

Those types decide how each value is rendered in the JSON. Numbers come out as JSON numbers ("value":6.01), not quoted strings. Dates come out as quoted strings in ISO form. A generic CSV-to-JSON converter that treats every field as text would quote everything, and you'd have to re-cast on the other side.
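To see the difference, compare a text-only conversion with one that applies a schema. In this Python sketch, `csv.DictReader` returns every field as a string, so the naive path emits `"value":"6.01"`. Casting each column with a hand-written `schema` dict produces real JSON numbers. The dict is an assumption that loosely mirrors the DESCRIBE output, and the one-row CSV is illustrative.

```python
import csv
import io
import json

csv_text = "event_date,event_id,value\n2026-01-02,2,6.01\n"

# Naive: every CSV field is a string, so every JSON value ends up quoted
naive = [json.dumps(row, separators=(",", ":")) for row in csv.DictReader(io.StringIO(csv_text))]

# Typed: cast each column; dates stay ISO strings, numbers become JSON numbers
schema = {"event_date": str, "event_id": int, "value": float}
typed = [
    json.dumps({col: schema[col](val) for col, val in row.items()}, separators=(",", ":"))
    for row in csv.DictReader(io.StringIO(csv_text))
]

print(naive[0])  # {"event_date":"2026-01-02","event_id":"2","value":"6.01"}
print(typed[0])  # {"event_date":"2026-01-02","event_id":2,"value":6.01}
```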

Options: things an upload-and-convert site can't do

This is where a local SQL engine pulls ahead of a browser-based converter.

Force a type when inference guesses wrong. A zero-padded order id or a US ZIP code looks like an integer and loses its leading zeros if inferred as one. Pass the format and an explicit schema as the second and third arguments to file() to keep it a string:

clickhouse local -q "
SELECT * FROM file('events.csv', 'CSVWithNames',
  'event_date Date, event_id String, country String, action String, value Float64, count UInt8')
INTO OUTFILE 'events_typed.ndjson' FORMAT JSONEachRow"
{"event_date":"2026-01-01","event_id":"1","country":"GB","action":"click","value":5,"count":1}
{"event_date":"2026-01-02","event_id":"2","country":"US","action":"view","value":6.01,"count":2}

event_id is now "1" (quoted) instead of 1.
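Here is why the explicit schema matters for identifiers. Once a zero-padded value is parsed as a number, the padding is gone and cannot be recovered from the JSON. This quick Python illustration uses made-up ZIP code and order id values.

```python
import json

zip_code = "02134"   # US ZIP code as it appears in the CSV (illustrative)
order_id = "000451"  # zero-padded order id (illustrative)

# Treated as integers: the leading zeros are dropped
as_int = json.dumps({"zip": int(zip_code), "order_id": int(order_id)})

# Kept as strings, like an explicit String column: the values survive intact
as_str = json.dumps({"zip": zip_code, "order_id": order_id})

print(as_int)  # {"zip": 2134, "order_id": 451}
print(as_str)  # {"zip": "02134", "order_id": "000451"}
```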

Transform on the way out. Because it's SQL, you can filter, rename, reshape, or compute columns in the same pass, for example SELECT country, value FROM file('events.csv') WHERE action = 'purchase'. An upload-and-convert site can't do that.
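For comparison, here is roughly what that one-line query replaces if you write it by hand with only Python's standard library. The script filters rows, projects two columns, casts `value` to a number, and writes NDJSON. The inline CSV sample is hypothetical. The point is how much bookkeeping the SQL `SELECT ... WHERE` absorbs.

```python
import csv
import io
import json

# Hypothetical sample in the same shape as events.csv
csv_text = """event_date,event_id,country,action,value,count
2026-01-01,1,GB,click,5,1
2026-01-04,4,FR,purchase,8.03,4
2026-01-09,9,US,purchase,13.08,9
"""

out = io.StringIO()
for row in csv.DictReader(io.StringIO(csv_text)):
    # WHERE action = 'purchase'
    if row["action"] != "purchase":
        continue
    # SELECT country, value (value cast to a number, as type inference would do)
    record = {"country": row["country"], "value": float(row["value"])}
    out.write(json.dumps(record, separators=(",", ":")) + "\n")

print(out.getvalue(), end="")
```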

No upload. The file never leaves your machine. For anything with customer data, that alone settles it.

Reading the NDJSON back

The conversion is lossless and the output reads straight back. JSONEachRow is self-describing, so ClickHouse re-infers the schema with no extra arguments:

clickhouse local -q "
SELECT country, count() AS events, round(sum(value),2) AS total
FROM file('events.ndjson')
GROUP BY country ORDER BY total DESC"
US	4	60.4
GB	4	56.36
AU	3	48.33
IN	3	45.3
FR	3	42.27
DE	3	39.24
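Because each line is a standalone object, reading NDJSON back needs nothing more than a line-by-line parse. This standard-library Python sketch performs the same per-country aggregation as the query above, on a small, hypothetical inline sample. It shows the shape of the work and is not a replacement for the SQL.

```python
import json
from collections import defaultdict

# Hypothetical sample; the real events.ndjson has more rows and columns
ndjson_text = """{"country":"GB","action":"click","value":5}
{"country":"US","action":"view","value":6.01}
{"country":"GB","action":"signup","value":7.02}
{"country":"US","action":"purchase","value":8.03}
"""

events = defaultdict(int)
totals = defaultdict(float)
for line in ndjson_text.splitlines():
    row = json.loads(line)                   # each line parses on its own
    events[row["country"]] += 1              # count()
    totals[row["country"]] += row["value"]   # sum(value)

# ORDER BY total DESC, with totals rounded like round(sum(value), 2)
result = sorted(
    ((c, events[c], round(totals[c], 2)) for c in totals),
    key=lambda r: r[2],
    reverse=True,
)
for country, n, total in result:
    print(f"{country}\t{n}\t{total}")
```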

Reverse direction? See how to convert NDJSON to CSV.

In Python with chDB

If you'd rather stay in Python, chDB is the same ClickHouse engine as an in-process module. The SQL is identical; you capture the bytes and write them to a file:

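import chdb

# Same SQL as the CLI; FORMAT JSONEachRow makes the result NDJSON
ndjson = chdb.query("SELECT * FROM file('events.csv') FORMAT JSONEachRow").bytes()

# Write the raw bytes straight to disk; no decoding needed
with open("events.ndjson", "wb") as f:
    f.write(ndjson)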

No server, no temporary table, and it slots into an existing pandas or pyarrow pipeline.

How fast is it?

On a 3,000,000-row, ~123 MB CSV (events_large.csv), the full conversion to a ~312 MB NDJSON file completes in ~0.35 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). That's about 350 MB/s of input parsed, typed, and re-serialized as JSON. The row count matches exactly: 3,000,000 in, 3,000,000 out.

run 1: real 0.35
run 2: real 0.35
run 3: real 0.35
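The headline rate follows directly from the numbers above. This small Python check reproduces it, along with rows per second and the output-to-input size ratio. It uses only the rounded figures quoted here (about 123 MB in, 312 MB out, 3,000,000 rows, 0.35 s), so the results are approximate.

```python
input_mb = 123     # ~123 MB CSV
output_mb = 312    # ~312 MB NDJSON
rows = 3_000_000
seconds = 0.35     # best of three, warm page cache

input_rate = input_mb / seconds    # MB of CSV parsed per second
row_rate = rows / seconds          # rows converted per second
expansion = output_mb / input_mb   # NDJSON repeats every key on every line

print(f"{input_rate:.0f} MB/s, {row_rate / 1e6:.1f}M rows/s, {expansion:.2f}x larger")
# 351 MB/s, 8.6M rows/s, 2.54x larger
```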

Run it yourself

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample CSVs (including the ~123 MB file used for the timing), run.sh with every command on this page, a run.py / run.ipynb for the chDB path, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-csv-to-ndjson

The same SELECT ... FORMAT JSONEachRow runs unchanged against a file on your laptop, against a ClickHouse server, and against ClickHouse Cloud when the data outgrows one machine. Related: convert CSV to JSON, convert CSV to Parquet, and run SQL on a JSONL file.
