How to convert NDJSON to CSV

To convert NDJSON to CSV, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then convert the file:

1clickhouse local -q "SELECT * FROM file('events.ndjson') INTO OUTFILE 'events.csv' FORMAT CSVWithNames"

ClickHouse reads the NDJSON in place, infers the schema from the data, and writes CSV with a header row. The file is never uploaded or imported into a database first.

The catch is nested records, and that's where most online converters either fail or quietly mangle the output.

The flattening trap

NDJSON objects often contain nested objects and arrays. Take this file:

1{"event_id":1,"ts":"2026-06-01 00:00:00","country":"GB","action":"click","amount":5,"device":{"os":"macos","app_version":"2.0"},"tags":[]}
2{"event_id":2,"ts":"2026-06-01 00:02:17","country":"US","action":"view","amount":6.01,"device":{"os":"linux","app_version":"2.1"},"tags":[0]}

CSV is flat: one value per cell, one header per column. JSON is not. When you run SELECT * over this, ClickHouse infers the device object as a Tuple:

1clickhouse local -q "DESCRIBE file('events.ndjson')"

1event_id	Nullable(Int64)
2ts	Nullable(DateTime)
3country	Nullable(String)
4action	Nullable(String)
5amount	Nullable(Float64)
6device	Tuple(
7    app_version Nullable(String),
8    os Nullable(String))
9tags	Array(Nullable(Int64))

A Tuple written to CSV expands into one cell per element, but the header only gets one name. The result is a file where the header row has fewer columns than the data rows:

1header columns: 7
2data columns:   8
3"event_id","ts","country","action","amount","device","tags"
41,"2026-06-01 00:00:00","GB","click",5,"2.0","macos","[]"

That CSV is broken: seven headers, eight values. Any reader that trusts the header will misalign every column after device. This is the part a generic "JSON to CSV" tool can't decide for you, because flattening is a choice about your data.

The correct conversion: flatten explicitly

Project the nested fields into their own top-level columns, and serialise the array into a single cell as a JSON string with toJSONString. Now the header and the data agree:

1clickhouse local -q "
2SELECT
3  event_id,
4  ts,
5  country,
6  action,
7  amount,
8  device.os          AS device_os,
9  device.app_version AS device_version,
10  toJSONString(tags) AS tags
11FROM file('events.ndjson')
12INTO OUTFILE 'events_flat.csv' TRUNCATE FORMAT CSVWithNames"

1header columns: 8
2data columns:   8
3"event_id","ts","country","action","amount","device_os","device_version","tags"
41,"2026-06-01 00:00:00","GB","click",5,"macos","2.0","[]"
52,"2026-06-01 00:02:17","US","view",6.01,"linux","2.1","[0]"

device.os and device.app_version reach into the inferred tuple. The array becomes the JSON text [0] in one cell, which round-trips cleanly and is easy to re-parse later. Types are carried across from the inferred schema: amount stays numeric, ts stays a timestamp. The CSV reads straight back as a real table:

1clickhouse local -q "SELECT country, count() AS events, round(sum(amount),2) AS total FROM file('events_flat.csv') GROUP BY country ORDER BY total DESC"

1IN	4	66.46
2FR	4	62.42
3DE	4	58.38
4US	4	54.34
5GB	4	50.3

Options worth knowing

Keep nested structure as JSON. If you'd rather preserve the whole object than split it, write toJSONString(device) AS device to keep {"os":"macos","app_version":"2.0"} in one cell instead of flattening into two columns. Choose per field.
No header. Use FORMAT CSV instead of CSVWithNames to omit the header row.
Delimiter and quoting. SETTINGS format_csv_delimiter='\t' switches to tab-separated; format_csv_allow_single_quotes and friends tune quoting. See the TSV guide if tab output is what you actually want.
Compressed in or out. A .ndjson.gz input is decompressed automatically from the extension, and writing to events.csv.gz compresses on the way out. No flags.
Select and filter while converting. Because the source is a table, you can add WHERE, ORDER BY, or pick a column subset in the same statement. The conversion and the cleanup are one pass.

Reverse direction? Going the other way: convert CSV to NDJSON.

How fast is it

On a 1,000,000-row, ~144 MB NDJSON file (events_large.ndjson), the full flatten-and-write conversion to a ~67 MB CSV runs in ~0.41 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The whole file streams through in a single pass, so memory stays flat regardless of file size.

Do it in Python with chDB

Prefer to stay in Python? chDB is the same engine in-process. Run the identical SQL with chdb.query:

1import chdb
2
3sql = """
4SELECT
5  event_id, ts, country, action, amount,
6  device.os          AS device_os,
7  device.app_version AS device_version,
8  toJSONString(tags) AS tags
9FROM file('events.ndjson')
10INTO OUTFILE 'events_flat_chdb.csv' TRUNCATE FORMAT CSVWithNames
11"""
12chdb.query(sql)

This writes the same CSV with no pandas round-trip and no temporary DataFrame. For comparison, see reading NDJSON in Python with chDB and the related flatten nested JSON in Python guide.

Run it yourself

The complete, runnable example lives in the ClickHouse examples repo: generate.sh creates the sample NDJSON (including the ~144 MB file used for the timing above), run.sh has every command on this page, run.py and run.ipynb hold the chDB version, and expected_output.txt is the captured proof.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-ndjson-to-csv

Online converters need an upload and a round-trip, and most of them guess at nested fields. The local approach keeps the data on your machine, lets you decide exactly how each nested field lands, scripts cleanly, and handles files bigger than RAM. The same SQL runs unchanged against a single file, a ClickHouse server, or ClickHouse Cloud when the data outgrows your laptop.

The flattening trap

The correct conversion: flatten explicitly

Options worth knowing

How fast is it

Do it in Python with chDB

Run it yourself

Subscribe to our newsletter

More like this

How to engineer cost-efficient open source observability with ClickHouse (ClickStack) - 2026 technical playbook

Build a dashboard in Python with ClickHouse and Streamlit

Instrumenting OpenAI with OpenTelemetry (OTel)

Real-time data visualization

How to convert NDJSON to CSV

The flattening trap #

The correct conversion: flatten explicitly #

Options worth knowing #

How fast is it #

Do it in Python with chDB #

Run it yourself #

Subscribe to our newsletter

More like this

How to engineer cost-efficient open source observability with ClickHouse (ClickStack) - 2026 technical playbook

Build a dashboard in Python with ClickHouse and Streamlit

Instrumenting OpenAI with OpenTelemetry (OTel)

Real-time data visualization

The flattening trap

The correct conversion: flatten explicitly

Options worth knowing

How fast is it

Do it in Python with chDB

Run it yourself