To convert NDJSON to CSV, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
clickhouse local -q "SELECT * FROM file('events.ndjson') INTO OUTFILE 'events.csv' FORMAT CSVWithNames"
ClickHouse reads the NDJSON in place, infers the schema from the data, and writes CSV with a header row. The file is never uploaded or imported into a database first.
The catch is nested records, and that's where most online converters either fail or quietly mangle the output.
NDJSON objects often contain nested objects and arrays. Take this file:
{"event_id":1,"ts":"2026-06-01 00:00:00","country":"GB","action":"click","amount":5,"device":{"os":"macos","app_version":"2.0"},"tags":[]}
{"event_id":2,"ts":"2026-06-01 00:02:17","country":"US","action":"view","amount":6.01,"device":{"os":"linux","app_version":"2.1"},"tags":[0]}
CSV is flat: one value per cell, one header per column. JSON is not. When you run SELECT * over this, ClickHouse infers the device object as a Tuple:
clickhouse local -q "DESCRIBE file('events.ndjson')"
event_id Nullable(Int64)
ts Nullable(DateTime)
country Nullable(String)
action Nullable(String)
amount Nullable(Float64)
device Tuple(
app_version Nullable(String),
os Nullable(String))
tags Array(Nullable(Int64))
A Tuple written to CSV expands into one cell per element, but the header only gets one name. The result is a file where the header row has fewer columns than the data rows:
header columns: 7
data columns: 8
"event_id","ts","country","action","amount","device","tags"
1,"2026-06-01 00:00:00","GB","click",5,"2.0","macos","[]"
That CSV is broken: seven headers, eight values. Any reader that trusts the header will misalign every column after device. This is the part a generic "JSON to CSV" tool can't decide for you, because flattening is a choice about your data.
Project the nested fields into their own top-level columns, and serialise the array into a single cell as a JSON string with toJSONString. Now the header and the data agree:
clickhouse local -q "
SELECT
event_id,
ts,
country,
action,
amount,
device.os AS device_os,
device.app_version AS device_version,
toJSONString(tags) AS tags
FROM file('events.ndjson')
INTO OUTFILE 'events_flat.csv' TRUNCATE FORMAT CSVWithNames"
header columns: 8
data columns: 8
"event_id","ts","country","action","amount","device_os","device_version","tags"
1,"2026-06-01 00:00:00","GB","click",5,"macos","2.0","[]"
2,"2026-06-01 00:02:17","US","view",6.01,"linux","2.1","[0]"
device.os and device.app_version reach into the inferred tuple. The array becomes the JSON text [0] in one cell, which round-trips cleanly and is easy to re-parse later. Types are carried across from the inferred schema: amount stays numeric, ts stays a timestamp. The CSV reads straight back as a real table:
clickhouse local -q "SELECT country, count() AS events, round(sum(amount),2) AS total FROM file('events_flat.csv') GROUP BY country ORDER BY total DESC"
IN 4 66.46
FR 4 62.42
DE 4 58.38
US 4 54.34
GB 4 50.3
- Keep nested structure as JSON. If you'd rather preserve the whole object than split it, write
toJSONString(device) AS device to keep {"os":"macos","app_version":"2.0"} in one cell instead of flattening into two columns. Choose per field.
- No header. Use
FORMAT CSV instead of CSVWithNames to omit the header row.
- Delimiter and quoting.
SETTINGS format_csv_delimiter='\t' switches to tab-separated; format_csv_allow_single_quotes and friends tune quoting. See the TSV guide if tab output is what you actually want.
- Compressed in or out. A
.ndjson.gz input is decompressed automatically from the extension, and writing to events.csv.gz compresses on the way out. No flags.
- Select and filter while converting. Because the source is a table, you can add
WHERE, ORDER BY, or pick a column subset in the same statement. The conversion and the cleanup are one pass.
Reverse direction? Going the other way: convert CSV to NDJSON.
On a 1,000,000-row, ~144 MB NDJSON file (events_large.ndjson), the full flatten-and-write conversion to a ~67 MB CSV runs in ~0.41 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The whole file streams through in a single pass, so memory stays flat regardless of file size.
Prefer to stay in Python? chDB is the same engine in-process. Run the identical SQL with chdb.query:
import chdb
sql = """
SELECT
event_id, ts, country, action, amount,
device.os AS device_os,
device.app_version AS device_version,
toJSONString(tags) AS tags
FROM file('events.ndjson')
INTO OUTFILE 'events_flat_chdb.csv' TRUNCATE FORMAT CSVWithNames
"""
chdb.query(sql)
This writes the same CSV with no pandas round-trip and no temporary DataFrame. For comparison, see reading NDJSON in Python with chDB and the related flatten nested JSON in Python guide.
The complete, runnable example lives in the ClickHouse examples repo: generate.sh creates the sample NDJSON (including the ~144 MB file used for the timing above), run.sh has every command on this page, run.py and run.ipynb hold the chDB version, and expected_output.txt is the captured proof.
github.com/ClickHouse/examples/tree/main/local-analytics/convert-ndjson-to-csv
Online converters need an upload and a round-trip, and most of them guess at nested fields. The local approach keeps the data on your machine, lets you decide exactly how each nested field lands, scripts cleanly, and handles files bigger than RAM. The same SQL runs unchanged against a single file, a ClickHouse server, or ClickHouse Cloud when the data outgrows your laptop.