How to convert JSONL to CSV

Al Brown
Last updated: Jun 15, 2026

To convert a JSONL file to CSV, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then flatten the JSONL into CSV columns and write the result:

1clickhouse local -q "
2SELECT
3  event_id,
4  ts,
5  action,
6  geo.country AS geo_country,
7  geo.city    AS geo_city,
8  arrayStringConcat(tags, '|') AS tags
9FROM file('events.jsonl')
10INTO OUTFILE 'events.csv' TRUNCATE FORMAT CSVWithNames"
1"event_id","ts","action","geo_country","geo_city","tags"
21,"2026-06-01 00:00:00","login","GB","London",""
32,"2026-06-01 00:02:17","click","US","New York","mobile"
43,"2026-06-01 00:04:34","purchase","DE","Berlin","mobile|web"
54,"2026-06-01 00:06:51","logout","FR","Paris",""

The file is read in place with no import step: ClickHouse infers the schema from the JSON objects, flattens the nested geo object into two columns and the tags array into a pipe-joined cell, and writes the CSV directly to disk.

The nested-JSON gotcha #

JSONL allows nested objects and arrays inside each line. CSV is flat: every row is a list of scalar cells. So a naive SELECT * has nowhere to put the structure. Take this input, where each event has a nested geo object and a tags array:

1{"event_id":1,"ts":"2026-06-01 00:00:00","action":"login","geo":{"country":"GB","city":"London"},"tags":[]}
2{"event_id":2,"ts":"2026-06-01 00:02:17","action":"click","geo":{"country":"US","city":"New York"},"tags":["mobile"]}

Check what clickhouse-local infers with DESCRIBE:

1clickhouse local -q "DESCRIBE file('events.jsonl')"
1event_id	Nullable(Int64)
2ts	Nullable(DateTime)
3action	Nullable(String)
4geo	Tuple(city Nullable(String), country Nullable(String))
5tags	Array(Nullable(String))

geo becomes a two-field Tuple and tags becomes an Array. If you convert with SELECT *, ClickHouse expands the tuple into separate cells but the header still has a single geo column, so the columns and the header no longer line up:

1"event_id","ts","action","geo","tags"
21,"2026-06-01 00:00:00","login","London","GB","[]"
32,"2026-06-01 00:02:17","click","New York","US","['mobile']"

Five header names, six data values. That CSV is misaligned and will confuse whatever reads it next. Flatten the nested fields explicitly instead.

Flatten nested fields into columns #

Name each nested field as its own column. Reach into the object with dot access (geo.country), and turn the array into one scalar cell with arrayStringConcat, picking a separator that won't clash with your data:

1clickhouse local -q "
2SELECT
3  event_id,
4  ts,
5  action,
6  geo.country AS geo_country,
7  geo.city    AS geo_city,
8  arrayStringConcat(tags, '|') AS tags
9FROM file('events.jsonl')
10INTO OUTFILE 'events.csv' TRUNCATE FORMAT CSVWithNames"

Now the header and the data agree, the nested object is two clean columns, and the array is a single pipe-joined cell:

1"event_id","ts","action","geo_country","geo_city","tags"
21,"2026-06-01 00:00:00","login","GB","London",""
32,"2026-06-01 00:02:17","click","US","New York","mobile"
43,"2026-06-01 00:04:34","purchase","DE","Berlin","mobile|web"
54,"2026-06-01 00:06:51","logout","FR","Paris",""

TRUNCATE overwrites the output file if it already exists, so the command is safe to re-run. Read the CSV straight back to confirm the round-trip:

1clickhouse local -q "SELECT geo_country, count() AS events FROM file('events.csv') GROUP BY geo_country ORDER BY geo_country"
1DE	3
2FR	3
3GB	3
4US	3

Options worth knowing #

The conversion is one line, but a few details decide whether the CSV is correct downstream:

  • Flatten, don't drop. Dot access (geo.country) reaches one level; for deeper nesting, chain it (a.b.c) or use JSONExtract on a raw JSON string. This is the one lossy step in the conversion: a tree becomes flat columns, and you choose how.
  • Pick an array separator deliberately. arrayStringConcat(tags, '|') keeps the array in a single cell. A comma separator would collide with CSV delimiters; pipe or semicolon is safer. Or explode the array into one row per element with ARRAY JOIN.
  • Header or no header. CSVWithNames writes the header row. Plain CSV omits it.
  • Types are carried, then stringified. CSV has no schema, so Int64 and DateTime are written as text. The values are exact; only the type metadata is lost, which is inherent to CSV, not to the tool.
  • Compress on the way out. Add a codec suffix to the output name and it is written compressed: INTO OUTFILE 'events.csv.gz' produces gzip with no extra flag.

That list is the difference between this and a drag-and-drop converter site: you control flattening, separators, headers, and compression, the file never leaves your machine, and the whole thing is one scriptable command. Online converters also cap upload size; here the input streams from disk, so a file larger than RAM is no different from a small one. clickhouse-local runs the same SQL unchanged across dozens of formats, against a ClickHouse server, and against ClickHouse Cloud when the data outgrows your laptop.

Reverse direction? See how to convert CSV to JSONL.

chDB: the same conversion in Python #

If you would rather stay in Python, chDB is the same ClickHouse engine in-process. Run the identical SELECT ... INTO OUTFILE, and you get the byte-identical CSV:

1import chdb
2
3chdb.query("""
4SELECT
5  event_id,
6  ts,
7  action,
8  geo.country AS geo_country,
9  geo.city    AS geo_city,
10  arrayStringConcat(tags, '|') AS tags
11FROM file('events.jsonl')
12INTO OUTFILE 'events_chdb.csv' TRUNCATE FORMAT CSVWithNames
13""")
1"event_id","ts","action","geo_country","geo_city","tags"
21,"2026-06-01 00:00:00","login","GB","London",""
32,"2026-06-01 00:02:17","click","US","New York","mobile"
43,"2026-06-01 00:04:34","purchase","DE","Berlin","mobile|web"

There is no pandas round-trip and no json.loads loop to write yourself; the flattening happens in the SQL. See how to read a JSONL file in Python with chDB for querying the same data into a DataFrame.

How fast is it? #

On a 1,200,000-row, ~146 MB events_large.jsonl, parsing every JSON line, flattening the nested geo object, joining the tags array, and writing the full CSV out completes in ~0.39 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). That is the whole conversion, parse to written file, not a cached table. Concurrent load on the machine can move that number a little; the point is the conversion is not the slow part of your pipeline.

Run it yourself #

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample JSONL (including the ~146 MB file used for the timing), run.sh with every command on this page, a run.py / run.ipynb chDB version, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-jsonl-to-csv

Share this resource

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!
Loading form...