How to convert JSON to CSV

To convert JSON to CSV, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then point it at the JSON file and write the result INTO OUTFILE as CSV:

1clickhouse local -q "SELECT * FROM file('events.jsonl', JSONEachRow) INTO OUTFILE 'events.csv' FORMAT CSVWithNames"

1"event_id","event_type","ts","user","amounts"
21,"login","2026-06-01 09:00:00","GB","free","[1.5,0.5]"
32,"purchase","2026-06-01 09:00:37","US","pro","[2.5,1.5]"

The JSON is read in place with no import step: ClickHouse infers the schema from the data and streams each row out as CSV. The one thing to watch is nesting. CSV is flat, so a nested user object expands inline and an amounts array serialises as a string. The sections below show how to flatten them into named columns.

What's in the JSON

The sample is newline-delimited JSON (one object per line, the JSONEachRow format). Each event carries a nested user object and an amounts array:

1{"event_id":1,"event_type":"login","ts":"2026-06-01 09:00:00","user":{"country":"GB","plan":"free"},"amounts":[1.5,0.5]}
2{"event_id":2,"event_type":"purchase","ts":"2026-06-01 09:00:37","user":{"country":"US","plan":"pro"},"amounts":[2.5,1.5]}

clickhouse-local infers the schema from the data. Run DESCRIBE to see what it found:

1clickhouse local -q "DESCRIBE file('events.jsonl', JSONEachRow)"

1event_id	Nullable(Int64)
2event_type	Nullable(String)
3ts	Nullable(DateTime)
4user	Tuple(country Nullable(String), plan Nullable(String))
5amounts	Array(Nullable(Float64))

user came back as a Tuple (a nested record) and amounts as an Array. Those two columns are what CSV can't represent directly.

Why a naive `SELECT *` breaks

If you convert with SELECT *, the nested user object is expanded inline. The header still lists five columns, but each data row now carries six fields, because country and plan were spread out with no names of their own:

1clickhouse local -q "SELECT * FROM file('events.jsonl', JSONEachRow) INTO OUTFILE 'naive.csv' FORMAT CSVWithNames"

1"event_id","event_type","ts","user","amounts"
21,"login","2026-06-01 09:00:00","GB","free","[1.5,0.5]"
32,"purchase","2026-06-01 09:00:37","US","pro","[2.5,1.5]"

Five header names, six values per row. Anything that reads this CSV by counting columns against the header will misalign. This is the flattening problem online converters either choke on or silently mangle.

Flatten nested fields, then write CSV

The fix is to project the nested fields into named top-level columns yourself. Address the tuple members by path (user.country), pick array elements by index (amounts[1]), and serialize the rest of an array into one string column if you want to keep it:

1clickhouse local -q "
2SELECT
3  event_id,
4  event_type,
5  ts,
6  user.country AS user_country,
7  user.plan    AS user_plan,
8  amounts[1]   AS amount_primary,
9  arrayStringConcat(arrayMap(x -> toString(x), amounts), ';') AS amounts_list
10FROM file('events.jsonl', JSONEachRow)
11INTO OUTFILE 'events.csv' FORMAT CSVWithNames"

1"event_id","event_type","ts","user_country","user_plan","amount_primary","amounts_list"
21,"login","2026-06-01 09:00:00","GB","free",1.5,"1.5;0.5"
32,"purchase","2026-06-01 09:00:37","US","pro",2.5,"2.5;1.5"
43,"logout","2026-06-01 09:01:14","DE","team",3.5,"3.5;2.5"
54,"signup","2026-06-01 09:01:51","FR","free",4.5,"4.5;3.5"

Every column is now flat and named. Read the CSV back and the schema round-trips cleanly:

1clickhouse local -q "DESCRIBE file('events.csv', CSVWithNames)"

1event_id	Nullable(Int64)
2event_type	Nullable(String)
3ts	Nullable(DateTime)
4user_country	Nullable(String)
5user_plan	Nullable(String)
6amount_primary	Nullable(Float64)
7amounts_list	Nullable(String)

This is the lossy step to be deliberate about: an array of two values became a primary scalar plus a delimited string. That is a choice, not a default. Decide per column whether you want one element, a delimited list, or a JSON string (toJSONString(amounts)), because CSV will not keep the structure for you.

Options worth knowing

These are the levers you don't get from a drag-and-drop converter site:

Header or no header. CSVWithNames writes the header row shown above. Use CSV to omit it.
Pick or rename columns. The SELECT list is your column spec. Drop fields, reorder them, cast types, or compute new ones in the same query.
Delimiter. CSV uses commas; for tab- or pipe-separated output switch the format (TSV) or set format_csv_delimiter.
Keep nesting as JSON strings. If you'd rather not flatten, wrap each nested field with toJSONString(...) so the structure survives as text in a single cell.
Top-level JSON arrays. If your file is one big [ {...}, {...} ] array instead of one object per line, read it with the JSONEachRow format too — it handles both shapes.

How fast is it?

Converting a 1,000,000-row, ~125 MB newline-delimited JSON file (events_large.jsonl) to CSV, with the full flattening projection above, completes in:

1clickhouse local -q "
2SELECT event_id, event_type, ts, user.country AS user_country, user.plan AS user_plan,
3  amounts[1] AS amount_primary,
4  arrayStringConcat(arrayMap(x -> toString(x), amounts), ';') AS amounts_list
5FROM file('events_large.jsonl', JSONEachRow)
6INTO OUTFILE 'events_large.csv' FORMAT CSVWithNames"

~0.42 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). That parses every JSON object from scratch and writes 1,000,000 CSV rows; there's no cached table in between. A concurrent build on the same machine can nudge that number, so treat it as a ballpark, not a benchmark.

chDB Python equivalent

If you live in Python, chDB is the same engine in-process. Run the identical SQL — including INTO OUTFILE — with chdb.query, and skip pandas entirely for the conversion:

1import chdb
2
3chdb.query("""
4SELECT
5  event_id,
6  event_type,
7  ts,
8  user.country AS user_country,
9  user.plan    AS user_plan,
10  amounts[1]   AS amount_primary,
11  arrayStringConcat(arrayMap(x -> toString(x), amounts), ';') AS amounts_list
12FROM file('events.jsonl', JSONEachRow)
13INTO OUTFILE 'events_chdb.csv' TRUNCATE
14FORMAT CSVWithNames
15""")

1"event_id","event_type","ts","user_country","user_plan","amount_primary","amounts_list"
21,"login","2026-06-01 09:00:00","GB","free",1.5,"1.5;0.5"
32,"purchase","2026-06-01 09:00:37","US","pro",2.5,"2.5;1.5"

The flattening logic lives in the SQL, so it's identical on the command line and in Python. For loading JSON into a DataFrame instead of a file, see how to read a JSON file in Python with chDB and how to flatten nested JSON in Python.

Reverse direction?

Going the other way is just as direct. See how to convert CSV to JSON, or keep the data columnar instead with convert JSON to Parquet — Parquet keeps the nested structure and the types, so it's the better target if you'll query the data again rather than hand it to a tool that needs flat CSV. To explore the JSON in place before converting, see how to run SQL on a JSON file.

Run it yourself

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample JSON (including the ~125 MB file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-json-to-csv

The same SQL that converted this file scales unchanged to a ClickHouse server or ClickHouse Cloud when the data outgrows your laptop, with no rewrite.

What's in the JSON

Why a naive `SELECT *` breaks

Flatten nested fields, then write CSV

Options worth knowing

How fast is it?

chDB Python equivalent

Reverse direction?

Run it yourself

Subscribe to our newsletter

More like this

How to engineer cost-efficient open source observability with ClickHouse (ClickStack) - 2026 technical playbook

Build a dashboard in Python with ClickHouse and Streamlit

Instrumenting OpenAI with OpenTelemetry (OTel)

Real-time data visualization

How to convert JSON to CSV

What's in the JSON #

Why a naive SELECT * breaks #

Flatten nested fields, then write CSV #

Options worth knowing #

How fast is it? #

chDB Python equivalent #

Reverse direction? #

Run it yourself #

Subscribe to our newsletter

More like this

How to engineer cost-efficient open source observability with ClickHouse (ClickStack) - 2026 technical playbook

Build a dashboard in Python with ClickHouse and Streamlit

Instrumenting OpenAI with OpenTelemetry (OTel)

Real-time data visualization

What's in the JSON

Why a naive `SELECT *` breaks

Flatten nested fields, then write CSV

Options worth knowing

How fast is it?

chDB Python equivalent

Reverse direction?

Run it yourself