How to convert Parquet to NDJSON

To convert Parquet to NDJSON, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then point it at the Parquet file and write the output as JSONEachRow:

clickhouse local -q "SELECT * FROM file('events.parquet') INTO OUTFILE 'events.ndjson' FORMAT JSONEachRow"

{"event_id":1,"ts":"2026-05-31 23:00:00.000","action":"login","country":"GB","amount":1,"is_member":1,"attrs":{"os":"mac","plan":"free"}}
{"event_id":2,"ts":"2026-06-01 00:00:00.000","action":"purchase","country":"US","amount":2.01,"is_member":0,"attrs":{"os":"win","plan":"pro"}}
{"event_id":3,"ts":"2026-06-01 01:00:00.000","action":"logout","country":"DE","amount":3.02,"is_member":1,"attrs":{"os":"linux","plan":"free"}}

clickhouse local reads the Parquet schema from the file footer, so there are no columns to declare. The file is read in place with no import step, and rows stream out as NDJSON, one JSON object per line, so files larger than RAM convert without being buffered whole.
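Because each line is a complete JSON object, a downstream consumer can read the result the same way it was written: one record at a time. The following is a minimal sketch using only the Python standard library. The helper name iter_ndjson is hypothetical, and the three trimmed rows written to a temporary file are a stand-in for the real events.ndjson.

```python
import json
import os
import tempfile
from typing import Iterator


def iter_ndjson(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line, holding one line in memory at a time."""
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


# Stand-in for events.ndjson: trimmed versions of the three rows shown above.
sample = (
    '{"event_id":1,"action":"login","attrs":{"os":"mac","plan":"free"}}\n'
    '{"event_id":2,"action":"purchase","attrs":{"os":"win","plan":"pro"}}\n'
    '{"event_id":3,"action":"logout","attrs":{"os":"linux","plan":"free"}}\n'
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "events.ndjson")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(sample)
    # Records are consumed lazily; only the extracted field is kept.
    actions = [rec["action"] for rec in iter_ndjson(path)]

print(actions)
```

The generator never loads the whole file, which is what makes the line-delimited layout convenient for files larger than memory.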

The schema comes from the Parquet footer

Parquet carries its own typed schema in the file footer. ClickHouse reads it, and you can see exactly what it found with DESCRIBE:

clickhouse local -q "DESCRIBE file('events.parquet')"

event_id	UInt64
ts	DateTime64(3, 'UTC')
action	String
country	String
amount	Float64
is_member	UInt8
attrs	Map(String, String)

Those types decide how each value is written in the NDJSON. Numbers stay numbers, strings stay strings, and a nested Map becomes a nested JSON object ("attrs":{"os":"mac","plan":"free"}). This is the advantage of converting Parquet rather than a flat text format: the nesting and types survive the trip, instead of being flattened or guessed.
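To see what a consumer receives, the sketch below parses the first two sample rows from the output earlier on this page with Python's standard json module and reports the Python type of each field. One detail it surfaces: the first row's amount is written as 1, so Python parses it as an int, while 2.01 in the second row parses as a float. A consumer that needs a uniform float column may want to coerce it; that coercion is a suggestion here, not something the conversion requires.

```python
import json

# The first two sample rows, copied from the conversion output on this page.
rows = [
    '{"event_id":1,"ts":"2026-05-31 23:00:00.000","action":"login",'
    '"country":"GB","amount":1,"is_member":1,"attrs":{"os":"mac","plan":"free"}}',
    '{"event_id":2,"ts":"2026-06-01 00:00:00.000","action":"purchase",'
    '"country":"US","amount":2.01,"is_member":0,"attrs":{"os":"win","plan":"pro"}}',
]

# Map each field name to the name of the Python type json.loads produced for it.
types_by_row = []
for line in rows:
    record = json.loads(line)
    types_by_row.append({key: type(value).__name__ for key, value in record.items()})

for types in types_by_row:
    print(types)

# Optional normalisation on the consumer side: force amount to float.
amounts = [float(json.loads(line)["amount"]) for line in rows]
print(amounts)
```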

Options that matter for Parquet to NDJSON

This is where a one-line conversion beats an upload-required online converter: you control the output, and nothing leaves your machine.

Booleans. A Parquet boolean usually arrives as ClickHouse UInt8, so it serialises as 1 or 0. If you want literal true / false in the JSON, cast it to Bool:

clickhouse local -q "SELECT event_id, is_member::Bool AS is_member FROM file('events.parquet') LIMIT 2 FORMAT JSONEachRow"

{"event_id":1,"is_member":true}
{"event_id":2,"is_member":false}

Large integers. JSONEachRow writes 64-bit integers unquoted by default. Many JSON parsers (including JavaScript's JSON.parse) lose precision above 2^53. If your IDs are large Int64/UInt64 values, set output_format_json_quote_64bit_integers=1 to emit them as quoted strings and keep them exact.
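The precision loss is easy to reproduce. Python's json module keeps integers exact by default, so this sketch passes parse_int=float purely to imitate a parser that stores every number as a 64-bit double; it is an illustration of the rounding, not a model of any particular parser's internals. The ID value is made up for the example.

```python
import json

# A made-up ID just above 2^53, the largest range where doubles hold every integer.
big_id = 2**53 + 1  # 9007199254740993

unquoted = '{"event_id":%d}' % big_id    # default JSONEachRow style
quoted = '{"event_id":"%d"}' % big_id    # style with 64-bit integers quoted

# parse_int=float imitates a double-only number type.
as_double = json.loads(unquoted, parse_int=float)["event_id"]
lossy = int(as_double)

# A quoted ID arrives as a string and converts back without loss.
exact = int(json.loads(quoted)["event_id"])

print(big_id, lossy, exact)
```

The unquoted value rounds to the nearest representable double and no longer matches the original ID, while the quoted value round-trips exactly.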

Project and filter while converting. Because the read is a normal SELECT, you can rename columns, drop the ones you don't need, filter rows, and sort, all in the same pass. No second tool:

clickhouse local -q "
SELECT event_id, country, amount
FROM file('events.parquet')
WHERE action = 'purchase'
ORDER BY amount DESC
LIMIT 3
INTO OUTFILE 'purchases.ndjson' FORMAT JSONEachRow"

{"event_id":18,"country":"DE","amount":18.17}
{"event_id":14,"country":"FR","amount":14.13}
{"event_id":10,"country":"IN","amount":10.09}

Compress on the way out. Add a .gz (or .zst, .lz4, .xz) suffix to the output name and ClickHouse compresses as it writes. NDJSON is verbose, so this matters. Reading it back needs no flag either, because the codec is inferred from the file name:

clickhouse local -q "SELECT * FROM file('events.parquet') INTO OUTFILE 'events.ndjson.gz' FORMAT JSONEachRow"
clickhouse local -q "SELECT count() FROM file('events.ndjson.gz')"
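Consumers outside ClickHouse can stream the compressed file too. A minimal sketch with Python's standard gzip module, opened in text mode so lines decompress as they are read. The three records written here are a made-up stand-in for events.ndjson.gz so the example runs on its own.

```python
import gzip
import json
import os
import tempfile

# Made-up stand-in records; a real run would read the file ClickHouse wrote.
records = [
    {"event_id": 1, "country": "GB"},
    {"event_id": 2, "country": "US"},
    {"event_id": 3, "country": "DE"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "events.ndjson.gz")

    # Write gzip-compressed NDJSON: one compact JSON object per line.
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec, separators=(",", ":")) + "\n")

    # Read it back line by line; decompression happens incrementally.
    row_count = 0
    countries = []
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                row_count += 1
                countries.append(json.loads(line)["country"])

print(row_count, countries)
```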

How fast is it?

Conversion streams and is I/O-bound, so it is quick. Converting a 3,000,000-row Parquet file to NDJSON (a ~448 MB .ndjson) with this command:

clickhouse local -q "SELECT * FROM file('events_large.parquet') INTO OUTFILE 'events_large.ndjson' FORMAT JSONEachRow"

takes ~0.29 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The number may shift a little under concurrent load, but the point holds: the conversion is not the slow part of your pipeline. The same command also handles files far larger than memory, since rows are read and written in batches rather than buffered whole.
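For a sense of scale, the figures above translate into a rough throughput. This is back-of-the-envelope arithmetic on the approximate numbers quoted in this section (3,000,000 rows, ~448 MB of output, ~0.29 seconds), not an additional measurement.

```python
# Approximate figures quoted in this section.
rows = 3_000_000
output_mb = 448
seconds = 0.29

mb_per_s = output_mb / seconds   # NDJSON written per second
rows_per_s = rows / seconds      # rows converted per second

print(f"{mb_per_s:.0f} MB/s, {rows_per_s / 1e6:.1f} M rows/s")
# prints "1545 MB/s, 10.3 M rows/s"
```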

Reverse direction?

Going the other way is the same idea with the formats swapped. NDJSON is the schema-light interchange format; Parquet is the compact, typed, columnar one you query repeatedly. See convert NDJSON to Parquet for that pass, including how the JSON types are inferred.

Do it in Python with chDB

If you live in Python, chDB is the same ClickHouse engine in-process, so the SQL is identical. The same SELECT ... INTO OUTFILE ... FORMAT JSONEachRow writes the file:

import chdb

# TRUNCATE overwrites events.ndjson if it already exists.
chdb.query(
    "SELECT * FROM file('events.parquet') "
    "INTO OUTFILE 'events.ndjson' TRUNCATE FORMAT JSONEachRow"
)

Drop the INTO OUTFILE and chDB returns the NDJSON bytes instead, ready to stream into a request body or another process:

ndjson = chdb.query(
    "SELECT event_id, country, amount FROM file('events.parquet') "
    "WHERE action = 'purchase' ORDER BY amount DESC LIMIT 2 FORMAT JSONEachRow"
)
print(str(ndjson).rstrip())

{"event_id":18,"country":"DE","amount":18.17}
{"event_id":14,"country":"FR","amount":14.13}

This is also a clean alternative to looping pandas.read_parquet then to_json(orient="records", lines=True) when the file is bigger than memory: chDB streams it instead of materialising a DataFrame.

Run it yourself

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample Parquet files (including the 3M-row file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-parquet-to-ndjson

The same SQL that converts one local file runs unchanged against a directory of files, a Parquet file on S3, a ClickHouse server, or ClickHouse Cloud when the data outgrows your laptop. Related guides: how to query a Parquet file, what is NDJSON, and run SQL on a JSON Lines file.
