How to convert Parquet to NDJSON

Al Brown
Last updated: Jun 8, 2026

To convert Parquet to NDJSON, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then point it at the Parquet file and write the output as JSONEachRow. The result, events.ndjson, holds one JSON object per row:

clickhouse local -q "SELECT * FROM file('events.parquet') INTO OUTFILE 'events.ndjson' FORMAT JSONEachRow"
{"event_id":1,"ts":"2026-05-31 23:00:00.000","action":"login","country":"GB","amount":1,"is_member":1,"attrs":{"os":"mac","plan":"free"}}
{"event_id":2,"ts":"2026-06-01 00:00:00.000","action":"purchase","country":"US","amount":2.01,"is_member":0,"attrs":{"os":"win","plan":"pro"}}
{"event_id":3,"ts":"2026-06-01 01:00:00.000","action":"logout","country":"DE","amount":3.02,"is_member":1,"attrs":{"os":"linux","plan":"free"}}

The Parquet file is read in place with no import step, and rows stream out as NDJSON, one JSON object per line, so files larger than RAM convert without being buffered whole.
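Because each NDJSON line is a complete JSON object, whatever consumes the output can stream it as well. Below is a minimal sketch of a lazy reader that uses only Python's standard library. iter_ndjson is a hypothetical helper name, and the sample input is the three rows shown above:

```python
import io
import json
from typing import Iterator, TextIO


def iter_ndjson(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed object per non-empty line, never holding the whole file."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate a trailing newline or blank lines
            yield json.loads(line)


# The three rows from events.ndjson above, as an in-memory file.
sample = io.StringIO(
    '{"event_id":1,"ts":"2026-05-31 23:00:00.000","action":"login","country":"GB","amount":1,"is_member":1,"attrs":{"os":"mac","plan":"free"}}\n'
    '{"event_id":2,"ts":"2026-06-01 00:00:00.000","action":"purchase","country":"US","amount":2.01,"is_member":0,"attrs":{"os":"win","plan":"pro"}}\n'
    '{"event_id":3,"ts":"2026-06-01 01:00:00.000","action":"logout","country":"DE","amount":3.02,"is_member":1,"attrs":{"os":"linux","plan":"free"}}\n'
)

rows = list(iter_ndjson(sample))
print(len(rows), rows[1]["action"])
```

With a real file you would pass an open handle instead, for example `with open("events.ndjson") as f: for row in iter_ndjson(f): ...`, and memory use stays at one row at a time.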

Parquet carries its own typed schema in the file footer, so there is nothing to declare. ClickHouse reads it and you can see exactly what it found with DESCRIBE:

clickhouse local -q "DESCRIBE file('events.parquet')"
event_id	UInt64
ts	DateTime64(3, 'UTC')
action	String
country	String
amount	Float64
is_member	UInt8
attrs	Map(String, String)

Those types decide how each value is written in the NDJSON. Numbers stay numbers, strings stay strings, timestamps are written as quoted strings, and a nested Map becomes a nested JSON object ("attrs":{"os":"mac","plan":"free"}). This is the advantage of converting from Parquet rather than from a flat text format such as CSV: the nesting and types survive the trip instead of being flattened or guessed.
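To see what a consumer actually receives, the sketch below parses two of the rows above with Python's standard json module. One detail worth knowing: amount is Float64, but the whole-valued 1.0 was written as 1, so Python reads it back as an int. The explicit float() coercion at the end is an illustrative choice, not something ClickHouse requires:

```python
import json

# Two rows copied from the events.ndjson output above.
lines = [
    '{"event_id":1,"ts":"2026-05-31 23:00:00.000","action":"login","country":"GB","amount":1,"is_member":1,"attrs":{"os":"mac","plan":"free"}}',
    '{"event_id":2,"ts":"2026-06-01 00:00:00.000","action":"purchase","country":"US","amount":2.01,"is_member":0,"attrs":{"os":"win","plan":"pro"}}',
]
first, second = (json.loads(line) for line in lines)

print(type(first["event_id"]).__name__)  # int   (UInt64)
print(type(first["ts"]).__name__)        # str   (DateTime64 written as text)
print(type(first["attrs"]).__name__)     # dict  (Map(String, String))
print(type(first["amount"]).__name__)    # int   (Float64 1.0 was written as 1)
print(type(second["amount"]).__name__)   # float (Float64 2.01)

# If downstream code needs uniform floats, coerce explicitly.
amounts = [float(row["amount"]) for row in (first, second)]
```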

Options that matter for Parquet to NDJSON

This is where a one-line conversion beats an online converter that requires an upload: you control the output, and nothing leaves your machine.

Booleans. A Parquet boolean usually arrives as ClickHouse UInt8, so it serialises as 1 or 0. If you want literal true / false in the JSON, cast it to Bool:

clickhouse local -q "SELECT event_id, is_member::Bool AS is_member FROM file('events.parquet') LIMIT 2 FORMAT JSONEachRow"
{"event_id":1,"is_member":true}
{"event_id":2,"is_member":false}
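The cast matters to any consumer that checks types rather than truthiness, such as a schema validator or a strict equality check in JavaScript. A minimal sketch with Python's json module, using the two is_member encodings shown above:

```python
import json

# Default output: the UInt8 column is serialised as a number.
as_number = json.loads('{"event_id":1,"is_member":1}')["is_member"]
# After is_member::Bool: a real JSON boolean.
as_bool = json.loads('{"event_id":1,"is_member":true}')["is_member"]

print(type(as_number).__name__, type(as_bool).__name__)  # int bool
print(as_number == as_bool)  # True: Python treats 1 and True as equal
print(as_number is True)     # False: identity and type checks still differ
```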

Large integers. JSONEachRow writes 64-bit integers unquoted by default. Many JSON parsers (including JavaScript's JSON.parse) lose precision above 2^53. If your IDs are large Int64/UInt64 values, set output_format_json_quote_64bit_integers=1 to emit them as quoted strings and keep them exact.
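A quick way to see the precision problem without a browser: Python's json.loads accepts a parse_int hook, and passing float makes it store every integer as an IEEE-754 double, as JavaScript's JSON.parse does. The ID value below is a hypothetical example chosen just above 2^53:

```python
import json

big_id = 2**53 + 1  # 9007199254740993, a valid UInt64 value

unquoted = json.dumps({"event_id": big_id})       # unquoted 64-bit integer
quoted = json.dumps({"event_id": str(big_id)})    # same value as a quoted string

# parse_int=float mimics a parser that keeps every number as a double.
as_double = json.loads(unquoted, parse_int=float)["event_id"]
as_string = json.loads(quoted, parse_int=float)["event_id"]

print(int(as_double))  # 9007199254740992, off by one
print(int(as_string))  # 9007199254740993, exact
```

The quoted form survives because the parser never treats it as a number; the consumer converts it to a big integer type itself.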

Project and filter while converting. Because the read is a normal SELECT, you can rename columns, drop the ones you don't need, filter rows, and sort, all in the same pass, with no second tool. This writes the three largest purchases to purchases.ndjson:

clickhouse local -q "
SELECT event_id, country, amount
FROM file('events.parquet')
WHERE action = 'purchase'
ORDER BY amount DESC
LIMIT 3
INTO OUTFILE 'purchases.ndjson' FORMAT JSONEachRow"
{"event_id":18,"country":"DE","amount":18.17}
{"event_id":14,"country":"FR","amount":14.13}
{"event_id":10,"country":"IN","amount":10.09}

Compress on the way out. Add a .gz (or .zst, .lz4, .xz) suffix to the output name and ClickHouse compresses as it writes. NDJSON is verbose, so this matters. Reading it back needs no flag either, because the codec is inferred from the file name:

clickhouse local -q "SELECT * FROM file('events.parquet') INTO OUTFILE 'events.ndjson.gz' FORMAT JSONEachRow"
clickhouse local -q "SELECT count() FROM file('events.ndjson.gz')"
20
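Consumers outside ClickHouse can follow the same name-based convention. Below is a minimal sketch using Python's standard library. open_by_suffix is a hypothetical helper that loosely mirrors how the codec is inferred from the file name, the rows are hypothetical sample data, and only .gz and .xz are covered because .zst and .lz4 need third-party packages:

```python
import gzip
import json
import lzma
import os
import tempfile

# Hypothetical mapping from suffix to opener; anything else is plain text.
OPENERS = {".gz": gzip.open, ".xz": lzma.open}


def open_by_suffix(path, mode="rt"):
    opener = OPENERS.get(os.path.splitext(path)[1], open)
    return opener(path, mode)


# Hypothetical sample rows, serialised compactly like JSONEachRow.
lines = [
    json.dumps({"event_id": i, "action": "login", "country": "GB"}, separators=(",", ":"))
    for i in range(1, 21)
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "events.ndjson.gz")
    with open_by_suffix(path, "wt") as f:
        f.write("\n".join(lines) + "\n")
    with open_by_suffix(path) as f:  # no codec flag: the suffix decides
        count = sum(1 for line in f if line.strip())
    compressed_size = os.path.getsize(path)

raw_size = sum(len(line) + 1 for line in lines)
print(count)                       # 20
print(compressed_size < raw_size)  # True for repetitive NDJSON like this
```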

How fast is it?

Conversion is I/O-bound and streams, so it is quick. Converting a 3,000,000-row Parquet file to NDJSON (a ~448 MB .ndjson) completes in:

clickhouse local -q "SELECT * FROM file('events_large.parquet') INTO OUTFILE 'events_large.ndjson' FORMAT JSONEachRow"

~0.29 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The number will vary with hardware and concurrent load, but the point holds: the conversion is unlikely to be the slow part of your pipeline. The same command also handles files far larger than memory, since rows are read and written in batches rather than buffered whole.
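For scale, the single timing above works out to roughly the following. These are derived from that one measurement, not separate benchmarks, and take MB as 10^6 bytes:

```latex
\frac{448\ \text{MB}}{0.29\ \text{s}} \approx 1.5\ \text{GB/s},\qquad
\frac{3\,000\,000\ \text{rows}}{0.29\ \text{s}} \approx 10.3 \times 10^{6}\ \text{rows/s},\qquad
\frac{448\ \text{MB}}{3\,000\,000\ \text{rows}} \approx 150\ \text{bytes/row}
```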

Reverse direction?

Going the other way is the same idea with the formats swapped. NDJSON is the schema-light interchange format; Parquet is the compact, typed, columnar one you query repeatedly. See the convert NDJSON to Parquet guide for that pass, including how the JSON types are inferred.

Do it in Python with chDB

If you live in Python, chDB runs the same ClickHouse engine in-process, so the SQL is identical. The same SELECT ... INTO OUTFILE ... FORMAT JSONEachRow writes the file; TRUNCATE overwrites events.ndjson if it already exists:

import chdb

chdb.query(
    "SELECT * FROM file('events.parquet') "
    "INTO OUTFILE 'events.ndjson' TRUNCATE FORMAT JSONEachRow"
)

Drop the INTO OUTFILE and chDB returns the NDJSON bytes instead, ready to stream into a request body or another process:

ndjson = chdb.query(
    "SELECT event_id, country, amount FROM file('events.parquet') "
    "WHERE action = 'purchase' ORDER BY amount DESC LIMIT 2 FORMAT JSONEachRow"
)
print(str(ndjson).rstrip())
{"event_id":18,"country":"DE","amount":18.17}
{"event_id":14,"country":"FR","amount":14.13}
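As one example of handing that text on, here is a minimal sketch that wraps it in an HTTP request body with Python's standard urllib. The endpoint URL is hypothetical, the hard-coded string stands in for str(chdb.query(...)) so the snippet runs without chDB installed, and application/x-ndjson is a commonly used (not formally registered) content type. Nothing is sent until urlopen is called:

```python
import urllib.request

# Stand-in for str(chdb.query(... FORMAT JSONEachRow)): the two rows shown above.
ndjson_text = (
    '{"event_id":18,"country":"DE","amount":18.17}\n'
    '{"event_id":14,"country":"FR","amount":14.13}\n'
)

# Hypothetical ingest endpoint.
req = urllib.request.Request(
    "https://example.com/ingest",
    data=ndjson_text.encode("utf-8"),
    headers={"Content-Type": "application/x-ndjson"},
    method="POST",
)

print(req.get_method())                # POST
print(req.get_header("Content-type"))  # urllib stores header names capitalized
# with urllib.request.urlopen(req) as resp: ...  # send it for real
```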

chDB is also a clean alternative to pandas.read_parquet followed by to_json(orient="records", lines=True), which loads the whole file into a DataFrame first. When the file is bigger than memory, the INTO OUTFILE form streams rows straight to disk instead of materialising a DataFrame.

Run it yourself

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample Parquet files (including the 3M-row file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-parquet-to-ndjson

The same SQL that converts one local file runs unchanged against a directory of files, a Parquet file on S3, a ClickHouse server, or ClickHouse Cloud when the data outgrows your laptop. Related guides: how to query a Parquet file, what is NDJSON, and run SQL on a JSON Lines file.
