To convert Parquet to JSONL, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then point it at the Parquet file and write the result out as JSONEachRow:
clickhouse local -q "SELECT * FROM file('events.parquet') INTO OUTFILE 'events.jsonl' FORMAT JSONEachRow"
The first three lines of events.jsonl:
{"event_id":1,"ts":"2026-01-01 00:00:00.000","country":"GB","action":"click","amount":5,"attrs":{"os":"ios","version":"1"}}
{"event_id":2,"ts":"2026-01-01 00:02:17.000","country":"US","action":"view","amount":6.01,"attrs":{"os":"android","version":"2"}}
{"event_id":3,"ts":"2026-01-01 00:04:34.000","country":"DE","action":"purchase","amount":7.02,"attrs":{"os":"web","version":"3"}}
clickhouse local reads the schema from the Parquet footer, so you never declare columns, and it queries the file where it sits, with no import step. JSONEachRow is ClickHouse's name for the JSONL / NDJSON format: one JSON object per line. The output is written as a stream, so files larger than RAM convert without issue.
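Because every line is a complete JSON object, a consumer can also process the file one line at a time instead of loading it whole. A minimal standard-library Python sketch of that, fed the three sample lines above (`iter_jsonl` is a hypothetical helper name, not part of ClickHouse or chDB):

```python
import io
import json
from typing import IO, Iterator


def iter_jsonl(fp: IO[str]) -> Iterator[dict]:
    """Yield one parsed object per non-blank line; only one line is held at a time."""
    for line_no, line in enumerate(fp, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate a trailing newline or blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no} is not valid JSON: {exc}") from exc


# The first three lines of events.jsonl from the example above.
SAMPLE = """\
{"event_id":1,"ts":"2026-01-01 00:00:00.000","country":"GB","action":"click","amount":5,"attrs":{"os":"ios","version":"1"}}
{"event_id":2,"ts":"2026-01-01 00:02:17.000","country":"US","action":"view","amount":6.01,"attrs":{"os":"android","version":"2"}}
{"event_id":3,"ts":"2026-01-01 00:04:34.000","country":"DE","action":"purchase","amount":7.02,"attrs":{"os":"web","version":"3"}}
"""

records = list(iter_jsonl(io.StringIO(SAMPLE)))
print(len(records), records[0]["event_id"])  # 3 1
```

With a real file, pass the open handle instead: `with open("events.jsonl") as f: for row in iter_jsonl(f): ...`.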
FORMAT JSONEachRow is what produces JSONL. If you want a single JSON document instead of one object per line, use FORMAT JSON, which wraps the rows in a "data" array alongside "meta" and query statistics. That is no longer JSONL, and consumers can't read it line by line. For line-delimited output that streams, stay with JSONEachRow.
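To make the difference between the two shapes concrete, here is a small standard-library Python sketch that serializes the same two (trimmed) rows both ways. It only mimics the layout; it is not how ClickHouse writes either format, and the single document is shown as a bare array for simplicity.

```python
import json

# Two of the sample rows, trimmed to a few columns for readability.
rows = [
    {"event_id": 1, "country": "GB", "amount": 5},
    {"event_id": 2, "country": "US", "amount": 6.01},
]

# JSONL: one compact object per line, each line a complete JSON value.
jsonl_text = "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in rows)

# A single JSON document holding every row.
array_text = json.dumps(rows, separators=(",", ":"))

# Any single JSONL line parses on its own...
first_row = json.loads(jsonl_text.splitlines()[0])

# ...but a partial slice of the single document is not valid JSON.
try:
    json.loads(array_text[: len(array_text) // 2])
    partial_ok = True
except json.JSONDecodeError:
    partial_ok = False

print(jsonl_text, end="")
print(array_text)
print(first_row["event_id"], partial_ok)  # 1 False
```

That per-line independence is what lets JSONL files be appended to, split, or read partially.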
Parquet is typed and self-describing. clickhouse-local reads those types directly; you never write a schema. Check what it found with DESCRIBE:
clickhouse local -q "DESCRIBE file('events.parquet')"
event_id UInt64
ts DateTime64(3, 'UTC')
country String
action String
amount Float64
attrs Map(String, String)
Those types decide how each value is rendered in JSON. Integers and floats become JSON numbers ("amount":6.01). Strings become JSON strings. The DateTime64(3) timestamp is written as a quoted string with millisecond precision ("ts":"2026-01-01 00:02:17.000"), because JSON has no native date type. The number of fractional digits matches the column's declared scale.
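On the consuming side, numbers come back as numbers, but the timestamp has to be parsed. A minimal Python sketch, assuming the text rendering shown above and that the consumer knows the column is in UTC (the string itself carries no offset):

```python
import json
from datetime import datetime, timezone

# Line 2 of events.jsonl from the example above.
line = '{"event_id":2,"ts":"2026-01-01 00:02:17.000","country":"US","action":"view","amount":6.01,"attrs":{"os":"android","version":"2"}}'
row = json.loads(line)

# JSON has no date type, so ts is a plain string. "%f" accepts the three
# fractional digits written for a DateTime64(3) column.
ts = datetime.strptime(row["ts"], "%Y-%m-%d %H:%M:%S.%f")

# The string carries no UTC offset; the column was declared DateTime64(3, 'UTC'),
# so UTC is attached here by the consumer.
ts = ts.replace(tzinfo=timezone.utc)

print(type(row["event_id"]).__name__, type(row["amount"]).__name__)  # int float
print(ts.isoformat())  # 2026-01-01T00:02:17+00:00
```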
Nested columns keep their structure
This is the part upload-based converters get wrong. Parquet columns are often nested — a Map, a Tuple, a list. A flat target like CSV would have to stringify them or drop them. JSONL is hierarchical, so the structure carries straight through. The attrs column above is a Map(String, String), and it lands as a nested JSON object, not an escaped string:
clickhouse local -q "SELECT attrs FROM file('events.jsonl') LIMIT 3 FORMAT JSONEachRow"
{"attrs":{"os":"ios","version":"1"}}
{"attrs":{"os":"android","version":"2"}}
{"attrs":{"os":"web","version":"3"}}
{"os":"ios","version":"1"} is real JSON you can index into downstream, not "{'os': 'ios'}" baked into a string. Arrays in Parquet become JSON arrays the same way. This is the main reason to pick JSONL over a flat format when the source has any nesting.
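The practical difference shows up as soon as a consumer touches the data. A short Python sketch contrasting the nested object clickhouse local writes with a hypothetical stringified version of the same map (the second form is illustrative, built here with `json.dumps`, not output from any specific tool):

```python
import json

# As clickhouse local writes it: attrs is a real nested object.
nested_line = '{"attrs":{"os":"ios","version":"1"}}'
row = json.loads(nested_line)
print(row["attrs"]["os"])  # ios

# Hypothetical stringified form: the map was serialized to text first,
# then stored as a JSON string value.
stringified_line = json.dumps({"attrs": json.dumps({"os": "ios", "version": "1"})})
flat = json.loads(stringified_line)
print(type(flat["attrs"]).__name__)  # str
print(json.loads(flat["attrs"])["os"])  # ios, but only after a second decode
```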
Filter and reshape on the way out
The conversion is a SELECT, so the full SQL surface is available before you write. Project a subset of columns, filter rows, rename, compute — the output is whatever the query returns:
clickhouse local -q "
SELECT event_id, country, amount
FROM file('events.parquet')
WHERE action = 'purchase'
ORDER BY amount DESC
INTO OUTFILE 'purchases.jsonl' FORMAT JSONEachRow"
{"event_id":19,"country":"GB","amount":23.18}
{"event_id":15,"country":"DE","amount":19.14}
{"event_id":11,"country":"IN","amount":15.1}
An online converter gives you all rows and all columns and nothing else. Here the conversion and the transform are one pass.
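For comparison, this is roughly the post-processing you would otherwise write yourself after an all-rows, all-columns conversion. A Python sketch applying the same projection, filter, and ordering to the three sample rows from earlier (only event 3 is a purchase among them):

```python
import json

# The three sample lines from events.jsonl shown earlier.
lines = [
    '{"event_id":1,"ts":"2026-01-01 00:00:00.000","country":"GB","action":"click","amount":5,"attrs":{"os":"ios","version":"1"}}',
    '{"event_id":2,"ts":"2026-01-01 00:02:17.000","country":"US","action":"view","amount":6.01,"attrs":{"os":"android","version":"2"}}',
    '{"event_id":3,"ts":"2026-01-01 00:04:34.000","country":"DE","action":"purchase","amount":7.02,"attrs":{"os":"web","version":"3"}}',
]
rows = (json.loads(s) for s in lines)

# WHERE action = 'purchase', then SELECT event_id, country, amount ...
purchases = [
    {"event_id": r["event_id"], "country": r["country"], "amount": r["amount"]}
    for r in rows
    if r["action"] == "purchase"
]
# ... ORDER BY amount DESC
purchases.sort(key=lambda r: r["amount"], reverse=True)

for p in purchases:
    print(json.dumps(p, separators=(",", ":")))  # {"event_id":3,"country":"DE","amount":7.02}
```

In clickhouse local the same logic lives in the SELECT, so there is no second pass over the output.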
A few things the one-liner doesn't show that matter in practice:
- Compress on the way out. Give the output a .gz, .zst, .lz4, or .xz suffix and clickhouse-local compresses transparently, with no extra flag. events.jsonl (2,605 bytes) became events.jsonl.gz (547 bytes) just by renaming the target. Reading it back is equally transparent: file('events.jsonl.gz') decompresses on the fly.
- TRUNCATE to overwrite. INTO OUTFILE refuses to overwrite an existing file by default. Add TRUNCATE (as the runnable scripts do) to make re-runs idempotent.
- Pick the right object-per-line variant. JSONEachRow is standard JSONL. JSONStringsEachRow renders every value as a JSON string, for downstream parsers that expect strings. JSONCompactEachRow writes each row as a JSON array (values only, no keys): smaller, but not the object-per-line shape most tools expect from JSONL.
- It streams. The file is read and written in a stream, so peak memory stays roughly flat regardless of file size. That's what lets a single laptop convert a Parquet file bigger than its RAM.
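Outside ClickHouse, a consumer has to decompress explicitly. Assuming the .gz output is a standard gzip stream (which is what the suffix implies), Python's standard library can read it line by line, keeping the streaming property on the consuming side too. A minimal sketch, with a tiny stand-in file in place of events.jsonl.gz:

```python
import gzip
import json
import os
import tempfile

# Stand-in for events.jsonl.gz: two trimmed sample rows.
lines = ['{"event_id":1,"country":"GB"}', '{"event_id":2,"country":"US"}']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "events.jsonl.gz")

    # A .jsonl.gz file is JSONL text inside an ordinary gzip stream.
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for line in lines:
            out.write(line + "\n")

    # Decompress and parse one line at a time; the full text is never held in memory.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        ids = [json.loads(line)["event_id"] for line in f]

print(ids)  # [1, 2]
```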
Going the other way? See how to convert JSONL to Parquet.
On a 3,000,000-row events_large.parquet, the full conversion to a ~409 MB events_large.jsonl runs in:
clickhouse local -q "SELECT * FROM file('events_large.parquet') INTO OUTFILE 'events_large.jsonl' TRUNCATE FORMAT JSONEachRow"
~0.28 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). That is reading the columnar Parquet, serializing 3M JSON objects, and writing 409 MB to disk. No upload, no server round-trip.
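A back-of-envelope reading of that figure, taking the rounded numbers above at face value and treating MB as 10^6 bytes (the unit isn't specified, so that part is an assumption):

```python
rows = 3_000_000
output_bytes = 409 * 10**6  # "~409 MB", assuming MB = 10^6 bytes
seconds = 0.28

rows_per_sec = rows / seconds
mb_per_sec = output_bytes / seconds / 10**6
bytes_per_row = output_bytes / rows

print(f"{rows_per_sec / 1e6:.1f}M rows/s, {mb_per_sec:,.0f} MB/s written, ~{bytes_per_row:.0f} bytes/row")
# 10.7M rows/s, 1,461 MB/s written, ~136 bytes/row
```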
If you'd rather stay in Python, chDB is the same ClickHouse engine running in-process, and the conversion is the identical SQL. Import chdb and pass the query to chdb.query():
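import itertools
import chdb

# Same SQL as the CLI version; TRUNCATE makes the script safe to re-run.
chdb.query(
    "SELECT * FROM file('data/events.parquet') "
    "INTO OUTFILE 'data/events.jsonl' TRUNCATE FORMAT JSONEachRow"
)

# Print the first three lines without reading the whole file into memory.
with open("data/events.jsonl") as f:
    for line in itertools.islice(f, 3):
        print(line, end="")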
{"event_id":1,"ts":"2026-01-01 00:00:00.000","country":"GB","action":"click","amount":5,"attrs":{"os":"ios","version":"1"}}
{"event_id":2,"ts":"2026-01-01 00:02:17.000","country":"US","action":"view","amount":6.01,"attrs":{"os":"android","version":"2"}}
{"event_id":3,"ts":"2026-01-01 00:04:34.000","country":"DE","action":"purchase","amount":7.02,"attrs":{"os":"web","version":"3"}}
Same nested-object handling, same types, no server to start. For reading the resulting JSONL back into a DataFrame, see how to read a JSONL file in Python with chDB.
The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample Parquet files (including the 3M-row file used for the timing), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.
github.com/ClickHouse/examples/tree/main/local-analytics/convert-parquet-to-jsonl
The same SQL that converts a file on your laptop runs unchanged against a Parquet file on S3, against a ClickHouse server, or against ClickHouse Cloud when the data outgrows one machine. Related: query a Parquet file, what is a Parquet file, and what is NDJSON.