How to convert Parquet to JSONL

Al Brown
Last updated: Jun 8, 2026

To convert Parquet to JSONL, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then point clickhouse local at the Parquet file and write the result out as JSONEachRow:

clickhouse local -q "SELECT * FROM file('events.parquet') INTO OUTFILE 'events.jsonl' FORMAT JSONEachRow"

The first three lines of events.jsonl:

{"event_id":1,"ts":"2026-01-01 00:00:00.000","country":"GB","action":"click","amount":5,"attrs":{"os":"ios","version":"1"}}
{"event_id":2,"ts":"2026-01-01 00:02:17.000","country":"US","action":"view","amount":6.01,"attrs":{"os":"android","version":"2"}}
{"event_id":3,"ts":"2026-01-01 00:04:34.000","country":"DE","action":"purchase","amount":7.02,"attrs":{"os":"web","version":"3"}}

clickhouse local reads the schema from the Parquet footer, so you never declare columns, and it reads the file where it sits, with no import step. JSONEachRow is ClickHouse's name for JSONL / NDJSON: one JSON object per line, written as a stream, so files larger than RAM convert without issue.

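FORMAT JSONEachRow is what produces JSONL. FORMAT JSON instead writes a single JSON document, with the rows in a data array next to meta and statistics fields. That is no longer JSONL: a consumer has to parse the whole document as one value rather than line by line. (If you specifically want a bare top-level array of objects, the output_format_json_array_of_rows setting wraps JSONEachRow output in one, with the same drawback.) For line-delimited output that streams, stay with JSONEachRow.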
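To make the difference concrete for whoever consumes the file, here is a minimal stdlib Python sketch. The two sample records are hypothetical, shaped like the events rows. JSONL decodes one line at a time, so a reader can process rows as they arrive or stop early; a single JSON array has to be decoded as one value.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one decoded object per non-empty line, holding only one line at a time."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical sample records, in both shapes.
jsonl_text = '{"event_id":1,"action":"click"}\n{"event_id":2,"action":"view"}\n'
array_text = '[{"event_id":1,"action":"click"},{"event_id":2,"action":"view"}]'

rows_from_jsonl = list(iter_jsonl(io.StringIO(jsonl_text)))  # decoded line by line
rows_from_array = json.loads(array_text)  # whole document decoded in one call

print(rows_from_jsonl == rows_from_array)  # True
```

Both shapes carry the same rows; only the JSONL version lets iter_jsonl hand them over one at a time.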

Parquet is typed and self-describing. clickhouse-local reads those types directly; you never write a schema. Check what it found with DESCRIBE:

clickhouse local -q "DESCRIBE file('events.parquet')"
event_id	UInt64
ts	DateTime64(3, 'UTC')
country	String
action	String
amount	Float64
attrs	Map(String, String)

Those types decide how each value is rendered in JSON. Integers and floats become JSON numbers ("amount":6.01). Strings become JSON strings. The DateTime64(3) timestamp is written as a quoted string with millisecond precision ("2026-01-01 00:02:17.000"), because JSON has no native date type; the number of fractional digits matches the column's declared scale.
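On the consuming side, that mapping means numbers decode as numbers and the timestamp has to be parsed back from its string. A minimal stdlib Python sketch using the second row of the output above; attaching UTC relies on this column being DateTime64(3, 'UTC') and is an assumption for other files.

```python
import json
from datetime import datetime, timezone

# Second row of events.jsonl, copied from the output above.
line = '{"event_id":2,"ts":"2026-01-01 00:02:17.000","country":"US","action":"view","amount":6.01,"attrs":{"os":"android","version":"2"}}'
row = json.loads(line)

# Numbers arrive as numbers, the timestamp as a string, the Map as an object.
assert isinstance(row["event_id"], int)
assert isinstance(row["amount"], float)
assert isinstance(row["ts"], str)
assert isinstance(row["attrs"], dict)

# JSON has no date type, so parse the string back into a datetime.
# Assumption: the source column was DateTime64(3, 'UTC'), so UTC is attached here.
ts = datetime.strptime(row["ts"], "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=timezone.utc)
print(ts.isoformat())  # 2026-01-01T00:02:17+00:00
```

One consequence worth knowing: in the first row, "amount":5 decodes to a Python int rather than a float, because the JSON text carries no decimal point. Cast explicitly if downstream code depends on the type.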

Nested columns keep their structure

This is the part upload-based converters get wrong. Parquet columns are often nested: a Map, a Tuple, a list. A flat target like CSV would have to stringify them or drop them. JSONL is hierarchical, so the structure carries straight through. The attrs column above is a Map(String, String), and reading it back from the JSONL output shows it landed as a nested JSON object, not an escaped string:

clickhouse local -q "SELECT attrs FROM file('events.jsonl') LIMIT 3 FORMAT JSONEachRow"
{"attrs":{"os":"ios","version":"1"}}
{"attrs":{"os":"android","version":"2"}}
{"attrs":{"os":"web","version":"3"}}

{"os":"ios","version":"1"} is real JSON you can index into downstream, not "{'os': 'ios'}" baked into a string. Arrays in Parquet become JSON arrays the same way. This is the main reason to pick JSONL over a flat format when the source has any nesting.
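Because attrs arrives as an object, downstream code can address nested keys directly, with no second parse. A minimal stdlib Python sketch over the three attrs lines shown above, counting rows per attrs.os:

```python
import json
from collections import Counter

# The three attrs lines shown above, exactly as they appear in the JSONL output.
lines = [
    '{"attrs":{"os":"ios","version":"1"}}',
    '{"attrs":{"os":"android","version":"2"}}',
    '{"attrs":{"os":"web","version":"3"}}',
]

# attrs decodes to a dict, so the nested "os" key is addressable straight away.
os_counts = Counter(json.loads(line)["attrs"]["os"] for line in lines)
print(dict(os_counts))  # {'ios': 1, 'android': 1, 'web': 1}
```

Had attrs been exported as a stringified value, each row would need a second, format-specific parse before the os key was reachable.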

Filter and reshape on the way out

The conversion is a SELECT, so the full SQL surface is available before you write. Project a subset of columns, filter rows, rename, compute — the output is whatever the query returns:

clickhouse local -q "
SELECT event_id, country, amount
FROM file('events.parquet')
WHERE action = 'purchase'
ORDER BY amount DESC
INTO OUTFILE 'purchases.jsonl' FORMAT JSONEachRow"

The first three lines of purchases.jsonl:

{"event_id":19,"country":"GB","amount":23.18}
{"event_id":15,"country":"DE","amount":19.14}
{"event_id":11,"country":"IN","amount":15.1}

An online converter gives you all rows and all columns and nothing else. Here the conversion and the transform are one pass.
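To pin down what that SELECT means row by row (handy when writing a test against the output), here is the same projection, filter and sort as a plain-Python sketch over the three sample rows of events.jsonl shown earlier. It illustrates the semantics only; it is not how ClickHouse executes the query, and the helper name purchases is hypothetical.

```python
import json

# The first three rows of events.jsonl, as shown earlier.
events = [
    '{"event_id":1,"ts":"2026-01-01 00:00:00.000","country":"GB","action":"click","amount":5,"attrs":{"os":"ios","version":"1"}}',
    '{"event_id":2,"ts":"2026-01-01 00:02:17.000","country":"US","action":"view","amount":6.01,"attrs":{"os":"android","version":"2"}}',
    '{"event_id":3,"ts":"2026-01-01 00:04:34.000","country":"DE","action":"purchase","amount":7.02,"attrs":{"os":"web","version":"3"}}',
]


def purchases(lines):
    """Mirror of: SELECT event_id, country, amount ... WHERE action = 'purchase' ORDER BY amount DESC."""
    rows = (json.loads(line) for line in lines)
    picked = [
        {"event_id": r["event_id"], "country": r["country"], "amount": r["amount"]}  # projection
        for r in rows
        if r["action"] == "purchase"  # filter
    ]
    # The sort needs every matching row in hand before it can emit the first one.
    return sorted(picked, key=lambda r: r["amount"], reverse=True)


for row in purchases(events):
    print(json.dumps(row, separators=(",", ":")))  # {"event_id":3,"country":"DE","amount":7.02}
```

Only event 3 is a purchase among these three rows, so the sketch emits a single line in the same compact shape JSONEachRow writes.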

Options worth knowing

A few things the one-liner doesn't show that matter in practice:

  • Compress on the way out. Give the output a .gz, .zst, .lz4, or .xz suffix and clickhouse-local compresses transparently — no extra flag. events.jsonl (2,605 bytes) became events.jsonl.gz (547 bytes) just by renaming the target. Reading it back is equally transparent: file('events.jsonl.gz') decompresses on the fly.
  • TRUNCATE to overwrite. INTO OUTFILE refuses to clobber an existing file by default. Add TRUNCATE (as the runnable scripts do) to make re-runs idempotent.
  • Pick the right row-per-line variant. JSONEachRow is standard JSONL. JSONStringsEachRow renders every value as a JSON string, for downstream parsers that expect strings throughout. JSONCompactEachRow writes each row as a JSON array of values with no keys: smaller, but not the object-per-line shape most tools expect from JSONL.
  • It streams. The file is read and written in a stream, so for a plain conversion like SELECT * (no ORDER BY or aggregation) peak memory stays roughly flat regardless of file size. That's what lets a single laptop convert a Parquet file bigger than its RAM.
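The compressed output needs nothing ClickHouse-specific on the reading side: a .gz file is ordinary gzip, so a consumer can stream it line by line. A minimal stdlib Python sketch; to stay self-contained it writes its own small sample file with hypothetical rows instead of reading events.jsonl.gz.

```python
import gzip
import json
import os
import tempfile

# Hypothetical sample rows; a real run would open the events.jsonl.gz written by clickhouse local.
rows = [
    {"event_id": 1, "attrs": {"os": "ios", "version": "1"}},
    {"event_id": 2, "attrs": {"os": "android", "version": "2"}},
]
path = os.path.join(tempfile.mkdtemp(), "sample.jsonl.gz")

# Write gzip-compressed JSONL: one compact object per line.
with gzip.open(path, "wt", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row, separators=(",", ":")) + "\n")

# Read it back line by line; decompression happens as the file is iterated.
with gzip.open(path, "rt", encoding="utf-8") as f:
    decoded = [json.loads(line) for line in f]

print(decoded == rows)  # True
```

For a real file, replace the list comprehension with a loop that handles each row as it is decoded, so memory stays flat on the reading side too.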

Going the other way? See how to convert JSONL to Parquet.

How fast is it?

On a 3,000,000-row events_large.parquet, the full conversion to a ~409 MB events_large.jsonl runs in:

clickhouse local -q "SELECT * FROM file('events_large.parquet') INTO OUTFILE 'events_large.jsonl' TRUNCATE FORMAT JSONEachRow"

~0.28 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). That time covers reading the columnar Parquet, serializing 3M JSON objects, and writing 409 MB to disk, with no upload and no server round-trip.
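To put the timing in more familiar units, the reported figures work out as follows. This is plain arithmetic on the numbers above, assuming MB means 10^6 bytes; it is not a separate measurement.

```python
rows = 3_000_000   # rows in events_large.parquet
seconds = 0.28     # best-of-three wall time reported above
output_mb = 409    # size of events_large.jsonl, taken as 10**6-byte megabytes

rows_per_second = rows / seconds                 # JSON objects serialized per second
mb_per_second = output_mb / seconds              # JSON written per second
bytes_per_row = output_mb * 1_000_000 / rows     # average length of one JSONL line

print(f"{rows_per_second:,.0f} rows/s, {mb_per_second:,.0f} MB/s, {bytes_per_row:.0f} B/row")
# 10,714,286 rows/s, 1,461 MB/s, 136 B/row
```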

In Python with chDB

If you'd rather stay in Python, chDB is the same ClickHouse engine running in-process (install it with pip install chdb). The conversion is identical SQL; import chdb and pass the query to chdb.query:

import itertools

import chdb

# Convert Parquet to JSONL in-process; TRUNCATE makes re-runs overwrite the output.
chdb.query(
    "SELECT * FROM file('data/events.parquet') "
    "INTO OUTFILE 'data/events.jsonl' TRUNCATE FORMAT JSONEachRow"
)

# Print the first three lines without reading the whole file into memory.
with open("data/events.jsonl") as f:
    for line in itertools.islice(f, 3):
        print(line, end="")

{"event_id":1,"ts":"2026-01-01 00:00:00.000","country":"GB","action":"click","amount":5,"attrs":{"os":"ios","version":"1"}}
{"event_id":2,"ts":"2026-01-01 00:02:17.000","country":"US","action":"view","amount":6.01,"attrs":{"os":"android","version":"2"}}
{"event_id":3,"ts":"2026-01-01 00:04:34.000","country":"DE","action":"purchase","amount":7.02,"attrs":{"os":"web","version":"3"}}

Same nested-object handling, same types, no server to start. For reading the resulting JSONL back into a DataFrame, see how to read a JSONL file in Python with chDB.
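Before handing the output to another tool, a cheap sanity check is to confirm that every line decodes to a JSON object. A minimal stdlib sketch; check_jsonl is a hypothetical helper, not part of chDB. It also rejects JSONCompactEachRow-style rows, which are valid JSON arrays but not the object-per-line shape.

```python
import json
import os
import tempfile


def check_jsonl(path: str) -> int:
    """Hypothetical helper: return the row count, or raise ValueError on the first bad line."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                value = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
            if not isinstance(value, dict):
                raise ValueError(f"line {lineno}: expected an object, got {type(value).__name__}")
            count += 1
    return count


# Self-contained demo: an object-per-line file and a compact (array-per-line) file.
tmp = tempfile.mkdtemp()
good = os.path.join(tmp, "good.jsonl")
compact = os.path.join(tmp, "compact.jsonl")
with open(good, "w", encoding="utf-8") as f:
    f.write('{"event_id":1,"country":"GB"}\n{"event_id":2,"country":"US"}\n')
with open(compact, "w", encoding="utf-8") as f:
    f.write('[1,"GB"]\n[2,"US"]\n')

print(check_jsonl(good))  # 2
try:
    check_jsonl(compact)
except ValueError as err:
    print(err)  # line 1: expected an object, got list
```

Pointed at the file from the chDB example, check_jsonl("data/events.jsonl") should return the number of rows in the source Parquet file.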

Run it yourself

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample Parquet files (including the 3M-row file used for the timing), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-parquet-to-jsonl

The same SQL that converts a file on your laptop runs unchanged against a Parquet file on S3, against a ClickHouse server, or against ClickHouse Cloud when the data outgrows one machine. Related: query a Parquet file, what is a Parquet file, and what is NDJSON.
