How to convert Parquet to JSON

Al Brown
Last updated: Jun 8, 2026

To convert Parquet to JSON, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then write the Parquet file to JSON Lines with INTO OUTFILE:

clickhouse local -q "SELECT * FROM file('events.parquet') INTO OUTFILE 'events.jsonl' FORMAT JSONEachRow"

{"event_id":1,"event_date":"2026-01-01","event_ts":"2026-01-01 00:00:00.000","country":"GB","amount":5,"tags":["new","paid"],"account":{"name":"acme","tier":1}}
{"event_id":2,"event_date":"2026-01-02","event_ts":"2026-01-01 01:00:00.000","country":"US","amount":6.01,"tags":["web"],"account":{"name":"globex","tier":2}}
{"event_id":3,"event_date":"2026-01-03","event_ts":"2026-01-01 02:00:00.000","country":"DE","amount":7.02,"tags":["api","mobile"],"account":{"name":"initech","tier":3}}

The schema is read from the Parquet footer, so you declare no columns, and there is no import step: the query reads the file directly.

JSONEachRow vs a JSON array

JSON has two shapes, and which one you want decides the FORMAT.

JSONEachRow writes one JSON object per line. This is NDJSON, also called JSON Lines: a stream you can grep, split, append to, and feed line by line into tools without loading the whole file. It's the right default for data.

clickhouse local -q "SELECT * FROM file('events.parquet') INTO OUTFILE 'events.jsonl' FORMAT JSONEachRow"
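To make "feed line by line" concrete, here is a minimal Python sketch that consumes JSON Lines one row at a time. It uses only the standard library. The in-memory SAMPLE stands in for open("events.jsonl") and holds three rows from the output above, trimmed to a few columns; the helper name amount_by_country is made up for this illustration.

```python
import io
import json

# Stand-in for open("events.jsonl"): rows from the sample output above,
# trimmed to a few columns for brevity.
SAMPLE = io.StringIO(
    '{"event_id":1,"country":"GB","amount":5,"tags":["new","paid"]}\n'
    '{"event_id":2,"country":"US","amount":6.01,"tags":["web"]}\n'
    '{"event_id":3,"country":"DE","amount":7.02,"tags":["api","mobile"]}\n'
)


def amount_by_country(lines):
    """Sum `amount` per country, holding only one parsed row in memory at a time."""
    totals = {}
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines, e.g. a trailing newline
            continue
        row = json.loads(line)  # each line is a complete JSON object
        totals[row["country"]] = totals.get(row["country"], 0) + row["amount"]
    return totals


totals = amount_by_country(SAMPLE)
print(totals)
```

Because every line parses on its own, the same loop works on a file of any size, and a truncated or corrupt line affects only that row.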

FORMAT JSON writes a single document: an object whose data key holds one array of rows, alongside a meta block of column names and types and a statistics block. Use it when an API or a frontend expects one parseable document rather than a stream of lines.

clickhouse local -q "SELECT event_id, country, amount, tags, account FROM file('events.parquet') ORDER BY event_id LIMIT 2 INTO OUTFILE 'events.json' FORMAT JSON"

{
	"meta":
	[
		{ "name": "event_id", "type": "UInt64" },
		{ "name": "country", "type": "String" },
		{ "name": "amount", "type": "Decimal(18, 2)" },
		{ "name": "tags", "type": "Array(String)" },
		{ "name": "account", "type": "Tuple(name String, tier UInt8)" }
	],

	"data":
	[
		{
			"event_id": 1,
			"country": "GB",
			"amount": 5,
			"tags": ["new","paid"],
			"account": { "name": "acme", "tier": 1 }
		},
		...
	],

	"rows": 2,
	...
}
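As a rough sketch of how a consumer might read this document shape, the Python below loads it with the standard library, pulls the rows out of data, and builds a name-to-type map from meta. The DOC string is an abridged stand-in with the same layout as the output above (two columns, no statistics block), not the real file.

```python
import json

# Abridged stand-in for events.json: same layout as the FORMAT JSON output
# above, cut down to two columns and without the statistics block.
DOC = """
{
  "meta": [
    {"name": "event_id", "type": "UInt64"},
    {"name": "country", "type": "String"}
  ],
  "data": [
    {"event_id": 1, "country": "GB"},
    {"event_id": 2, "country": "US"}
  ],
  "rows": 2
}
"""

doc = json.loads(DOC)  # the whole document must be parsed before any row is usable
rows = doc["data"]
column_types = {col["name"]: col["type"] for col in doc["meta"]}

# The row count in the document should match the length of the data array.
assert doc["rows"] == len(rows)

# Re-emit the rows as JSON Lines, one compact object per line.
jsonl = "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in rows)
print(column_types)
print(jsonl, end="")
```

The meta block is the practical reason to choose this shape: the consumer gets the column types without having to infer them from values.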

If you don't know which one you want, pick JSONEachRow. It's smaller, streamable, and round-trips cleanly back to Parquet later.

Typed and nested columns carry through

This is where converting through Parquet's schema beats a generic text dump. The footer carries real types, and clickhouse-local maps each one to its natural JSON representation instead of stringifying everything. Check what it read with DESCRIBE:

clickhouse local -q "DESCRIBE file('events.parquet')"

event_id     UInt64
event_date   Date32
event_ts     DateTime64(3, 'UTC')
country      String
amount       Decimal(18, 2)
tags         Array(String)
account      Tuple(name String, tier UInt8)

Look at how those land in the output above. A Date32 becomes an ISO "2026-01-01" string, and a DateTime64(3) becomes a "2026-01-01 00:00:00.000" string with millisecond precision. An Array(String) becomes a JSON array ["new","paid"], not a quoted blob. A named Tuple becomes a nested JSON object {"name":"acme","tier":1} with real keys. Nesting in, nesting out.
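On the consuming side, those representations map back to native types with little effort. The following is a minimal Python sketch, standard library only, that parses one row from the output above. Using parse_float=Decimal and the fromisoformat calls is one reasonable choice for illustration, not something the conversion requires.

```python
import json
from datetime import date, datetime
from decimal import Decimal

# One row copied from the sample output above.
LINE = (
    '{"event_id":2,"event_date":"2026-01-02","event_ts":"2026-01-01 01:00:00.000",'
    '"country":"US","amount":6.01,"tags":["web"],"account":{"name":"globex","tier":2}}'
)

# parse_float=Decimal keeps 6.01 exact instead of turning it into a binary float.
row = json.loads(LINE, parse_float=Decimal)

# Dates and timestamps arrive as strings; convert them explicitly.
row["event_date"] = date.fromisoformat(row["event_date"])
# The string carries no UTC offset, so this yields a naive datetime.
row["event_ts"] = datetime.fromisoformat(row["event_ts"])

print(type(row["tags"]).__name__)     # the array is a real list
print(type(row["account"]).__name__)  # the named tuple is a real dict
print(row["amount"], row["event_date"], row["event_ts"])
```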

One number to watch: a 64-bit integer is a valid JSON number, but some parsers (JavaScript among them) lose precision above 2^53. ClickHouse can quote them as strings so they survive:

clickhouse local -q "SELECT event_id, amount FROM file('events.parquet') ORDER BY event_id LIMIT 2 FORMAT JSONEachRow SETTINGS output_format_json_quote_64bit_integers = 1"

{"event_id":"1","amount":5}
{"event_id":"2","amount":6.01}
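To see why the quoting matters, this small Python sketch reproduces what a parser that stores every number as a 64-bit float does to an integer just above 2^53. Python's own json module keeps integers exact, so the loss is simulated with float(). The ID value is hypothetical, chosen only because it sits one past the limit.

```python
import json

LIMIT = 2**53          # 9007199254740992, the last point where every integer fits in a double
big_id = LIMIT + 1     # hypothetical event_id just past that point

# What a double-based parser (JavaScript's JSON.parse, for example) would hold:
as_double = float(big_id)
print(int(as_double) == big_id)  # False: the value was rounded to a neighbouring double

# With the ID quoted as a string, the digits survive transport untouched,
# and the consumer decides how to interpret them.
quoted_line = '{"event_id":"9007199254740993"}'
event_id = int(json.loads(quoted_line)["event_id"])
print(event_id == big_id)        # True
```

The trade-off is that consumers now receive a string and must convert it themselves, so turn the setting on only when a downstream parser is known to use doubles.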

Flat formats, by contrast, can't hold this structure. A tags array or an account object collapses to a quoted string when forced into CSV. JSON keeps the shape, which is why Parquet to JSON usually preserves structure where Parquet to CSV does not.
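The collapse is easy to reproduce. This Python sketch writes one nested row to CSV and to JSON, then reads both back. Encoding the nested values with json.dumps before putting them in CSV cells is an assumption made for this illustration; the exact text a given tool writes into the cell may differ, but the result is a string either way.

```python
import csv
import io
import json

row = {"event_id": 1, "tags": ["new", "paid"], "account": {"name": "acme", "tier": 1}}

# CSV has no nested types, so each nested value must be serialized into one cell.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["event_id", "tags", "account"])
writer.writerow([row["event_id"], json.dumps(row["tags"]), json.dumps(row["account"])])

buf.seek(0)
csv_row = next(csv.DictReader(buf))

# JSON round trip for comparison.
json_row = json.loads(json.dumps(row))

print(type(csv_row["tags"]).__name__, repr(csv_row["tags"]))    # a string that needs re-parsing
print(type(json_row["tags"]).__name__, repr(json_row["tags"]))  # still a list
```

Note that even event_id comes back from CSV as text: every cell is a string until the reader applies a schema.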

Reshape while converting

The conversion is a SELECT, so the full SQL surface is available mid-flight. Project columns, filter rows, aggregate, rename — the output is whatever the query returns.

clickhouse local -q "
SELECT country, count() AS events, round(sum(amount), 2) AS total
FROM file('events.parquet')
GROUP BY country
ORDER BY total DESC
FORMAT JSONEachRow"

{"country":"IN","events":4,"total":66.45}
{"country":"FR","events":4,"total":62.41}
{"country":"DE","events":4,"total":58.38}
{"country":"US","events":4,"total":54.34}
{"country":"GB","events":4,"total":50.29}

That's the advantage over an online converter that requires an upload: your data never leaves your machine, types come from the Parquet footer instead of being guessed, the command is scriptable in a pipeline, and it handles files larger than memory because the read streams. clickhouse-local runs the same SQL unchanged across dozens of formats, and against a ClickHouse server or ClickHouse Cloud when the data outgrows your machine.
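As one way to script the conversion in a pipeline, the Python sketch below builds the same command line used earlier in this article and runs it with subprocess. The helper name build_convert_command is invented for this example, and the quote check is a deliberately crude guard rather than real SQL escaping. The run step is skipped unless the clickhouse binary and the input file are both present and the output file does not already exist.

```python
import os
import shutil
import subprocess


def build_convert_command(src, dst, fmt="JSONEachRow"):
    """Return the argv list for the conversion command shown in this article."""
    for path in (src, dst):
        if "'" in path:
            # Crude guard: refuse paths that would break out of the SQL string literal.
            raise ValueError(f"unsupported character in path: {path!r}")
    query = f"SELECT * FROM file('{src}') INTO OUTFILE '{dst}' FORMAT {fmt}"
    return ["clickhouse", "local", "-q", query]


cmd = build_convert_command("events.parquet", "events.jsonl")
print(cmd)

# Run only when the tool and the input are available and the output is not there yet.
if (
    shutil.which("clickhouse")
    and os.path.exists("events.parquet")
    and not os.path.exists("events.jsonl")
):
    subprocess.run(cmd, check=True)  # check=True raises if the conversion fails
```

Passing the command as a list avoids shell quoting problems, since the query reaches clickhouse as a single argument.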

In Python with chDB

chDB is the same engine embedded in Python. Same SQL, same FORMAT JSONEachRow, no server. chdb.query returns a result object; convert it to text and write it to a file:

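import chdb

# Run the same SQL as the CLI; the result object holds the formatted output.
out = chdb.query(
    "SELECT * FROM file('events.parquet') ORDER BY event_id FORMAT JSONEachRow"
)
# str(out) yields the JSON Lines text; write it out as UTF-8.
with open("events.jsonl", "w", encoding="utf-8") as f:
    f.write(str(out))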

The output is byte-for-byte the same as the CLI, including the nested tags array and the account object.

How fast is it?

On a ~3,000,000-row Parquet file (events_large.parquet, ~46 MB), this command converts the whole thing to JSON Lines (a ~486 MB .jsonl):

clickhouse local -q "SELECT * FROM file('events_large.parquet') INTO OUTFILE 'events_large.jsonl' FORMAT JSONEachRow"

It completes in ~0.27 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The number can vary a little under concurrent load. Each run decodes every Parquet row group and serializes 3M JSON objects from scratch.
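For a sense of scale, the figures above can be turned into throughput with a few lines of arithmetic. This is only a back-of-the-envelope calculation from the rounded numbers quoted in this section, so treat the results as approximate.

```python
# Rounded figures quoted in this section.
rows = 3_000_000
parquet_mb = 46
jsonl_mb = 486
seconds = 0.27

rows_per_sec = rows / seconds          # rows converted per second
write_mb_per_sec = jsonl_mb / seconds  # JSON Lines output written per second
expansion = jsonl_mb / parquet_mb      # how much larger the JSON is than the Parquet

print(
    f"{rows_per_sec / 1e6:.1f} M rows/s, "
    f"{write_mb_per_sec:.0f} MB/s written, "
    f"{expansion:.1f}x larger as JSON"
)
```

That works out to roughly 11 million rows per second and an output about ten times the size of the input, which is a reminder of how much more compact Parquet's columnar encoding and compression are than JSON text.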

Reverse direction?

Going the other way is the same idea with the formats swapped. See how to convert JSON to Parquet.

Run it yourself

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample Parquet files (including the ~46 MB file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-parquet-to-json

Related: query a Parquet file with SQL, run SQL on a JSON file, and what is NDJSON.
