To convert JSONL to Parquet, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then read the JSONL and write it to Parquet in one pass:
clickhouse local -q "SELECT * FROM file('events.jsonl') INTO OUTFILE 'events.parquet' FORMAT Parquet"
Verify the schema of the Parquet file it created:
clickhouse local -q "DESCRIBE file('events.parquet')"
event_id Nullable(Int64)
ts Nullable(DateTime64(3, 'UTC'))
country Nullable(String)
action Nullable(String)
amount Nullable(Float64)
is_member Nullable(Int64)
device Tuple(os Nullable(String), ver Nullable(String))
clickhouse local infers a type for every column from the JSON values and carries those types into the Parquet schema with no import step. The timestamp string became a real DateTime64, and the nested device object became a Parquet struct. The file is read and converted in place, with no upload or staging required.
clickhouse local reads JSONL with the JSONEachRow format: one JSON object per line. It scans the values to infer a type per column, then carries those types into the Parquet schema. In the command above, SELECT * selects every column, INTO OUTFILE names the destination, and FORMAT Parquet is the only format hint you need, because the input format is detected from the .jsonl extension.
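To make "one JSON object per line" concrete, here is a minimal Python sketch that prints a few JSONL rows shaped like the schema on this page. It is not the repo's generate.sh. The helper names, country and action values, start date, and timestamp string format are all assumptions chosen for illustration.

```python
import json
from datetime import datetime, timedelta

# Hypothetical sample values; only the field names come from the schema on this page.
OS_CYCLE = ["ios", "android", "web"]
COUNTRIES = ["US", "DE", "JP"]
ACTIONS = ["view", "click", "purchase"]


def make_events(n):
    """Build n event dicts with a nested device object."""
    start = datetime(2024, 1, 1)  # assumed start time
    rows = []
    for i in range(1, n + 1):
        rows.append({
            "event_id": i,
            # Assumed timestamp layout: 'YYYY-MM-DD hh:mm:ss'
            "ts": (start + timedelta(seconds=i)).strftime("%Y-%m-%d %H:%M:%S"),
            "country": COUNTRIES[i % len(COUNTRIES)],
            "action": ACTIONS[i % len(ACTIONS)],
            "amount": round(5 + 1.01 * (i - 1), 2),
            "is_member": i % 2,  # 0/1 integers rather than JSON booleans
            "device": {"os": OS_CYCLE[(i - 1) % len(OS_CYCLE)], "ver": str(i)},
        })
    return rows


def to_jsonl(rows):
    """Serialize each row as one JSON object followed by a newline."""
    return "".join(json.dumps(row) + "\n" for row in rows)


if __name__ == "__main__":
    print(to_jsonl(make_events(5)), end="")
```

Redirecting the script's output to events.jsonl gives you a small file to try the commands on this page against.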
Look at what was inferred from the JSONL:
clickhouse local -q "DESCRIBE file('events.jsonl')"
event_id Nullable(Int64)
ts Nullable(DateTime)
country Nullable(String)
action Nullable(String)
amount Nullable(Float64)
is_member Nullable(Int64)
device Tuple(os Nullable(String), ver Nullable(String))
Now the Parquet file's schema:
clickhouse local -q "DESCRIBE file('events.parquet')"
event_id Nullable(Int64)
ts Nullable(DateTime64(3, 'UTC'))
country Nullable(String)
action Nullable(String)
amount Nullable(Float64)
is_member Nullable(Int64)
device Tuple(os Nullable(String), ver Nullable(String))
Two things to notice. The timestamp string became a DateTime64 written as a proper Parquet timestamp, not text. And the nested device object survived: it is a Tuple on the ClickHouse side, which Parquet stores as a native struct. You do not have to flatten or stringify nested JSON to get it into Parquet.
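As a rough mental model of the inference step, the sketch below maps the JSON values in a single line to ClickHouse-style type names. It is a toy approximation, not ClickHouse's actual algorithm: the real inference samples many rows, wraps types in Nullable, and handles far more cases. The function names and the timestamp layout are assumptions.

```python
import json
from datetime import datetime


def infer_type(value):
    """Map one decoded JSON value to a ClickHouse-style type name (toy version)."""
    # bool must be checked before int because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "Bool"
    if isinstance(value, int):
        return "Int64"
    if isinstance(value, float):
        return "Float64"
    if isinstance(value, dict):
        fields = ", ".join(f"{k} {infer_type(v)}" for k, v in value.items())
        return f"Tuple({fields})"
    if isinstance(value, str):
        try:
            # Assumed timestamp layout: 'YYYY-MM-DD hh:mm:ss'
            datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
            return "DateTime"
        except ValueError:
            return "String"
    return "Unknown"


def infer_schema(line):
    """Infer a name -> type mapping from a single JSONL line."""
    return {name: infer_type(value) for name, value in json.loads(line).items()}


# Hypothetical line using a subset of the fields on this page
line = ('{"event_id": 1, "ts": "2024-01-01 00:00:01", "amount": 5.0, '
        '"is_member": 1, "device": {"os": "ios", "ver": "1"}}')
for name, type_name in infer_schema(line).items():
    print(name, type_name)
```

The nested object maps to a named Tuple rather than a string, which is why it can land in Parquet as a struct. An is_member value of 1 maps to an integer type, not a boolean.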
This is the part online converters tend to mangle. A JSONL line like {"device":{"os":"ios","ver":"1"}} keeps its shape in Parquet. Read the struct fields back with dotted paths:
clickhouse local -q "
SELECT event_id, device.os AS os, device.ver AS ver, amount
FROM file('events.parquet')
ORDER BY event_id LIMIT 5 FORMAT PrettyCompactMonoBlock"
   ┌─event_id─┬─os──────┬─ver─┬─amount─┐
1. │        1 │ ios     │ 1   │      5 │
2. │        2 │ android │ 2   │   6.01 │
3. │        3 │ web     │ 3   │   7.02 │
4. │        4 │ ios     │ 4   │   8.03 │
5. │        5 │ android │ 5   │   9.04 │
   └──────────┴─────────┴─────┴────────┘
If your objects are deeply or irregularly nested, see how to query nested JSON with SQL for ways to project them before writing Parquet.
These are the knobs you give up when you paste a file into a browser converter.
Pick the compression codec. The default codec has changed across ClickHouse releases, so set it explicitly when file size matters. For smaller files use zstd:
clickhouse local -q "
SELECT * FROM file('events.jsonl')
INTO OUTFILE 'events.parquet' TRUNCATE
FORMAT Parquet
SETTINGS output_format_parquet_compression_method = 'zstd'"
Tune the row group size. output_format_parquet_row_group_size (default 1,000,000) controls how many rows go in each row group, which affects how much a reader can skip.
Pin the types. If inference guesses wrong (an ID that should stay a string, say), pass the format and an explicit schema to file() as the second and third arguments, then write that out:
clickhouse local -q "
SELECT * FROM file('events.jsonl', 'JSONEachRow',
'event_id String, ts DateTime, country String, action String, amount Decimal(10,2), is_member Bool, device Tuple(os String, ver String)')
INTO OUTFILE 'events.parquet' TRUNCATE FORMAT Parquet"
Inspect the result. The ParquetMetadata format reads the footer without scanning the data, so you can confirm row groups and compression:
clickhouse local -q "
SELECT num_rows, num_row_groups, total_compressed_size, total_uncompressed_size
FROM file('events.parquet', ParquetMetadata) FORMAT Vertical"
Row 1:
──────
num_rows:                1000000
num_row_groups:          1
total_compressed_size:   9599064
total_uncompressed_size: 19915486
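The codec and row-group settings can be combined in one statement. Below is a minimal Python sketch that assembles such a statement as a string. The helper name build_convert_sql and the row group size of 100,000 are hypothetical; the setting names and clause order are the ones used on this page.

```python
def build_convert_sql(src, dst, codec=None, row_group_size=None):
    """Return a JSONL-to-Parquet statement with optional output settings."""
    # Keep the sketch simple: refuse paths that would need SQL escaping.
    for path in (src, dst):
        if "'" in path or "\\" in path:
            raise ValueError(f"unsupported character in path: {path!r}")

    settings = []
    if codec is not None:
        if not codec.isalnum():
            raise ValueError(f"unexpected codec name: {codec!r}")
        settings.append(f"output_format_parquet_compression_method = '{codec}'")
    if row_group_size is not None:
        if int(row_group_size) <= 0:
            raise ValueError("row_group_size must be positive")
        settings.append(f"output_format_parquet_row_group_size = {int(row_group_size)}")

    sql = f"SELECT * FROM file('{src}') INTO OUTFILE '{dst}' TRUNCATE FORMAT Parquet"
    if settings:
        sql += " SETTINGS " + ", ".join(settings)
    return sql


# Illustrative values only: zstd with a smaller-than-default row group
print(build_convert_sql("events.jsonl", "events.parquet",
                        codec="zstd", row_group_size=100_000))
```

The resulting string can be passed to clickhouse local -q or to chdb.query. Afterwards, the ParquetMetadata query above is a quick way to confirm the row group count changed as expected.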
Size and speed
Typed columnar Parquet is dramatically smaller than line-delimited JSON, because column values compress together and field names are stored once in the schema instead of on every row. The 1,000,000-row sample goes from 137 MB of JSONL to 13.9 MB of Parquet, roughly a 10x reduction, before you even pick a heavier codec.
The conversion itself:
clickhouse local -q "SELECT * FROM file('events_large.jsonl') INTO OUTFILE 'events_large.parquet' TRUNCATE FORMAT Parquet"
That takes ~0.50 seconds for the full 1,000,000-row, ~137 MB file (best of three with a warm OS page cache, on an Apple M4 Pro laptop with 14 cores and 24 GB RAM). The time includes parsing every JSON line from scratch; there is no cached table. clickhouse local runs the same SQL unchanged across dozens of formats and remote sources, and against a ClickHouse server or ClickHouse Cloud when the data outgrows your laptop.
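For a sense of scale, the figures quoted in this section work out as follows. This is plain arithmetic on the numbers above (137 MB, 13.9 MB, 1,000,000 rows, 0.50 seconds), not a new measurement. The function name is hypothetical.

```python
def summarize(jsonl_mb, parquet_mb, rows, seconds):
    """Derive size and throughput figures from the quoted measurements."""
    return {
        "size_ratio": jsonl_mb / parquet_mb,        # how many times smaller
        "space_saved": 1 - parquet_mb / jsonl_mb,   # fraction of bytes removed
        "rows_per_second": rows / seconds,
        "mb_per_second": jsonl_mb / seconds,        # JSONL input parsed per second
    }


stats = summarize(jsonl_mb=137.0, parquet_mb=13.9, rows=1_000_000, seconds=0.50)
print(f"size ratio:   {stats['size_ratio']:.1f}x")        # 9.9x
print(f"space saved:  {stats['space_saved']:.1%}")        # 89.9%
print(f"rows/second:  {stats['rows_per_second']:,.0f}")   # 2,000,000
print(f"MB/second:    {stats['mb_per_second']:.0f}")      # 274
```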
Reverse direction? See how to convert Parquet to JSONL.
chDB is the same engine embedded in Python, so the conversion uses identical SQL:
import chdb

# Convert in-process with the same statement the CLI runs
chdb.query(
    "SELECT * FROM file('events.jsonl') "
    "INTO OUTFILE 'events.parquet' TRUNCATE FORMAT Parquet"
)

# Verify the result back in-process
print(chdb.query("SELECT count() FROM file('events.parquet')", "CSV"))
The schema, the nested struct, and the timestamp type come out the same as they do from the CLI. Install chDB with pip install chdb.
The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample JSONL (including the ~137 MB file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.
github.com/ClickHouse/examples/tree/main/local-analytics/convert-jsonl-to-parquet