To convert an NDJSON file to Parquet, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then read the NDJSON and write it out as Parquet in one command:
clickhouse local -q "SELECT * FROM file('events.ndjson') INTO OUTFILE 'events.parquet' FORMAT Parquet"
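The file is read in place with no import step. ClickHouse infers the schema from the NDJSON, writes a typed Parquet file, and streams rows, so inputs larger than RAM are handled without loading everything into memory first. If events.parquet already exists, the query fails rather than overwriting it; the later examples add TRUNCATE after the file name to replace the output.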
NDJSON and JSONL are the same thing: one JSON object per line, sometimes with a .ndjson extension and sometimes .jsonl. ClickHouse reads both with the JSONEachRow format, and the file extension .ndjson is enough to pick it automatically. So the command above is all you need.
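To make the format concrete, here is a minimal Python sketch using only the standard library. The first record is the sample line used later on this page; the second record's event_type, ts and amount, and the temporary file paths, are invented for illustration. It only shows that a .ndjson and a .jsonl file hold the same bytes: one JSON object per line.

```python
import json
import tempfile
from pathlib import Path

# Record 1 mirrors the sample line on this page; record 2 is partly made up.
records = [
    {"event_id": 1, "ts": "2026-01-01 00:00:00", "event_type": "login",
     "user": {"country": "GB", "plan": "free"}, "items": [0], "amount": 5},
    {"event_id": 2, "ts": "2026-01-01 00:01:00", "event_type": "view",
     "user": {"country": "US", "plan": "pro"}, "items": [1, 2], "amount": 7.5},
]

def write_lines(path, rows):
    # One compact JSON object per line, newline-terminated.
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, separators=(",", ":")) + "\n")

def read_lines(path):
    # Each non-empty line parses on its own: no enclosing array, no commas between rows.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

tmp = Path(tempfile.mkdtemp())
ndjson_path = tmp / "events.ndjson"
jsonl_path = tmp / "events.jsonl"
write_lines(ndjson_path, records)
write_lines(jsonl_path, records)

# Only the extension differs between the two files.
same_bytes = ndjson_path.read_bytes() == jsonl_path.read_bytes()
parsed = read_lines(ndjson_path)
print(same_bytes, len(parsed))
```

Because every line is independent, a reader can process the file row by row, which is what makes streaming conversion possible.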
Real NDJSON usually has nested objects and arrays, not just flat scalars. That is where a flat target like CSV loses information. Parquet keeps the structure, and so does this conversion. Take a line like:
{"event_id":1,"ts":"2026-01-01 00:00:00","event_type":"login","user":{"country":"GB","plan":"free"},"items":[0],"amount":5}
clickhouse local infers the column types straight from the data. Check what it found with DESCRIBE:
clickhouse local -q "DESCRIBE file('events.ndjson')"
event_id Nullable(Int64)
ts Nullable(DateTime)
event_type Nullable(String)
user Tuple(country Nullable(String), plan Nullable(String))
items Array(Nullable(Int64))
amount Nullable(Float64)
The nested user object became a Tuple and the items array became an Array. Those are first-class Parquet types, so they survive the conversion. Describe the Parquet file you just wrote and you get the same shape back:
clickhouse local -q "DESCRIBE file('events.parquet')"
event_id Nullable(Int64)
ts Nullable(DateTime64(3, 'UTC'))
event_type Nullable(String)
user Tuple(country Nullable(String), plan Nullable(String))
items Array(Nullable(Int64))
amount Nullable(Float64)
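For contrast, here is a small Python sketch of what a flat target does to the same sample line. It uses only the standard library's csv module and is an illustration of the information loss mentioned earlier, not something the conversion itself does.

```python
import csv
import io
import json

line = ('{"event_id":1,"ts":"2026-01-01 00:00:00","event_type":"login",'
        '"user":{"country":"GB","plan":"free"},"items":[0],"amount":5}')
row = json.loads(line)

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)  # non-string values, including the dict and list, go through str()

buf.seek(0)
back = next(csv.DictReader(buf))

# Every field comes back as text: the nested object and the array are opaque strings,
# and even event_id has lost its integer type.
print(repr(back["user"]))
print(repr(back["items"]))
print(repr(back["event_id"]))
```

Getting user.country back out of the CSV means parsing that string again, whereas the Parquet file exposes it as a typed column.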
Read the nested columns back out with dotted paths, no re-parsing of JSON text:
clickhouse local -q "SELECT event_id, user.country, user.plan, items FROM file('events.parquet') ORDER BY event_id LIMIT 5"
1 GB free [0]
2 US pro [1,2]
3 DE team [2,3,4]
4 FR free [3]
5 GB pro [4,5]
One cast to be aware of: the ts timestamp comes in from JSON as DateTime (second precision) and lands in Parquet as DateTime64(3), because Parquet has no second-precision timestamp type and milliseconds is its coarsest unit. The values are identical; only the declared type widens. If you need exact control over a column's type, pass the format name and an explicit structure string as the second and third arguments to file() before the conversion.
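As a sketch of that explicit-schema option, the Python snippet below builds the query with file(path, format, structure) and runs it through clickhouse local only if the binary and the input file are present. The column types in STRUCTURE and the output name events_typed.parquet are assumptions chosen for this example; adjust them to your data.

```python
import shutil
import subprocess
from pathlib import Path

# Assumed types for illustration, not taken from ClickHouse's inference.
STRUCTURE = (
    "event_id Int64, ts DateTime64(3, 'UTC'), event_type String, "
    "user Tuple(country String, plan String), items Array(Int64), amount Float64"
)

def build_query(src, dst, structure):
    # file(path, format, structure): format is the 2nd argument, structure the 3rd.
    # Single quotes inside the structure string are backslash-escaped for the SQL literal.
    escaped = structure.replace("'", "\\'")
    return (
        f"SELECT * FROM file('{src}', 'JSONEachRow', '{escaped}') "
        f"INTO OUTFILE '{dst}' TRUNCATE FORMAT Parquet"
    )

query = build_query("events.ndjson", "events_typed.parquet", STRUCTURE)
print(query)

# Only attempt the conversion when the binary and the input file both exist.
if shutil.which("clickhouse") and Path("events.ndjson").exists():
    subprocess.run(["clickhouse", "local", "-q", query], check=True)
```

Declaring the columns without Nullable, as here, is a design choice: it gives a stricter Parquet schema, at the cost of missing fields being filled with type defaults instead of NULL.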
An upload-based online converter gives you one fixed output. Doing it locally means you control the file. The two settings you will actually reach for:
Compression codec. ClickHouse writes Parquet with ZSTD by default. Override it per write:
clickhouse local -q "
SELECT * FROM file('events.ndjson')
INTO OUTFILE 'events.parquet' TRUNCATE FORMAT Parquet
SETTINGS output_format_parquet_compression_method='snappy'"
ZSTD packs smaller; Snappy decodes a touch faster and is the safe default for older readers. On the 1,000,000-row file below, ZSTD produced a 16 MB Parquet and Snappy 21 MB, both from 140 MB of NDJSON text.
Row group size. output_format_parquet_row_group_size (default 1,000,000 rows) controls how many rows go in each row group. Smaller groups give readers finer-grained row-group pruning; larger groups compress slightly better. Leave it alone unless a downstream reader asks for a specific size.
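For a rough feel of what the setting does, the arithmetic can be sketched in Python. This is a row-count-only estimate under the assumption that groups are cut purely by number of rows; a real writer may also split on other limits such as byte size, so treat the result as a lower bound and not as a prediction.

```python
import math

def estimated_row_groups(num_rows, row_group_size=1_000_000):
    # One group per started chunk of row_group_size rows; zero rows means zero groups.
    if num_rows <= 0:
        return 0
    return math.ceil(num_rows / row_group_size)

small_default = estimated_row_groups(20)                  # the 20-row sample file
large_default = estimated_row_groups(1_000_000)           # the large file, default size
large_smaller = estimated_row_groups(1_000_000, 100_000)  # same file, smaller groups
print(small_default, large_default, large_smaller)
```

With the default, both files on this page fit in a single row group, so a reader has nothing to prune; dropping the size to 100,000 would give a reader about ten groups to skip between.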
You can also inspect what was written. The ParquetMetadata format reads the footer:
clickhouse local -q "
SELECT num_columns, num_rows, num_row_groups, columns.name, columns.compression
FROM file('events.parquet', ParquetMetadata) FORMAT Vertical"
num_columns: 7
num_rows: 20
num_row_groups: 1
columns.name: ['event_id','ts','event_type','country','plan','element','amount']
columns.compression: ['ZSTD','ZSTD','ZSTD','ZSTD','ZSTD','ZSTD','ZSTD']
Note the seven physical columns from six logical ones: the user tuple is stored as two leaf columns (country, plan) and the array as an element column. That is how Parquet lays out nested types on disk, and it is exactly what makes columnar reads of user.country cheap later.
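The six-to-seven mapping can be reproduced with a short Python sketch. It models the logical schema as nested Python values and walks it to list the leaf columns; the representation (dict for a Tuple, one-item list for an Array) is an assumption of this example, not a ClickHouse or Parquet API.

```python
# Logical schema from the DESCRIBE output: a str is a scalar type,
# a dict is a Tuple (struct), a one-item list is an Array.
schema = {
    "event_id": "Int64",
    "ts": "DateTime64(3)",
    "event_type": "String",
    "user": {"country": "String", "plan": "String"},
    "items": ["Int64"],
    "amount": "Float64",
}

def leaf_names(node, name):
    if isinstance(node, dict):
        # Struct: one leaf per field, recursing into nested fields.
        leaves = []
        for field, child in node.items():
            leaves.extend(leaf_names(child, field))
        return leaves
    if isinstance(node, list):
        # List: the values live in a leaf named "element".
        return leaf_names(node[0], "element")
    # Scalar: the column is its own leaf.
    return [name]

leaves = [leaf for col, node in schema.items() for leaf in leaf_names(node, col)]
print(len(schema), len(leaves))
print(leaves)
```

The leaf list matches the columns.name array in the footer above, which is why reading user.country touches one leaf column and leaves the others alone.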
How fast is it, and how small?
The conversion streams: rows are read from the NDJSON and written to Parquet without materialising the whole file, so it works on inputs larger than RAM.
clickhouse local -q "SELECT * FROM file('events_large.ndjson') INTO OUTFILE 'events_large.parquet' TRUNCATE FORMAT Parquet"
Converting events_large.ndjson (1,000,000 rows, ~140 MB) to Parquet takes ~0.47 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The output is 16 MB, about a 9x reduction, because Parquet is columnar, typed, and ZSTD-compressed rather than repeated JSON keys in plain text. The next tool that reads it scans far less data than it would from the NDJSON.
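The ratios and throughput implied by those figures are easy to check. The Python below only redoes the arithmetic on the rounded numbers quoted on this page (140 MB, 16 MB, 21 MB, 0.47 s), so the results inherit that rounding.

```python
ndjson_mb = 140      # approximate input size
zstd_mb = 16         # Parquet output with ZSTD
snappy_mb = 21       # Parquet output with Snappy
rows = 1_000_000
seconds = 0.47

zstd_ratio = ndjson_mb / zstd_mb       # size reduction with ZSTD
snappy_ratio = ndjson_mb / snappy_mb   # size reduction with Snappy
rows_per_sec = rows / seconds
mb_per_sec = ndjson_mb / seconds       # NDJSON input consumed per second

print(f"ZSTD {zstd_ratio:.2f}x, Snappy {snappy_ratio:.2f}x")
print(f"{rows_per_sec / 1e6:.1f}M rows/s, {mb_per_sec:.0f} MB/s of NDJSON input")
```

So the "about 9x" figure is 8.75x before rounding, and the run processes roughly two million rows per second on that machine.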
Reverse direction? To go back the other way, see convert Parquet to NDJSON.
If you would rather stay in Python, chDB is the same ClickHouse engine in-process. The SQL is identical:
import chdb
chdb.query("""
SELECT * FROM file('events.ndjson')
INTO OUTFILE 'events.parquet' TRUNCATE FORMAT Parquet
""")
# read the nested columns straight back as a pandas DataFrame
df = chdb.query("""
SELECT event_id, user.country AS country, user.plan AS plan, items
FROM file('events.parquet') ORDER BY event_id LIMIT 5
""", "DataFrame")
print(df)
event_id country plan items
0 1 GB free [0]
1 2 US pro [1, 2]
2 3 DE team [2, 3, 4]
3 4 FR free [3]
4 5 GB pro [4, 5]
No server and no extra dependency to write Parquet: chDB handles the read and the write. pyarrow can also write Parquet from a DataFrame, and is a fair choice when your data already lives in pandas; the advantage here is that one engine infers the NDJSON schema, preserves the nesting, and writes the Parquet in a single step.
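If you call the conversion from several places, it may be worth wrapping it. The sketch below is one possible shape: build_convert_query and ndjson_to_parquet are hypothetical helper names, and the two-codec allow-list is this example's own restriction, not a ClickHouse limit. chDB is imported lazily so the query builder works without it installed.

```python
def build_convert_query(src, dst, codec="zstd"):
    # Limited to the two codecs discussed on this page (a choice made for this sketch).
    if codec not in {"zstd", "snappy"}:
        raise ValueError(f"unsupported codec for this sketch: {codec}")
    # Paths are embedded in SQL string literals, so reject quotes and do not escape them.
    for path in (src, dst):
        if "'" in path:
            raise ValueError(f"path must not contain a single quote: {path}")
    return (
        f"SELECT * FROM file('{src}') "
        f"INTO OUTFILE '{dst}' TRUNCATE FORMAT Parquet "
        f"SETTINGS output_format_parquet_compression_method='{codec}'"
    )

def ndjson_to_parquet(src, dst, codec="zstd"):
    # Lazy import: only needed when the conversion is actually run.
    import chdb
    chdb.query(build_convert_query(src, dst, codec))

snappy_query = build_convert_query("events.ndjson", "events.parquet", "snappy")
print(snappy_query)
```

Calling ndjson_to_parquet("events.ndjson", "events.parquet", "snappy") would then run the same statement as the Snappy CLI example earlier on the page.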
The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample NDJSON (including the ~140 MB file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.
github.com/ClickHouse/examples/tree/main/local-analytics/convert-ndjson-to-parquet
The same SQL scales from this file on your laptop to a ClickHouse server or ClickHouse Cloud when the data outgrows it, with no rewrite. Related: how to query a Parquet file, what is Parquet, and the JSONL to Parquet version of this guide.