To query an NDJSON file with SQL, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then query the file directly:
clickhouse local -q "SELECT * FROM file('events.ndjson') LIMIT 5"
┌─event_id─┬──────────event_time─┬─country─┬─event_type─┬─amount─┬─quantity─┐
1. │ 1 │ 2026-06-01 00:00:00 │ GB │ click │ 5 │ 1 │
2. │ 2 │ 2026-06-01 00:02:17 │ US │ view │ 6.01 │ 2 │
3. │ 3 │ 2026-06-01 00:04:34 │ DE │ purchase │ 7.02 │ 3 │
4. │ 4 │ 2026-06-01 00:06:51 │ FR │ refund │ 8.03 │ 4 │
5. │ 5 │ 2026-06-01 00:09:08 │ IN │ click │ 9.04 │ 5 │
└──────────┴─────────────────────┴─────────┴────────────┴────────┴──────────┘
clickhouse local reads the .ndjson extension as the JSONEachRow format, infers a column per JSON key, and answers the query in place with no import step first.
Prefer Python? See How to read an NDJSON file in Python and query it with SQL for the same queries against a pandas DataFrame.
NDJSON, JSONL, and JSON Lines are the same format
These three names describe one thing: one JSON object per line, separated by newlines, with no surrounding [ ... ] array and no commas between records. It is the standard shape for logs, event streams, and API exports because a writer can append a line at a time and a reader can process a line at a time without parsing the whole file. ClickHouse calls this format JSONEachRow.
The file extension you meet in the wild varies (.ndjson, .jsonl, sometimes a line-delimited .json), but the contents are identical, and so is the query. If you specifically have a .jsonl file, the same steps apply: see query a JSON Lines file and query a JSONL file. For a single top-level JSON array instead of line-delimited objects, see run SQL on a JSON file.
You never write CREATE TABLE. ClickHouse reads a sample of the lines, takes the JSON keys as column names, and infers each column's type from the values. DESCRIBE prints what it found:
clickhouse local -q "DESCRIBE file('events.ndjson') FORMAT PrettyCompact"
┌─name───────┬─type───────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
1. │ event_id │ Nullable(Int64) │ │ │ │ │ │
2. │ event_time │ Nullable(DateTime) │ │ │ │ │ │
3. │ country │ Nullable(String) │ │ │ │ │ │
4. │ event_type │ Nullable(String) │ │ │ │ │ │
5. │ amount │ Nullable(Float64) │ │ │ │ │ │
6. │ quantity │ Nullable(Int64) │ │ │ │ │ │
└────────────┴────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
The keys gave the names; the values gave the types. Columns are Nullable because a key can be missing on some lines. When inference guesses wrong, pass the format and an explicit schema as the second and third arguments to file(), for example file('events.ndjson', JSONEachRow, 'event_id UInt32, amount Float64').
If your file has the .jsonl extension (or any other), name the format yourself so ClickHouse does not have to guess from the suffix:
clickhouse local -q "SELECT count() FROM file('events.jsonl', JSONEachRow)"
Because the file is a table, the full SQL surface works on it (WHERE, GROUP BY, JOIN, window functions) with no load step in between:
clickhouse local -q "
SELECT country, count() AS events, round(sum(amount), 2) AS amount, round(avg(quantity), 2) AS avg_qty
FROM file('events.ndjson')
GROUP BY country
ORDER BY amount DESC
FORMAT PrettyCompact"
┌─country─┬─events─┬─amount─┬─avg_qty─┐
1. │ US │ 4 │ 60.4 │ 3.5 │
2. │ GB │ 4 │ 56.36 │ 2.5 │
3. │ AU │ 3 │ 48.33 │ 2 │
4. │ IN │ 3 │ 45.3 │ 2.67 │
5. │ FR │ 3 │ 42.27 │ 3.33 │
6. │ DE │ 3 │ 39.24 │ 4 │
└─────────┴────────┴────────┴─────────┘
clickhouse local runs the same SQL unchanged across dozens of formats and remote sources, and against a ClickHouse server or ClickHouse Cloud when the data outgrows your machine.
If your objects have nested fields (an address object, a tags array), ClickHouse infers those as Tuple and Array columns and lets you reach into them with dot access and array functions. See query nested JSON with SQL for that.
You do not unzip anything first. clickhouse local detects the .gz suffix and decompresses on the fly, so an .ndjson.gz file is queried exactly like an .ndjson:
clickhouse local -q "SELECT country, count() FROM file('events.ndjson.gz') GROUP BY country ORDER BY country FORMAT PrettyCompact"
┌─country─┬─count()─┐
1. │ AU │ 3 │
2. │ DE │ 3 │
3. │ FR │ 3 │
4. │ GB │ 4 │
5. │ IN │ 3 │
6. │ US │ 4 │
└─────────┴─────────┘
The same applies to .ndjson.zst, .ndjson.xz, and other supported codecs. The compression is inferred from the file name.
NDJSON is convenient to write but slow to scan repeatedly: every query re-parses text and re-infers types. If you will query the data more than once, convert it to Parquet once and read that instead. SELECT from the NDJSON, INTO OUTFILE as Parquet:
clickhouse local -q "SELECT * FROM file('events.ndjson') INTO OUTFILE 'events.parquet' TRUNCATE FORMAT Parquet"
The result is a real Parquet file with the columns and types ClickHouse inferred. Read it back with the same file() call. See how to query a Parquet file and the convert NDJSON to Parquet guide.
Small files are instant in anything; the difference shows up at scale. On a 3,000,000-row, ~360 MB NDJSON file (events_large.ndjson), a filter-and-group-by that parses every line from scratch runs in:
clickhouse local --time -q "
SELECT country, count() AS events, round(sum(amount),2) AS amount, round(avg(quantity),3) AS avg_qty
FROM file('events_large.ndjson')
WHERE event_type = 'purchase'
GROUP BY country ORDER BY amount DESC
FORMAT Null"
~0.53 seconds, best of three with the file warm in the OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM; clickhouse local 26.6.1.117). That includes parsing the JSON text on every run; there is no cached table. The number may move a little under concurrent load. Convert the same data to Parquet and the scan gets faster again, because Parquet is columnar and typed.
The query you ran on a laptop file is the same SQL you would run on a ClickHouse server, or in ClickHouse Cloud. Nothing about SELECT ... WHERE ... GROUP BY changes. You swap file('events.ndjson') for a table name and the rest stays put. Prototype against a file on your machine, ship the identical logic to production.
The complete, runnable example lives here. It has generate.sh (builds the demo file, a gzipped copy, and the ~360 MB perf file), run.sh (every command above), and expected_output.txt:
github.com/ClickHouse/examples → local-analytics/clickhouse-local-ndjson
git clone https://github.com/ClickHouse/examples
cd examples/local-analytics/clickhouse-local-ndjson
./generate.sh && ./run.sh