How to query an NDJSON file with SQL

To query an NDJSON file with SQL, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then query the file directly:

1clickhouse local -q "SELECT * FROM file('events.ndjson') LIMIT 5"

1┌─event_id─┬──────────event_time─┬─country─┬─event_type─┬─amount─┬─quantity─┐
21. │        1 │ 2026-06-01 00:00:00 │ GB      │ click      │      5 │        1 │
32. │        2 │ 2026-06-01 00:02:17 │ US      │ view       │   6.01 │        2 │
43. │        3 │ 2026-06-01 00:04:34 │ DE      │ purchase   │   7.02 │        3 │
54. │        4 │ 2026-06-01 00:06:51 │ FR      │ refund     │   8.03 │        4 │
65. │        5 │ 2026-06-01 00:09:08 │ IN      │ click      │   9.04 │        5 │
7   └──────────┴─────────────────────┴─────────┴────────────┴────────┴──────────┘

clickhouse local reads the .ndjson extension as the JSONEachRow format, infers a column per JSON key, and answers the query in place with no import step first.

Prefer Python? See How to read an NDJSON file in Python and query it with SQL for the same queries against a pandas DataFrame.

NDJSON, JSONL, and JSON Lines are the same format

These three names describe one thing: one JSON object per line, separated by newlines, with no surrounding [ ... ] array and no commas between records. It is the standard shape for logs, event streams, and API exports because a writer can append a line at a time and a reader can process a line at a time without parsing the whole file. ClickHouse calls this format JSONEachRow.

The file extension you meet in the wild varies (.ndjson, .jsonl, sometimes a line-delimited .json), but the contents are identical, and so is the query. If you specifically have a .jsonl file, the same steps apply: see query a JSON Lines file and query a JSONL file. For a single top-level JSON array instead of line-delimited objects, see run SQL on a JSON file.

See the schema without declaring one

You never write CREATE TABLE. ClickHouse reads a sample of the lines, takes the JSON keys as column names, and infers each column's type from the values. DESCRIBE prints what it found:

1clickhouse local -q "DESCRIBE file('events.ndjson') FORMAT PrettyCompact"

1┌─name───────┬─type───────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
21. │ event_id   │ Nullable(Int64)    │              │                    │         │                  │                │
32. │ event_time │ Nullable(DateTime) │              │                    │         │                  │                │
43. │ country    │ Nullable(String)   │              │                    │         │                  │                │
54. │ event_type │ Nullable(String)   │              │                    │         │                  │                │
65. │ amount     │ Nullable(Float64)  │              │                    │         │                  │                │
76. │ quantity   │ Nullable(Int64)    │              │                    │         │                  │                │
8   └────────────┴────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

The keys gave the names; the values gave the types. Columns are Nullable because a key can be missing on some lines. When inference guesses wrong, pass the format and an explicit schema as the second and third arguments to file(), for example file('events.ndjson', JSONEachRow, 'event_id UInt32, amount Float64').

If your file has the .jsonl extension (or any other), name the format yourself so ClickHouse does not have to guess from the suffix:

1clickhouse local -q "SELECT count() FROM file('events.jsonl', JSONEachRow)"

Aggregate directly on the NDJSON

Because the file is a table, the full SQL surface works on it (WHERE, GROUP BY, JOIN, window functions) with no load step in between:

1clickhouse local -q "
2SELECT country, count() AS events, round(sum(amount), 2) AS amount, round(avg(quantity), 2) AS avg_qty
3FROM file('events.ndjson')
4GROUP BY country
5ORDER BY amount DESC
6FORMAT PrettyCompact"

1┌─country─┬─events─┬─amount─┬─avg_qty─┐
21. │ US      │      4 │   60.4 │     3.5 │
32. │ GB      │      4 │  56.36 │     2.5 │
43. │ AU      │      3 │  48.33 │       2 │
54. │ IN      │      3 │   45.3 │    2.67 │
65. │ FR      │      3 │  42.27 │    3.33 │
76. │ DE      │      3 │  39.24 │       4 │
8   └─────────┴────────┴────────┴─────────┘

clickhouse local runs the same SQL unchanged across dozens of formats and remote sources, and against a ClickHouse server or ClickHouse Cloud when the data outgrows your machine.

If your objects have nested fields (an address object, a tags array), ClickHouse infers those as Tuple and Array columns and lets you reach into them with dot access and array functions. See query nested JSON with SQL for that.

Gzipped NDJSON works too

You do not unzip anything first. clickhouse local detects the .gz suffix and decompresses on the fly, so an .ndjson.gz file is queried exactly like an .ndjson:

1clickhouse local -q "SELECT country, count() FROM file('events.ndjson.gz') GROUP BY country ORDER BY country FORMAT PrettyCompact"

1┌─country─┬─count()─┐
21. │ AU      │       3 │
32. │ DE      │       3 │
43. │ FR      │       3 │
54. │ GB      │       4 │
65. │ IN      │       3 │
76. │ US      │       4 │
8   └─────────┴─────────┘

The same applies to .ndjson.zst, .ndjson.xz, and other supported codecs. The compression is inferred from the file name.

Convert NDJSON to Parquet in one line

NDJSON is convenient to write but slow to scan repeatedly: every query re-parses text and re-infers types. If you will query the data more than once, convert it to Parquet once and read that instead. SELECT from the NDJSON, INTO OUTFILE as Parquet:

1clickhouse local -q "SELECT * FROM file('events.ndjson') INTO OUTFILE 'events.parquet' TRUNCATE FORMAT Parquet"

The result is a real Parquet file with the columns and types ClickHouse inferred. Read it back with the same file() call. See how to query a Parquet file and the convert NDJSON to Parquet guide.

How fast is it?

Small files are instant in anything; the difference shows up at scale. On a 3,000,000-row, ~360 MB NDJSON file (events_large.ndjson), a filter-and-group-by that parses every line from scratch runs in:

1clickhouse local --time -q "
2SELECT country, count() AS events, round(sum(amount),2) AS amount, round(avg(quantity),3) AS avg_qty
3FROM file('events_large.ndjson')
4WHERE event_type = 'purchase'
5GROUP BY country ORDER BY amount DESC
6FORMAT Null"

~0.53 seconds, best of three with the file warm in the OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM; clickhouse local 26.6.1.117). That includes parsing the JSON text on every run; there is no cached table. The number may move a little under concurrent load. Convert the same data to Parquet and the scan gets faster again, because Parquet is columnar and typed.

The same SQL scales unchanged

The query you ran on a laptop file is the same SQL you would run on a ClickHouse server, or in ClickHouse Cloud. Nothing about SELECT ... WHERE ... GROUP BY changes. You swap file('events.ndjson') for a table name and the rest stays put. Prototype against a file on your machine, ship the identical logic to production.

Run it yourself

The complete, runnable example lives here. It has generate.sh (builds the demo file, a gzipped copy, and the ~360 MB perf file), run.sh (every command above), and expected_output.txt:

github.com/ClickHouse/examples → local-analytics/clickhouse-local-ndjson

1git clone https://github.com/ClickHouse/examples
2cd examples/local-analytics/clickhouse-local-ndjson
3./generate.sh && ./run.sh

NDJSON, JSONL, and JSON Lines are the same format

See the schema without declaring one

Aggregate directly on the NDJSON

Gzipped NDJSON works too

Convert NDJSON to Parquet in one line

How fast is it?

The same SQL scales unchanged

Run it yourself

Subscribe to our newsletter

More like this

How to engineer cost-efficient open source observability with ClickHouse (ClickStack) - 2026 technical playbook

Build a dashboard in Python with ClickHouse and Streamlit

Instrumenting OpenAI with OpenTelemetry (OTel)

Real-time data visualization

How to query an NDJSON file with SQL

NDJSON, JSONL, and JSON Lines are the same format #

See the schema without declaring one #

Aggregate directly on the NDJSON #

Gzipped NDJSON works too #

Convert NDJSON to Parquet in one line #

How fast is it? #

The same SQL scales unchanged #

Run it yourself #

Related #

Subscribe to our newsletter

More like this

How to engineer cost-efficient open source observability with ClickHouse (ClickStack) - 2026 technical playbook

Build a dashboard in Python with ClickHouse and Streamlit

Instrumenting OpenAI with OpenTelemetry (OTel)

Real-time data visualization

NDJSON, JSONL, and JSON Lines are the same format

See the schema without declaring one

Aggregate directly on the NDJSON

Gzipped NDJSON works too

Convert NDJSON to Parquet in one line

How fast is it?

The same SQL scales unchanged

Run it yourself

Related