How to run SQL on a JSONL file

To run SQL on a JSONL file, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then point it at the file with the file() function:

1clickhouse local -q "SELECT * FROM file('events.jsonl') LIMIT 5"

1┌─event_date─┬─event_id─┬─country─┬─event_type─┬─revenue─┬─quantity─┐
21. │ 2026-01-01 │        1 │ GB      │ click      │       5 │        1 │
32. │ 2026-01-02 │        2 │ US      │ view       │    6.01 │        2 │
43. │ 2026-01-03 │        3 │ DE      │ purchase   │    7.02 │        3 │
54. │ 2026-01-04 │        4 │ FR      │ refund     │    8.03 │        4 │
65. │ 2026-01-05 │        5 │ IN      │ click      │    9.04 │        5 │
7   └────────────┴──────────┴─────────┴────────────┴─────────┴──────────┘

A .jsonl file holds one JSON object per line; ClickHouse reads it with the JSONEachRow format, turns each key into a column, and infers the types from the values. The file is read in place with no import step.

Prefer Python? See How to read a JSONL file in Python and query it with SQL for the same queries against a pandas DataFrame.

.jsonl, .ndjson, JSON Lines: same format, different names

JSONL is one JSON object per line, no enclosing array and no commas between objects:

1{"event_date":"2026-01-01","event_id":1,"country":"GB","event_type":"click","revenue":5,"quantity":1}
2{"event_date":"2026-01-02","event_id":2,"country":"US","event_type":"view","revenue":6.01,"quantity":2}

That layout goes by several names. JSONL, NDJSON (newline-delimited JSON), and JSON Lines are the same thing, and you will see the extensions .jsonl, .ndjson, and sometimes a line-delimited .json. ClickHouse reads all of them with one format, JSONEachRow, and detects it from the .jsonl and .ndjson extensions automatically. The only time you name the format yourself is when the extension does not give it away (for example a line-delimited file ending in .json, which would otherwise be read as a single JSON document):

1clickhouse local -q "SELECT count() FROM file('events.json', JSONEachRow)"

If your data is instead a single top-level JSON array or a pretty-printed document, that is a different shape — see how to run SQL on a JSON file.

See the schema without declaring one

You never write CREATE TABLE. ClickHouse scans the objects, collects the keys, and infers a type per column. DESCRIBE shows what it found:

1clickhouse local -q "DESCRIBE file('events.jsonl') FORMAT PrettyCompact"

1┌─name───────┬─type──────────────┐
21. │ event_date │ Nullable(Date)    │
32. │ event_id   │ Nullable(Int64)   │
43. │ country    │ Nullable(String)  │
54. │ event_type │ Nullable(String)  │
65. │ revenue    │ Nullable(Float64) │
76. │ quantity   │ Nullable(Int64)   │
8   └────────────┴───────────────────┘

The keys gave the names; the values gave the types. Columns are Nullable because a JSONL stream can omit a key on any line, and a missing key reads back as NULL. If inference guesses a type you don't want, pass an explicit structure as the third argument: file('events.jsonl', 'JSONEachRow', 'event_id UInt32, country String, ...').

Filter, aggregate, and group by

Because the file is a table, the full SQL surface works on it (WHERE, GROUP BY, aggregate and window functions, joins) with no load step in between:

1clickhouse local -q "
2SELECT country,
3       count()                 AS purchases,
4       round(sum(revenue), 2)  AS revenue,
5       round(avg(quantity), 3) AS avg_qty
6FROM file('events.jsonl')
7WHERE event_type = 'purchase'
8GROUP BY country
9ORDER BY revenue DESC
10FORMAT PrettyCompact"

1┌─country─┬─purchases─┬─revenue─┬─avg_qty─┐
21. │ GB      │         2 │   34.24 │       3 │
32. │ DE      │         2 │   26.16 │       4 │
43. │ IN      │         1 │    15.1 │       1 │
5   └─────────┴───────────┴─────────┴─────────┘

The same SQL runs unchanged across dozens of formats and remote sources, and against a ClickHouse server or ClickHouse Cloud when the data outgrows your machine.

Gzipped JSONL works too

You don't unzip anything first. clickhouse local detects the .gz suffix and decompresses on the fly, so a .jsonl.gz file is queried exactly like a .jsonl:

1clickhouse local -q "SELECT country, count() FROM file('events.jsonl.gz') GROUP BY country ORDER BY country FORMAT PrettyCompact"

1┌─country─┬─count()─┐
21. │ AU      │       3 │
32. │ DE      │       3 │
43. │ FR      │       3 │
54. │ GB      │       4 │
65. │ IN      │       3 │
76. │ US      │       4 │
8   └─────────┴─────────┘

The same applies to .jsonl.zst, .jsonl.xz, and other supported codecs. The compression is read from the file name. (Query a compressed file.)

Convert JSONL to Parquet in one line

JSONL is convenient to produce and stream, but slow to scan repeatedly: every query re-parses text and re-infers types. If you'll query the data more than once, convert it to Parquet once and read that instead — SELECT from the JSONL, INTO OUTFILE as Parquet:

1clickhouse local -q "SELECT * FROM file('events.jsonl') INTO OUTFILE 'events.parquet' TRUNCATE FORMAT Parquet"

The result is a real Parquet file with the columns and types ClickHouse inferred. Read it back with the same file() call. See how to query a Parquet file and the dedicated convert JSONL to Parquet guide for the typing and compression options.

How fast is it?

Small files are instant in anything; the difference shows up at scale. On a 3,000,000-row, ~342 MB JSONL file (events_large.jsonl), a filter-and-group-by over every line, parsing the JSON text from scratch on each run, completes in:

1clickhouse local -q "
2SELECT country, count(), round(sum(revenue), 2) AS revenue, round(avg(quantity), 3)
3FROM file('events_large.jsonl')
4WHERE event_type = 'purchase'
5GROUP BY country ORDER BY revenue DESC"

~0.54 seconds, best of three with the file warm in the OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM; clickhouse local 26.6.1.117). That includes parsing every JSON object from scratch; there's no cached table. The number may move a little under concurrent load. Convert the same data to Parquet and the scan gets faster again, because Parquet is columnar and typed.

The same SQL scales unchanged

The query you just ran on a laptop file is the same SQL you would run on a ClickHouse server, or in ClickHouse Cloud. Nothing about SELECT ... WHERE ... GROUP BY changes. You swap file('events.jsonl') for a table name and the rest stays put. You prototype against a file on your machine and ship the identical logic to production.

Run it yourself

The complete, runnable example lives in the ClickHouse examples repo. It has generate.sh (builds the demo file, a gzipped copy, and the ~342 MB perf file), run.sh (every command on this page), and expected_output.txt:

github.com/ClickHouse/examples → local-analytics/clickhouse-local-jsonl

1git clone https://github.com/ClickHouse/examples
2cd examples/local-analytics/clickhouse-local-jsonl
3./generate.sh && ./run.sh

Prefer Python? → Read a JSONL file in Python

.jsonl, .ndjson, JSON Lines: same format, different names

See the schema without declaring one

Filter, aggregate, and group by

Gzipped JSONL works too

Convert JSONL to Parquet in one line

How fast is it?

The same SQL scales unchanged

Run it yourself

Subscribe to our newsletter

More like this

How to engineer cost-efficient open source observability with ClickHouse (ClickStack) - 2026 technical playbook

Build a dashboard in Python with ClickHouse and Streamlit

Instrumenting OpenAI with OpenTelemetry (OTel)

Real-time data visualization

How to run SQL on a JSONL file

.jsonl, .ndjson, JSON Lines: same format, different names #

See the schema without declaring one #

Filter, aggregate, and group by #

Gzipped JSONL works too #

Convert JSONL to Parquet in one line #

How fast is it? #

The same SQL scales unchanged #

Run it yourself #

Related #

Subscribe to our newsletter

More like this

How to engineer cost-efficient open source observability with ClickHouse (ClickStack) - 2026 technical playbook

Build a dashboard in Python with ClickHouse and Streamlit

Instrumenting OpenAI with OpenTelemetry (OTel)

Real-time data visualization

.jsonl, .ndjson, JSON Lines: same format, different names

See the schema without declaring one

Filter, aggregate, and group by

Gzipped JSONL works too

Convert JSONL to Parquet in one line

How fast is it?

The same SQL scales unchanged

Run it yourself

Related