How to query a JSON Lines file

To query a JSON Lines file, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then query the file directly:

1clickhouse local -q "SELECT * FROM file('events.jsonl') LIMIT 5"

12026-01-01 00:00:00	1	GB	click	5	1
22026-01-01 01:00:00	2	US	view	6.01	2
32026-01-01 02:00:00	3	DE	purchase	7.02	3
42026-01-01 03:00:00	4	FR	refund	8.03	4
52026-01-01 04:00:00	5	IN	click	9.04	5

The .jsonl extension maps to ClickHouse's JSONEachRow format, which reads each line's keys as column names and infers the types from the values. The file is queried in place with no import step.

Prefer Python? See How to read a JSONL file in Python and query it with SQL for the same queries against a pandas DataFrame.

See the schema without declaring one

Each line is a self-describing JSON object, so you never write CREATE TABLE. DESCRIBE prints the column names taken from the keys and the types ClickHouse inferred from the values:

1clickhouse local -q "DESCRIBE file('events.jsonl')"

1event_time	Nullable(DateTime)
2user_id	Nullable(Int64)
3country	Nullable(String)
4event_type	Nullable(String)
5revenue	Nullable(Float64)
6quantity	Nullable(Int64)

The keys gave the names; the values gave the types. Columns are Nullable because inference allows for missing keys: if a line omits a field, that row reads back as null instead of failing the parse.

Because the file is a table, the full SQL surface works on it (WHERE, GROUP BY, aggregate functions, window functions, joins) with no load step in between:

1clickhouse local -q "
2SELECT
3  country,
4  count()                 AS events,
5  round(sum(revenue), 2)  AS revenue,
6  round(avg(quantity), 2) AS avg_qty
7FROM file('events.jsonl')
8GROUP BY country
9ORDER BY revenue DESC"

1US	4	60.4	3.5
2GB	4	56.36	2.5
3AU	3	48.33	2
4IN	3	45.3	2.67
5FR	3	42.27	3.33
6DE	3	39.24	4

This is the everyday workflow: open a terminal, run one query against the file, read the answer.

JSONL, NDJSON, and JSON Lines are the same format

These three names describe the same thing: one JSON object per line, newline-delimited. JSONL is the common file extension, NDJSON ("newline-delimited JSON") is the spec name, and "JSON Lines" is the human-readable label. ClickHouse reads all of them with JSONEachRow, and the .ndjson extension maps to it exactly like .jsonl:

1clickhouse local -q "SELECT count() FROM file('events.ndjson')"

So if your tool emits .ndjson, query it the same way. (See what is NDJSON for the format background, and the NDJSON and JSONL versions of this guide, which are the same recipe under their own names.)

This is the one place JSON Lines trips people up versus a plain .json file. A .json file usually holds a single JSON value: one big array or one object spanning many lines. JSON Lines is the opposite: many independent objects, one per line, and no enclosing brackets or commas between them. That difference is why JSON Lines streams and appends cleanly (you can tail -f it or cat two files together) while a single JSON array does not. If your data is actually one array or one nested document, see run SQL on a JSON file and query nested JSON with SQL instead.

Name the format for odd extensions

The .jsonl and .ndjson extensions are auto-detected. If your file ends in something else (.txt, .log, no extension at all) ClickHouse can't guess, so name the format as the second argument to file():

1clickhouse local -q "SELECT count() FROM file('events.txt', JSONEachRow)"

JSONEachRow also reads a top-level array

JSONEachRow will also parse a file that is one top-level JSON array, treating each element as a row. So an API response saved as [{...},{...}] queries the same way, no reshaping first:

1clickhouse local -q "SELECT * FROM file('array.json', JSONEachRow)"

11	GB
22	US

One format reads both the line-delimited form and the single-array form, which covers most JSON you'll be handed.

Gzipped JSON Lines work too

You don't need to unzip anything first. clickhouse local detects the .gz extension and decompresses on the fly, so a .jsonl.gz file is queried exactly like a .jsonl:

1clickhouse local -q "SELECT country, count() FROM file('events.jsonl.gz') GROUP BY country ORDER BY country"

1AU	3
2DE	3
3FR	3
4GB	4
5IN	3
6US	4

The same applies to .jsonl.zst, .jsonl.xz, and other supported codecs. The compression is inferred from the file name, which matters for JSON Lines specifically because it is verbose text and compresses well, so log dumps usually arrive gzipped.

How fast is it?

Small files are instant in anything. The difference shows up at scale. On a 3,000,000-row, ~360 MB JSON Lines file (events_large.jsonl), a full GROUP BY country with sum and avg over every row completes in:

1clickhouse local -q "
2SELECT country, count() AS events, round(sum(revenue), 2) AS revenue, round(avg(quantity), 3) AS avg_qty
3FROM file('events_large.jsonl')
4GROUP BY country
5ORDER BY revenue DESC"

~0.52 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM; clickhouse local 26.6.1.117). That includes parsing the JSON text from scratch on every run; there's no cached table. JSON Lines is heavier to parse than a columnar format, so if you'll query the same data repeatedly, convert it once to Parquet with convert JSONL to Parquet and the scans get faster again.

clickhouse local runs the same SQL unchanged across dozens of file formats and remote sources.

The same SQL scales unchanged

The query you just ran on a laptop file is the same SQL you would run on a ClickHouse server, or in ClickHouse Cloud. Nothing about SELECT ... WHERE ... GROUP BY changes. You swap file('events.jsonl') for a table name and the rest stays put. You prototype against a file on your machine and ship the identical logic to production.

Run it yourself

The complete, runnable example lives here. It has generate.sh (builds the demo file, a gzipped copy, and the ~360 MB perf file), run.sh (every command above), and expected_output.txt:

github.com/ClickHouse/examples → local-analytics/clickhouse-local-json-lines

1git clone https://github.com/ClickHouse/examples
2cd examples/local-analytics/clickhouse-local-json-lines
3./generate.sh && ./run.sh

See the schema without declaring one

Filter, group, and aggregate directly on the file

JSONL, NDJSON, and JSON Lines are the same format

Name the format for odd extensions

JSONEachRow also reads a top-level array

Gzipped JSON Lines work too

How fast is it?

The same SQL scales unchanged

Run it yourself

Subscribe to our newsletter

More like this

How to engineer cost-efficient open source observability with ClickHouse (ClickStack) - 2026 technical playbook

Build a dashboard in Python with ClickHouse and Streamlit

Instrumenting OpenAI with OpenTelemetry (OTel)

Real-time data visualization

How to query a JSON Lines file

See the schema without declaring one #

Filter, group, and aggregate directly on the file #

JSONL, NDJSON, and JSON Lines are the same format #

Name the format for odd extensions #

JSONEachRow also reads a top-level array #

Gzipped JSON Lines work too #

How fast is it? #

The same SQL scales unchanged #

Run it yourself #

Related #

Subscribe to our newsletter

More like this

How to engineer cost-efficient open source observability with ClickHouse (ClickStack) - 2026 technical playbook

Build a dashboard in Python with ClickHouse and Streamlit

Instrumenting OpenAI with OpenTelemetry (OTel)

Real-time data visualization

See the schema without declaring one

Filter, group, and aggregate directly on the file

JSONL, NDJSON, and JSON Lines are the same format

Name the format for odd extensions

JSONEachRow also reads a top-level array

Gzipped JSON Lines work too

How fast is it?

The same SQL scales unchanged

Run it yourself

Related