To query a JSON file with SQL, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then point it at your JSON file with the JSONEachRow format:
clickhouse local -q "SELECT * FROM file('events.jsonl', JSONEachRow) LIMIT 5"
Row 1:
──────
event_id: 1
event_type: signup
amount: 1
geo: {
"city": "London",
"country": "GB"
}
tags: ['mobile']
JSONEachRow reads one object per line directly from disk, inferring column names and types without a schema declaration or an import step. Nested objects become Tuple columns and arrays become Array columns, so you can query their members in the same SELECT.
Prefer Python? See How to read a JSON file in Python and query it with SQL for the same queries against a pandas DataFrame.
JSONL and a top-level array read the same way
JSON on disk comes in two common shapes. Line-delimited JSON (also called JSONL or NDJSON) has one object per line:
{"event_id":1,"event_type":"signup","amount":1,"geo":{"country":"GB","city":"London"},"tags":["mobile"]}
{"event_id":2,"event_type":"click","amount":2.01,"geo":{"country":"US","city":"NYC"},"tags":["web","beta"]}
The other shape is a single top-level array, [{...}, {...}, ...]. Both read with the same JSONEachRow format. ClickHouse detects the leading [ and parses the array element-by-element, so you do not load the whole file into memory and you do not need a different command:
clickhouse local -q "SELECT count() FROM file('events.json', JSONEachRow)"
JSONEachRow is the format to reach for in both cases. (ClickHouse also has a JSON format, but that one is for its own metadata-wrapped output, not for reading arbitrary array files.)
DESCRIBE shows you what ClickHouse inferred, including how it handled the nested object and the array:
clickhouse local -q "DESCRIBE file('events.jsonl', JSONEachRow) FORMAT PrettyCompact"
Row 4:
──────
name: geo
type: Tuple(
city Nullable(String),
country Nullable(String))
Row 5:
──────
name: tags
type: Array(Nullable(String))
A nested JSON object becomes a Tuple with named fields. A JSON array becomes an Array. Scalars (event_id, event_type, amount) come back as Nullable(Int64), Nullable(String) and Nullable(Float64). No CREATE TABLE, no schema file.
Because geo is a Tuple with named fields, you read its members with dot notation. There is no JSON path language to learn and no per-row parsing function to call; the field is a column:
clickhouse local -q "
SELECT event_id, geo.country AS country, geo.city AS city
FROM file('events.jsonl', JSONEachRow)
LIMIT 5
FORMAT PrettyCompact"
┌─event_id─┬─country─┬─city───┐
1. │ 1 │ GB │ London │
2. │ 2 │ US │ NYC │
3. │ 3 │ DE │ Berlin │
4. │ 4 │ FR │ Paris │
5. │ 5 │ IN │ Mumbai │
└──────────┴─────────┴────────┘
Nested fields group and aggregate like any other column:
clickhouse local -q "
SELECT geo.country AS country, count() AS events, round(sum(amount), 2) AS total
FROM file('events.jsonl', JSONEachRow)
GROUP BY country
ORDER BY total DESC
FORMAT PrettyCompact"
┌─country─┬─events─┬─total─┐
1. │ IN │ 4 │ 50.46 │
2. │ FR │ 4 │ 46.42 │
3. │ DE │ 4 │ 42.38 │
4. │ US │ 4 │ 38.34 │
5. │ GB │ 4 │ 34.3 │
└─────────┴────────┴───────┘
For deeper or irregular nesting, see how to query nested JSON with SQL.
Explode an array column with ARRAY JOIN
The tags column holds an array per row. To count over the individual elements, unfold the array with ARRAY JOIN: it produces one output row per array element, then you aggregate as usual:
clickhouse local -q "
SELECT tag, count() AS events
FROM file('events.jsonl', JSONEachRow)
ARRAY JOIN tags AS tag
GROUP BY tag
ORDER BY events DESC
FORMAT PrettyCompact"
┌─tag────┬─events─┐
1. │ beta │ 10 │
2. │ vip │ 10 │
3. │ mobile │ 10 │
4. │ web │ 9 │
└────────┴────────┘
This is the JSON workflow that viewers and jq one-liners make awkward: a real GROUP BY over array members. ClickHouse covers the same ground with one binary that also reads CSV, Parquet, ORC, Arrow and many other formats, and runs the same SQL unchanged against a server or ClickHouse Cloud when the data outgrows your machine.
You do not unzip anything first. clickhouse local detects the .gz extension and decompresses on the fly, so a .jsonl.gz file is queried exactly like a .jsonl:
clickhouse local -q "
SELECT event_type, count()
FROM file('events.jsonl.gz', JSONEachRow)
GROUP BY event_type
ORDER BY event_type
FORMAT PrettyCompact"
┌─event_type─┬─count()─┐
1. │ click │ 5 │
2. │ purchase │ 5 │
3. │ refund │ 5 │
4. │ signup │ 5 │
└────────────┴─────────┘
The same applies to .jsonl.zst, .jsonl.xz and other supported codecs. The compression is inferred from the file name.
JSON is text, so every query parses it from scratch; there is no cached table. That parse is the work, and it runs across all cores. On a 3,000,000-row, ~314 MB JSONL file (events_large.jsonl), grouping by the nested geo.country field with a sum over every row completes in:
clickhouse local -q "
SELECT geo.country AS country, count() AS events, round(sum(amount), 2) AS total
FROM file('events_large.jsonl', JSONEachRow)
GROUP BY country
ORDER BY total DESC"
~0.66 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM; clickhouse local 26.6.1.117). That includes parsing the JSON text and inferring the schema on every run. If you query the same data repeatedly, convert it once to a columnar format and read that instead: see convert JSON to Parquet.
The query you just ran against a file on your laptop is the same SQL you would run on a ClickHouse server, or in ClickHouse Cloud. Nothing about SELECT ... ARRAY JOIN ... GROUP BY changes. You swap file('events.jsonl', JSONEachRow) for a table name and the rest stays put. You prototype against a file and ship the identical logic to production.
The complete, runnable example lives in the ClickHouse examples repo. It has generate.sh (builds the sample JSONL, the top-level-array copy, a gzipped copy, and the ~314 MB perf file), run.sh (every command above), and expected_output.txt:
github.com/ClickHouse/examples → local-analytics/clickhouse-local-json
git clone https://github.com/ClickHouse/examples
cd examples/local-analytics/clickhouse-local-json
./generate.sh && ./run.sh