How to query nested JSON with SQL

Al Brown
Last updated: Jun 8, 2026

To query nested JSON with SQL, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then reach into a nested object with dot notation:

1clickhouse local -q "
2SELECT event_id, user.name, user.geo.country, user.geo.city
3FROM file('events.jsonl')
4FORMAT PrettyCompact"
1   ┌─event_id─┬─user.name─┬─user.geo.country─┬─user.geo.city─┐
21. │        1 │ Ada       │ GB               │ London        │
32. │        2 │ Lin       │ US               │ NYC           │
43. │        3 │ Sam       │ DE               │ Berlin        │
54. │        4 │ Mei       │ FR               │ Paris         │
65. │        5 │ Omar      │ IN               │ Delhi         │
7   └──────────┴───────────┴──────────────────┴───────────────┘

user.geo.country is a real column expression. The file is read in place with no import step: ClickHouse infers nested objects as typed Tuple columns, so the dot path resolves at the SQL layer and you can filter, group, and join on it like any other column.

The data: objects inside arrays inside objects #

The sample file is line-delimited JSON (JSONL). Each line nests a user object (with a geo object inside it), an items array of objects, and a free-form props object whose keys differ by event type:

1{"event_id":1,"ts":"2026-06-01T01:00:00Z","event_type":"login","user":{"id":1000,"name":"Ada","geo":{"country":"GB","city":"London"}},"items":[{"sku":"SKU-A","qty":1,"price":10.99}],"props":{"method":"sso","mfa":true}}
2{"event_id":2,"ts":"2026-06-01T01:10:11Z","event_type":"purchase","user":{"id":1001,"name":"Lin","geo":{"country":"US","city":"NYC"}},"items":[{"sku":"SKU-B","qty":1,"price":11.99},{"sku":"SKU-C","qty":2,"price":12.99}],"props":{"coupon":"SUMMER","gateway":"stripe","installments":3}}

See the nested schema without declaring one #

DESCRIBE shows how the structure was inferred. Objects become Tuples, arrays of objects become Array(Tuple(...)), and nesting is preserved all the way down:

1clickhouse local -q "DESCRIBE file('events.jsonl')"
1event_id: Nullable(Int64)
2ts: Nullable(DateTime)
3event_type: Nullable(String)
4user: Tuple(geo Tuple(city Nullable(String), country Nullable(String)), id Nullable(Int64), name Nullable(String))
5items: Array(Tuple(price Nullable(Float64), qty Nullable(Int64), sku Nullable(String)))
6props: Tuple(approved_by Nullable(String), coupon Nullable(String), gateway Nullable(String), installments Nullable(Int64), method Nullable(String), mfa Nullable(Bool), reason Nullable(String), referrer Nullable(String))

Two things to notice. user and items are typed exactly as they appear. But props has been flattened into one wide tuple holding the union of every key seen across the sampled lines, each Nullable. That is the central fact about irregular JSON, and we deal with it below.

Dot access into nested objects #

Read a nested scalar by walking the path with dots. No unnesting, no JSON functions:

1clickhouse local -q "
2SELECT event_id, user.name, user.geo.country, user.geo.city
3FROM file('events.jsonl')
4FORMAT PrettyCompact"
1   ┌─event_id─┬─user.name─┬─user.geo.country─┬─user.geo.city─┐
21. │        1 │ Ada       │ GB               │ London        │
32. │        2 │ Lin       │ US               │ NYC           │
43. │        3 │ Sam       │ DE               │ Berlin        │
54. │        4 │ Mei       │ FR               │ Paris         │
65. │        5 │ Omar      │ IN               │ Delhi         │
7   └──────────┴───────────┴──────────────────┴───────────────┘

user.geo.country is a real column expression. You can GROUP BY it, filter on it, or join on it like any other.

Explode a nested array with ARRAY JOIN #

items is an array of objects, so one event holds several line items. ARRAY JOIN turns each element into its own row, after which the elements' fields are addressable with the same dot notation:

1clickhouse local -q "
2SELECT event_id, item.sku AS sku, item.qty AS qty, item.price AS price
3FROM file('events.jsonl')
4ARRAY JOIN items AS item
5FORMAT PrettyCompact"
1   ┌─event_id─┬─sku───┬─qty─┬─price─┐
21. │        1 │ SKU-A │   1 │ 10.99 │
32. │        2 │ SKU-B │   1 │ 11.99 │
43. │        2 │ SKU-C │   2 │ 12.99 │
54. │        3 │ SKU-C │   1 │ 12.99 │
65. │        3 │ SKU-D │   2 │ 13.99 │
76. │        3 │ SKU-A │   3 │ 14.99 │
87. │        4 │ SKU-D │   1 │ 13.99 │
98. │        5 │ SKU-A │   1 │ 14.99 │
109. │        5 │ SKU-B │   2 │ 15.99 │
11   └──────────┴───────┴─────┴───────┘

Event 3 had three items, so it became three rows. Once the array is flat, aggregate over it like any table. Revenue per SKU across the whole file:

1clickhouse local -q "
2SELECT item.sku AS sku, sum(item.qty) AS units, round(sum(item.qty * item.price), 2) AS revenue
3FROM file('events.jsonl')
4ARRAY JOIN items AS item
5GROUP BY sku
6ORDER BY revenue DESC
7FORMAT PrettyCompact"
1   ┌─sku───┬─units─┬─revenue─┐
21. │ SKU-A │     5 │   70.95 │
32. │ SKU-B │     3 │   43.97 │
43. │ SKU-D │     3 │   41.97 │
54. │ SKU-C │     3 │   38.97 │
6   └───────┴───────┴─────────┘

Irregular keys: the JSON type and JSONExtract #

Dot access works when every record shares a shape. The props object does not: a login carries method and mfa, a purchase carries gateway and installments, a refund carries reason. Folding all of that into one wide tuple is lossy and brittle.

You have two clean options. The first keeps the column as raw JSON text and pulls keys on demand with the JSONExtract* family. Override just that column's type to String so it is not flattened:

1clickhouse local -q "
2SELECT
3  event_type,
4  JSONExtractString(props, 'gateway') AS gateway,
5  JSONExtractString(props, 'reason')  AS refund_reason,
6  JSONExtractBool(props, 'mfa')       AS mfa
7FROM file('events.jsonl', 'JSONEachRow',
8  'event_id UInt32, event_type String, props String')
9FORMAT PrettyCompact"
1   ┌─event_type─┬─gateway─┬─refund_reason─┬─mfa─┐
21. │ login      │         │               │   132. │ purchase   │ stripe  │               │   043.view       │         │               │   054. │ refund     │         │ damaged       │   065. │ login      │         │               │   17   └────────────┴─────────┴───────────────┴─────┘

JSONExtractString, JSONExtractInt, JSONExtractBool, JSONExtractArrayRaw and friends each take a path and return the typed value, with a default when the key is missing. Good for one-off pulls from messy text.

The second option is the JSON type. Declare the column as JSON and dynamic keys become first-class paths: present keys return their value, absent keys return NULL, all without listing the schema up front:

1clickhouse local -q "
2SELECT
3  event_type,
4  props.gateway      AS gateway,
5  props.installments AS installments,
6  props.reason       AS refund_reason
7FROM file('events.jsonl', 'JSONEachRow',
8  'event_id UInt32, event_type String, props JSON')
9SETTINGS enable_json_type = 1
10FORMAT PrettyCompact"
1   ┌─event_type─┬─gateway─┬─installments─┬─refund_reason─┐
21. │ login      │ ᴺᵁᴸᴸ    │ ᴺᵁᴸᴸ         │ ᴺᵁᴸᴸ          │
32. │ purchase   │ stripe  │ 3            │ ᴺᵁᴸᴸ          │
43.view       │ ᴺᵁᴸᴸ    │ ᴺᵁᴸᴸ         │ ᴺᵁᴸᴸ          │
54. │ refund     │ ᴺᵁᴸᴸ    │ ᴺᵁᴸᴸ         │ damaged       │
65. │ login      │ ᴺᵁᴸᴸ    │ ᴺᵁᴸᴸ         │ ᴺᵁᴸᴸ          │
7   └────────────┴─────────┴──────────────┴───────────────┘

The JSON type stores each path separately and tracks its type, so reads stay columnar even when the keys are sparse. Reach for JSONExtract* for quick text probing; reach for the JSON type when irregular keys are the rule and you want to keep querying them as columns.

You can also treat an entire document as one JSON value with the JSONAsObject input format, then dot your way in from a single column:

1clickhouse local -q "
2SELECT json.event_type, json.user.geo.country AS country, json.user.name AS name
3FROM file('events.jsonl', JSONAsObject, 'json JSON')
4SETTINGS enable_json_type = 1
5FORMAT PrettyCompact"
1   ┌─json.event_type─┬─country─┬─name─┐
21. │ login           │ GB      │ Ada  │
32. │ purchase        │ US      │ Lin  │
43.view            │ DE      │ Sam  │
54. │ refund          │ FR      │ Mei  │
65. │ login           │ IN      │ Omar │
7   └─────────────────┴─────────┴──────┘

Gzipped JSON works too #

You do not unzip anything first. clickhouse local detects the .gz suffix and decompresses on the fly, so a .jsonl.gz is queried exactly like a .jsonl:

1clickhouse local -q "
2SELECT user.geo.country AS country, count() AS events
3FROM file('events.jsonl.gz', JSONEachRow)
4GROUP BY country ORDER BY country
5FORMAT PrettyCompact"

The same applies to .jsonl.zst, .jsonl.xz, and other supported codecs. The compression is inferred from the file name.

How fast is it? #

The work in nested JSON is parsing text and walking paths. On a 500,000-row, ~137 MB JSONL file (events_large.jsonl), exploding the items array with ARRAY JOIN and grouping by user.geo.country and SKU runs in:

1clickhouse local -q "
2SELECT user.geo.country AS country, item.sku AS sku,
3       sum(item.qty) AS units, round(sum(item.qty * item.price), 2) AS revenue
4FROM file('events_large.jsonl')
5ARRAY JOIN items AS item
6GROUP BY country, sku ORDER BY revenue DESC LIMIT 5"

~0.50 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM; clickhouse local 26.6.1.117). That time includes parsing every nested object from scratch and the process startup; there is no cached table. clickhouse local runs the same SQL unchanged across dozens of formats and remote sources.

The same SQL scales unchanged #

The dot access, ARRAY JOIN, and JSON type you just used on a laptop file are the same SQL you would run on a ClickHouse server or in ClickHouse Cloud. Swap file('events.jsonl') for a table name and the nested-JSON logic stays put. You prototype against a file on your machine and ship the identical query to production.

Run it yourself #

The complete, runnable example lives here. It has generate.sh (builds the demo file, a gzipped copy, and the ~137 MB perf file), run.sh (every command above), and expected_output.txt:

github.com/ClickHouse/examples → local-analytics/clickhouse-local-nested-json

1git clone https://github.com/ClickHouse/examples
2cd examples/local-analytics/clickhouse-local-nested-json
3./generate.sh && ./run.sh
Share this resource

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!
Loading form...