What is NDJSON / JSON Lines?

NDJSON (newline-delimited JSON), also called JSON Lines or JSONL, is a text format that stores one independent JSON object per line, separated by newlines. There is no enclosing array and no commas between records. Because each line is a complete, self-contained document, a program can read, write, and process the file one line at a time without holding the whole thing in memory. To query an NDJSON file in place, point clickhouse local at it and name the JSONEachRow format:

clickhouse local -q "SELECT * FROM file('events.ndjson', JSONEachRow)"
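
Outside ClickHouse, "one line at a time" can be sketched in a few lines of Python using only the standard library. The helper name `iter_ndjson` and the two-record sample file are illustrative assumptions, not part of any library or of the example repo:

```python
import json
import os
import tempfile
from typing import Any, Iterator


def iter_ndjson(path: str) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of the file."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:            # the file object hands back one line at a time
            line = line.strip()
            if line:              # tolerate blank lines, e.g. after a trailing newline
                yield json.loads(line)


# Build a tiny sample file so the sketch runs on its own (hypothetical data).
sample_path = os.path.join(tempfile.mkdtemp(), "events.ndjson")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write('{"id":0,"country":"GB","revenue":367.42}\n')
    f.write('{"id":1,"country":"AU","revenue":7.53}\n')

records = list(iter_ndjson(sample_path))
print(len(records), records[0]["country"])
```

Because `iter_ndjson` is a generator, a caller that loops over it directly holds only one record in memory at a time; `list(...)` is used here only to keep the example short.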

At a glance

Property	NDJSON / JSON Lines
Layout	Row-oriented text, one JSON object per line
Delimiter	Newline (`\n`) between records; no commas, no wrapping `[ ]`
Schema	None enforced; inferred from the JSON keys and value types
Streamable	Yes — each line is parsed independently
Compression	None built in; wraps cleanly with `.gz` / `.zst`
File extension	`.ndjson`, `.jsonl`, sometimes `.json`
ClickHouse FORMAT	`JSONEachRow`
Typical use	Logs, event streams, API exports, append-only data

NDJSON vs a single JSON array

The whole format is one rule: one JSON object, one line. A three-record NDJSON file looks like this:

{"id":0,"event_time":"2026-01-01 00:00:00","country":"GB","event_type":"view","revenue":367.42}
{"id":1,"event_time":"2026-01-01 00:01:00","country":"AU","event_type":"click","revenue":7.53}
{"id":2,"event_time":"2026-01-01 00:02:00","country":"IN","event_type":"purchase","revenue":103.67}

The same three records as a single JSON array look like this:

[
  {"id":0,"event_time":"2026-01-01 00:00:00","country":"GB","event_type":"view","revenue":367.42},
  {"id":1,"event_time":"2026-01-01 00:01:00","country":"AU","event_type":"click","revenue":7.53},
  {"id":2,"event_time":"2026-01-01 00:02:00","country":"IN","event_type":"purchase","revenue":103.67}
]

Same data, two different documents. The difference between them comes down to streamability.

A JSON array is a single document. The opening [ and the closing ] belong together, and the records are joined by commas. To parse it correctly you generally have to read to the closing bracket, because the file is only valid JSON as a whole. Appending a record means rewriting the end of the file.

NDJSON has no enclosing structure. Each line stands alone, so you can read one record, process it, and discard it before touching the next. You can tail -f an NDJSON file as it grows, split it across machines on line boundaries, or append a new record by writing one more line. That is why logs, event pipelines, and bulk-export APIs so often emit NDJSON rather than one giant array.
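
Appending by "writing one more line" might look like the following in Python. This is a minimal sketch; the helper name `append_record` and the sample records are made up for illustration:

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "events.ndjson")


def append_record(path: str, record: dict) -> None:
    """Append one record as a single line; existing bytes are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        # json.dumps escapes newlines inside string values,
        # so each record is guaranteed to stay on one physical line.
        f.write(json.dumps(record, separators=(",", ":")) + "\n")


append_record(path, {"id": 0, "event_type": "view"})
size_before = os.path.getsize(path)
append_record(path, {"id": 1, "event_type": "click", "note": "line\nbreak"})

with open(path, encoding="utf-8") as f:
    lines = [line for line in f]

print(len(lines))
```

Doing the same with a JSON array would mean locating the closing bracket, removing it, and writing a comma, the record, and a new bracket.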

How it relates to JSON, CSV and Parquet

NDJSON is still JSON: each line carries its own keys, so the data is self-describing per record, and nested objects and arrays are allowed inside a line. That flexibility is the trade-off. There is no shared schema, types live in the values rather than a header, and the keys repeat on every line, so NDJSON files are larger and slower to scan than a typed columnar format.

CSV is more compact for flat, tabular data but cannot represent nesting. Parquet is columnar, typed and compressed, which makes analytical queries far faster, at the cost of being a binary format you cannot read or append by hand. A common pattern is to land data as NDJSON and convert it to Parquet for analysis; see convert NDJSON to Parquet.
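
The cost of repeating keys, and the nesting that CSV cannot express, can both be seen with a small standard-library sketch. The records reuse the sample rows from earlier without the timestamp, and the nested record is a made-up example:

```python
import csv
import io
import json

records = [
    {"id": 0, "country": "GB", "event_type": "view", "revenue": 367.42},
    {"id": 1, "country": "AU", "event_type": "click", "revenue": 7.53},
    {"id": 2, "country": "IN", "event_type": "purchase", "revenue": 103.67},
]

# NDJSON: every line repeats all four keys.
ndjson_text = "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)

# CSV: the keys appear once, in the header row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(records)
csv_text = buf.getvalue()

print(len(ndjson_text), len(csv_text))  # character counts of the two encodings

# A nested value has no natural CSV cell, but NDJSON carries it unchanged.
nested = {"id": 3, "tags": ["a", "b"], "geo": {"country": "DE"}}
nested_line = json.dumps(nested, separators=(",", ":"))
```

The gap between the two sizes grows with the number of rows, since the CSV header is paid once and the NDJSON keys are paid on every line.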

Reading a real NDJSON file

Description is one thing; reading a real file is more convincing. ClickHouse calls the NDJSON format JSONEachRow, and it infers the schema directly from the JSON in each line. clickhouse local is part of ClickHouse; install it with clickhousectl, the ClickHouse CLI for local and cloud:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

Point DESCRIBE at the file to see the schema ClickHouse infers from the JSON values, with no header and no type hints:

clickhouse local -q "DESCRIBE file('events.ndjson', JSONEachRow)"

   ┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
   ┃ name       ┃ type               ┃
   ┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
1. │ id         │ Nullable(Int64)    │
2. │ event_time │ Nullable(DateTime) │
3. │ country    │ Nullable(String)   │
4. │ event_type │ Nullable(String)   │
5. │ revenue    │ Nullable(Float64)  │
   └────────────┴────────────────────┘

ClickHouse read the keys as column names and inferred each type from the values: id is an integer, event_time is parsed as a DateTime, revenue is a float. Now query it like any table, no load step:
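
To build intuition for inferring a schema from values alone, here is a toy sketch in Python. It is not ClickHouse's inference algorithm; the rule set, the type names it returns, and the helper names `infer_type` and `infer_schema` are assumptions chosen only to mirror the output above:

```python
import json
from datetime import datetime


def infer_type(value) -> str:
    """Map one JSON value to a rough type name (a toy rule set)."""
    if isinstance(value, bool):       # check bool first: bool is a subclass of int
        return "Bool"
    if isinstance(value, int):
        return "Int64"
    if isinstance(value, float):
        return "Float64"
    if isinstance(value, str):
        try:
            datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
            return "DateTime"
        except ValueError:
            return "String"
    return "Unknown"                  # null, arrays and objects are out of scope here


def infer_schema(lines) -> dict:
    """Collect the set of observed type names per key across all lines."""
    schema = {}
    for line in lines:
        for key, value in json.loads(line).items():
            schema.setdefault(key, set()).add(infer_type(value))
    return schema


lines = [
    '{"id":0,"event_time":"2026-01-01 00:00:00","country":"GB","event_type":"view","revenue":367.42}',
    '{"id":1,"event_time":"2026-01-01 00:01:00","country":"AU","event_type":"click","revenue":7.53}',
]
schema = infer_schema(lines)
print(schema)
```

A key whose set ends up with more than one type name is exactly the case where a real engine has to pick a common type or report a conflict.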

clickhouse local -q "
SELECT country, count() AS events, round(sum(revenue)) AS revenue
FROM file('events.ndjson', JSONEachRow)
GROUP BY country
ORDER BY revenue DESC"

   ┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┓
   ┃ country ┃ events ┃ revenue ┃
   ┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━┩
1. │ GB      │      2 │     558 │
2. │ IN      │      2 │     547 │
3. │ DE      │      1 │     449 │
4. │ AU      │      2 │     374 │
5. │ US      │      1 │      36 │
   └─────────┴────────┴─────────┘
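
For comparison, the same grouping can be written as a single streaming pass in Python. This sketch runs over only the three sample records shown earlier, not the full events.ndjson file, so its numbers differ from the table above:

```python
import io
import json
from collections import defaultdict

# Stand-in for an open file: the three sample records from earlier in the article.
source = io.StringIO(
    '{"id":0,"event_time":"2026-01-01 00:00:00","country":"GB","event_type":"view","revenue":367.42}\n'
    '{"id":1,"event_time":"2026-01-01 00:01:00","country":"AU","event_type":"click","revenue":7.53}\n'
    '{"id":2,"event_time":"2026-01-01 00:02:00","country":"IN","event_type":"purchase","revenue":103.67}\n'
)

events = defaultdict(int)
revenue = defaultdict(float)

for line in source:                   # one record in memory at a time
    rec = json.loads(line)
    events[rec["country"]] += 1
    revenue[rec["country"]] += rec["revenue"]

# Equivalent of ORDER BY revenue DESC on the rounded sums.
result = sorted(
    ((country, events[country], round(revenue[country])) for country in events),
    key=lambda row: row[2],
    reverse=True,
)
print(result)
```

Only the two small dictionaries persist between lines, which is what makes this kind of aggregation workable on files larger than memory.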

JSONEachRow is forgiving about the wrapper. The same parser also reads a top-level JSON array, so if someone hands you the array form instead, the identical query still works:

clickhouse local -q "
SELECT country, count() AS events
FROM file('events_array.json', JSONEachRow)
GROUP BY country
ORDER BY country"

   ┏━━━━━━━━━┳━━━━━━━━┓
   ┃ country ┃ events ┃
   ┡━━━━━━━━━╇━━━━━━━━┩
1. │ AU      │      2 │
2. │ DE      │      1 │
3. │ GB      │      2 │
4. │ IN      │      2 │
5. │ US      │      1 │
   └─────────┴────────┘
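
A reader that accepts either wrapper can be approximated in Python by peeking at the first non-whitespace character. This is a sketch of the idea only, not how ClickHouse implements it, and the helper name `load_records` is hypothetical. It assumes every NDJSON line is an object, so a file whose lines are themselves arrays would be misread:

```python
import json


def load_records(text: str) -> list:
    """Return a list of records from either NDJSON or a top-level JSON array."""
    stripped = text.lstrip()
    if stripped.startswith("["):
        # Array form: the whole text is one JSON document.
        return json.loads(stripped)
    # NDJSON form: split on "\n" only, then parse each non-empty line.
    return [json.loads(line) for line in stripped.split("\n") if line.strip()]


ndjson_text = '{"id":0,"country":"GB"}\n{"id":1,"country":"AU"}\n'
array_text = '[\n  {"id":0,"country":"GB"},\n  {"id":1,"country":"AU"}\n]\n'

from_ndjson = load_records(ndjson_text)
from_array = load_records(array_text)
print(from_ndjson == from_array)
```

Note that the array branch has to hold the whole document in memory, while the NDJSON branch could be turned into a generator.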

Proving streamability

The defining property of NDJSON is that you can process it a line at a time. Pipe just the first 100 lines of a 1.5-million-row file into ClickHouse and it parses cleanly, because each line is a complete record:

head -n 100 events_large.ndjson | clickhouse local -q "
SELECT count() AS rows, max(revenue) AS max_revenue
FROM file('-', JSONEachRow)"

   ┏━━━━━━┳━━━━━━━━━━━━━┓
   ┃ rows ┃ max_revenue ┃
   ┡━━━━━━╇━━━━━━━━━━━━━┩
1. │  100 │      499.84 │
   └──────┴─────────────┘

Try that with a JSON array and head hands you a truncated, invalid document with no closing bracket. NDJSON does not care where you cut it.
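
The difference is easy to reproduce with Python's standard json module. In this sketch the five records are synthetic and the helper `head` is a stand-in for the shell's head -n:

```python
import json

# Synthetic records, used only to build the two layouts.
records = [{"id": i, "revenue": i * 1.5} for i in range(5)]

ndjson_text = "".join(json.dumps(r) + "\n" for r in records)
array_text = "[\n" + ",\n".join("  " + json.dumps(r) for r in records) + "\n]\n"


def head(text: str, n: int) -> str:
    """Keep the first n lines of text, like `head -n`."""
    return "".join(text.splitlines(keepends=True)[:n])


# The first three lines of the NDJSON text are three complete records.
parsed = [json.loads(line) for line in head(ndjson_text, 3).splitlines()]

# The first three lines of the array are "[" plus two records and a dangling comma.
try:
    json.loads(head(array_text, 3))
    array_ok = True
except json.JSONDecodeError:
    array_ok = False

print(len(parsed), array_ok)
```

The truncated array fails as a whole even though the records inside it are intact, because the parser never sees the closing bracket.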

Scanning the whole file is fast too. Filtering and aggregating all 1,500,000 rows of the NDJSON file (147 MB) runs in 0.34s (best of three, warm cache, on an Apple M4 Pro, 14 cores, 24 GB RAM; the machine had other load at the time). JSON text parsing is more work than reading a typed binary format, so if you query the same data repeatedly, convert it to Parquet first.

Read one yourself

You don't need a server or an import step to work with NDJSON. clickhouse local reads a .ndjson, .jsonl or line-delimited .json file in place from the command line; it's a handy single-binary tool for files on your laptop. The same SQL runs unchanged against a file, a ClickHouse server, or ClickHouse Cloud, so a query you write against a local NDJSON file scales up without edits.

For the full walkthrough, covering globbing many files, extracting nested fields and casting types, see run SQL on a JSONL file and query an NDJSON file.

Run it yourself: the data generator and every command above live in local-analytics/what-is-ndjson in the ClickHouse examples repo.

Prefer Python? → Read a JSONL file in Python.
