How to convert ORC to JSON

Al Brown
Last updated: Jun 15, 2026

To convert ORC to JSON, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then point it at the ORC file and write the result:

1clickhouse local -q "SELECT * FROM file('events.orc') INTO OUTFILE 'events.jsonl' FORMAT JSONEachRow"
1{"event_id":1,"event_date":"2026-01-01","action":"login","country":"GB","amount":5,"source":{"user_id":1,"platform":"web"}}
2{"event_id":2,"event_date":"2026-01-02","action":"purchase","country":"US","amount":6.01,"source":{"user_id":2,"platform":"ios"}}
3{"event_id":3,"event_date":"2026-01-03","action":"refund","country":"DE","amount":7.02,"source":{"user_id":3,"platform":"android"}}

ORC is self-describing: the column types live in the file footer, so clickhouse local reads the schema without you declaring one. The file is read in place with no import step, and the conversion streams row by row, so files larger than RAM convert without issue.

The schema comes from the ORC, the types carry into JSON #

ORC is columnar and self-describing: every column's type lives in the file footer. clickhouse-local reads that footer, so you never declare a schema. Check what it found with DESCRIBE:

1clickhouse local -q "DESCRIBE file('events.orc')"
1event_id    Nullable(Int64)
2event_date  Nullable(Date32)
3action      Nullable(String)
4country     Nullable(String)
5amount      Nullable(Decimal(18, 2))
6source      Tuple(user_id Nullable(Int16), platform Nullable(String))

Those types decide how each value is written to JSON. The important case is the last column: source is an ORC struct, and structs map to nested JSON objects, not to a flattened or stringified blob. In the output above, source becomes {"user_id":1,"platform":"web"} with its field names intact. Dates become ISO strings ("2026-01-01"). The amount decimal is written as a JSON number by default.

NDJSON or a single JSON array #

There are two shapes people mean by "JSON", and ClickHouse writes both.

JSONEachRow (above) gives one object per line. It's the right choice for streaming, for piping into jq, and for tools that read line by line. The conventional extensions are .jsonl or .ndjson.

JSON writes one document: a top-level object with a meta block describing the columns and a data array of rows.

1clickhouse local -q "SELECT * FROM file('events.orc') INTO OUTFILE 'events.json' FORMAT JSON"
1{
2	"meta":
3	[
4		{
5			"name": "event_id",
6			"type": "Nullable(Int64)"
7		},
8		...
9		{
10			"name": "source",
11			"type": "Tuple(user_id Nullable(Int16), platform Nullable(String))"
12		}
13	],
14
15	"data":
16	[
17		{
18			"event_id": 1,
19			"event_date": "2026-01-01",
20			...

Use JSONEachRow unless something downstream specifically wants the single-array form.

Filter and reshape during the convert #

Because the ORC file is a table, the conversion is just a SELECT. You can project columns, filter rows, and flatten nested fields in the same command instead of converting everything and post-processing it:

1clickhouse local -q "
2SELECT event_id, event_date, country, amount, source.platform AS platform
3FROM file('events.orc')
4WHERE action = 'purchase'
5INTO OUTFILE 'purchases.jsonl' FORMAT JSONEachRow"
1{"event_id":2,"event_date":"2026-01-02","country":"US","amount":6.01,"platform":"ios"}
2{"event_id":6,"event_date":"2026-01-06","country":"GB","amount":10.05,"platform":"android"}
3{"event_id":10,"event_date":"2026-01-10","country":"IN","amount":14.09,"platform":"web"}
4{"event_id":14,"event_date":"2026-01-14","country":"FR","amount":18.13,"platform":"ios"}
5{"event_id":18,"event_date":"2026-01-18","country":"DE","amount":22.17,"platform":"android"}

source.platform reaches into the struct and lifts one field to a top-level key. This is the information gain over a generic upload-and-convert site: those give you the whole file, as-is. Here you choose the columns, the rows, and the shape in one pass.

Options worth knowing #

A few settings change how types land in JSON. The most useful with ORC data:

  • Big integers and decimals. By default Int64, UInt64, and Decimal are written as JSON numbers. JSON parsers backed by IEEE-754 doubles lose precision past 2^53, so for IDs or money it's safer to emit them as quoted strings:

    clickhouse local -q "
    SELECT event_id, amount FROM file('events.orc') LIMIT 3
    FORMAT JSONEachRow
    SETTINGS output_format_json_quote_64bit_integers = 1, output_format_json_quote_decimals = 1"
    
    {"event_id":"1","amount":"5"}
    {"event_id":"2","amount":"6.01"}
    {"event_id":"3","amount":"7.02"}
    
  • Compression on the way out. Add a .gz, .zst, or .lz4 suffix to the output name and ClickHouse compresses transparently: INTO OUTFILE 'events.jsonl.zst'. JSON is verbose, so this matters — the 3M-row file below is 390 MB as raw NDJSON.

ORC structs and lists round-trip cleanly into nested JSON, so the conversion is lossless for the common case. The one thing to decide on purpose is numeric precision, hence the setting above.

How fast is it? #

On a 3,000,000-row ORC file (events_large.orc, ~20 MB on disk), the full convert to NDJSON:

1clickhouse local -q "SELECT * FROM file('events_large.orc') INTO OUTFILE 'events_large.jsonl' FORMAT JSONEachRow"

~0.38 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). That reads every row, decodes the ORC columns, and serializes 390 MB of JSON text. The number is best-of-3 warm and may shift slightly under concurrent load; correctness (types preserved, structs nested) is the part that's guaranteed.

The same SQL scales without a rewrite. It runs against a single file here, against a directory of ORC files with a glob, and against a ClickHouse server or ClickHouse Cloud when the data outgrows your laptop. The same command works everywhere ClickHouse runs.

chDB: the same convert in Python #

chDB is ClickHouse as an in-process Python library. The conversion is the identical SQL, the same SELECT ... INTO OUTFILE ... FORMAT JSONEachRow, so no logic changes between the CLI and Python:

1import chdb
2
3chdb.query("""
4SELECT * FROM file('data/events.orc')
5INTO OUTFILE 'data/events_chdb.jsonl' TRUNCATE
6FORMAT JSONEachRow
7""")

It writes byte-for-byte the same NDJSON:

1{"event_id":1,"event_date":"2026-01-01","action":"login","country":"GB","amount":5,"source":{"user_id":1,"platform":"web"}}
2{"event_id":2,"event_date":"2026-01-02","action":"purchase","country":"US","amount":6.01,"source":{"user_id":2,"platform":"ios"}}
3{"event_id":3,"event_date":"2026-01-03","action":"refund","country":"DE","amount":7.02,"source":{"user_id":3,"platform":"android"}}

If you already have the data in a notebook, see how to read an ORC file in Python with chDB.

Run it yourself #

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample ORC files (including the 3M-row file used for the timing), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-orc-to-json

Share this resource

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!
Loading form...