To convert ORC to JSON, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then point it at the ORC file and write the result:
clickhouse local -q "SELECT * FROM file('events.orc') INTO OUTFILE 'events.jsonl' FORMAT JSONEachRow"
{"event_id":1,"event_date":"2026-01-01","action":"login","country":"GB","amount":5,"source":{"user_id":1,"platform":"web"}}
{"event_id":2,"event_date":"2026-01-02","action":"purchase","country":"US","amount":6.01,"source":{"user_id":2,"platform":"ios"}}
{"event_id":3,"event_date":"2026-01-03","action":"refund","country":"DE","amount":7.02,"source":{"user_id":3,"platform":"android"}}
ORC is self-describing: the column types live in the file footer, so clickhouse local reads the schema without you declaring one. The file is read in place with no import step, and the conversion streams row by row, so files larger than RAM convert without issue.
ORC is columnar and self-describing: every column's type lives in the file footer. clickhouse-local reads that footer, so you never declare a schema. Check what it found with DESCRIBE:
clickhouse local -q "DESCRIBE file('events.orc')"
event_id Nullable(Int64)
event_date Nullable(Date32)
action Nullable(String)
country Nullable(String)
amount Nullable(Decimal(18, 2))
source Tuple(user_id Nullable(Int16), platform Nullable(String))
Those types decide how each value is written to JSON. The important case is the last column: source is an ORC struct, and structs map to nested JSON objects, not to a flattened or stringified blob. In the output above, source becomes {"user_id":1,"platform":"web"} with its field names intact. Dates become ISO strings ("2026-01-01"). The amount decimal is written as a JSON number by default.
There are two shapes people mean by "JSON", and ClickHouse writes both.
JSONEachRow (above) gives one object per line. It's the right choice for streaming, for piping into jq, and for tools that read line by line. The conventional extensions are .jsonl or .ndjson.
JSON writes one document: a top-level object with a meta block describing the columns and a data array of rows.
clickhouse local -q "SELECT * FROM file('events.orc') INTO OUTFILE 'events.json' FORMAT JSON"
{
"meta":
[
{
"name": "event_id",
"type": "Nullable(Int64)"
},
...
{
"name": "source",
"type": "Tuple(user_id Nullable(Int16), platform Nullable(String))"
}
],
"data":
[
{
"event_id": 1,
"event_date": "2026-01-01",
...
Use JSONEachRow unless something downstream specifically wants the single-array form.
Filter and reshape during the convert
Because the ORC file is a table, the conversion is just a SELECT. You can project columns, filter rows, and flatten nested fields in the same command instead of converting everything and post-processing it:
clickhouse local -q "
SELECT event_id, event_date, country, amount, source.platform AS platform
FROM file('events.orc')
WHERE action = 'purchase'
INTO OUTFILE 'purchases.jsonl' FORMAT JSONEachRow"
{"event_id":2,"event_date":"2026-01-02","country":"US","amount":6.01,"platform":"ios"}
{"event_id":6,"event_date":"2026-01-06","country":"GB","amount":10.05,"platform":"android"}
{"event_id":10,"event_date":"2026-01-10","country":"IN","amount":14.09,"platform":"web"}
{"event_id":14,"event_date":"2026-01-14","country":"FR","amount":18.13,"platform":"ios"}
{"event_id":18,"event_date":"2026-01-18","country":"DE","amount":22.17,"platform":"android"}
source.platform reaches into the struct and lifts one field to a top-level key. This is the information gain over a generic upload-and-convert site: those give you the whole file, as-is. Here you choose the columns, the rows, and the shape in one pass.
A few settings change how types land in JSON. The most useful with ORC data:
-
Big integers and decimals. By default Int64, UInt64, and Decimal are written as JSON numbers. JSON parsers backed by IEEE-754 doubles lose precision past 2^53, so for IDs or money it's safer to emit them as quoted strings:
clickhouse local -q "
SELECT event_id, amount FROM file('events.orc') LIMIT 3
FORMAT JSONEachRow
SETTINGS output_format_json_quote_64bit_integers = 1, output_format_json_quote_decimals = 1"
{"event_id":"1","amount":"5"}
{"event_id":"2","amount":"6.01"}
{"event_id":"3","amount":"7.02"}
-
Compression on the way out. Add a .gz, .zst, or .lz4 suffix to the output name and ClickHouse compresses transparently: INTO OUTFILE 'events.jsonl.zst'. JSON is verbose, so this matters — the 3M-row file below is 390 MB as raw NDJSON.
ORC structs and lists round-trip cleanly into nested JSON, so the conversion is lossless for the common case. The one thing to decide on purpose is numeric precision, hence the setting above.
On a 3,000,000-row ORC file (events_large.orc, ~20 MB on disk), the full convert to NDJSON:
clickhouse local -q "SELECT * FROM file('events_large.orc') INTO OUTFILE 'events_large.jsonl' FORMAT JSONEachRow"
~0.38 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). That reads every row, decodes the ORC columns, and serializes 390 MB of JSON text. The number is best-of-3 warm and may shift slightly under concurrent load; correctness (types preserved, structs nested) is the part that's guaranteed.
The same SQL scales without a rewrite. It runs against a single file here, against a directory of ORC files with a glob, and against a ClickHouse server or ClickHouse Cloud when the data outgrows your laptop. The same command works everywhere ClickHouse runs.
chDB is ClickHouse as an in-process Python library. The conversion is the identical SQL, the same SELECT ... INTO OUTFILE ... FORMAT JSONEachRow, so no logic changes between the CLI and Python:
import chdb
chdb.query("""
SELECT * FROM file('data/events.orc')
INTO OUTFILE 'data/events_chdb.jsonl' TRUNCATE
FORMAT JSONEachRow
""")
It writes byte-for-byte the same NDJSON:
{"event_id":1,"event_date":"2026-01-01","action":"login","country":"GB","amount":5,"source":{"user_id":1,"platform":"web"}}
{"event_id":2,"event_date":"2026-01-02","action":"purchase","country":"US","amount":6.01,"source":{"user_id":2,"platform":"ios"}}
{"event_id":3,"event_date":"2026-01-03","action":"refund","country":"DE","amount":7.02,"source":{"user_id":3,"platform":"android"}}
If you already have the data in a notebook, see how to read an ORC file in Python with chDB.
The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample ORC files (including the 3M-row file used for the timing), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.
github.com/ClickHouse/examples/tree/main/local-analytics/convert-orc-to-json