To convert Parquet to ORC, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
clickhouse local -q "SELECT * FROM file('events.parquet') INTO OUTFILE 'events.orc' TRUNCATE FORMAT ORC"
event_date Nullable(Date32)
event_id Nullable(Int64)
country Nullable(String)
action Nullable(String)
amount Nullable(Float64)
tags Map(String, Nullable(String))
The write command exits silently; the output above is what DESCRIBE file('events.orc') returns once the file is on disk. ClickHouse reads the schema straight from the Parquet footer and writes ORC in one pass, so the file is converted where it sits, with no upload or import step first.
SELECT from the Parquet file, INTO OUTFILE as ORC. TRUNCATE overwrites the output if it already exists, so the command is safe to re-run.
clickhouse local -q "SELECT * FROM file('events.parquet') INTO OUTFILE 'events.orc' TRUNCATE FORMAT ORC"
The .orc extension is enough for ClickHouse to pick the writer, but the explicit FORMAT ORC keeps the intent obvious and works even when the extension is unusual.
There is no DDL to write. ClickHouse reads the column names and types from the Parquet footer:
clickhouse local -q "DESCRIBE file('events.parquet')"
event_date Date32
event_id UInt64
country String
action String
amount Float64
tags Map(String, String)
After the conversion, the ORC file reports the same column names, with slightly different types:
clickhouse local -q "DESCRIBE file('events.orc')"
event_date Nullable(Date32)
event_id Nullable(Int64)
country Nullable(String)
action Nullable(String)
amount Nullable(Float64)
tags Map(String, Nullable(String))
Two differences are worth knowing about, and neither loses data:
- Columns come back Nullable. ORC carries a null bitmap per column, so on read-back every column that can be Nullable is reported that way (for the Map column, the values inside it). The values are identical; the type just gained an explicit "can be null" marker.
- UInt64 is read back as Int64. ORC's integer types are signed, so a ClickHouse UInt64 maps to ORC's 64-bit signed integer. For values inside the signed range (up to 2^63 - 1) this is lossless. If you store IDs near the top of the unsigned range, cast them to String before converting, or be aware of the mapping.
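The signed-range caveat can be checked before converting. The following is a minimal sketch, assuming the largest ID is already available as a Python integer (for example from a max(event_id) query); the helper name fits_int64 is hypothetical and not part of ClickHouse or chDB:

```python
# Largest value a 64-bit signed integer (ORC's widest integer type) can hold.
INT64_MAX = 2**63 - 1


def fits_int64(value: int) -> bool:
    """Return True if an unsigned ID also fits in a signed 64-bit integer."""
    return 0 <= value <= INT64_MAX


# Hypothetical ID values, for illustration only.
print(fits_int64(1_000_000))  # True
print(fits_int64(2**64 - 1))  # False: top of the UInt64 range
```

If the check fails for your data, that is the case where casting the column to String before converting is the safer choice.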
The nested Map column survives the trip intact. ORC supports nested map and list types natively, which is the payoff of going columnar-to-columnar: you are not flattening structure into a single text column the way a CSV target would force you to.
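The type differences described above can also be checked mechanically. This sketch copies the two DESCRIBE outputs from this page into dictionaries and compares them after normalizing the two expected differences. The normalize helper is an illustrative assumption, and it only handles Nullable wrappers around simple types like the ones in this schema:

```python
import re

# DESCRIBE output copied from this page: column name -> type.
PARQUET_SCHEMA = {
    "event_date": "Date32",
    "event_id": "UInt64",
    "country": "String",
    "action": "String",
    "amount": "Float64",
    "tags": "Map(String, String)",
}
ORC_SCHEMA = {
    "event_date": "Nullable(Date32)",
    "event_id": "Nullable(Int64)",
    "country": "Nullable(String)",
    "action": "Nullable(String)",
    "amount": "Nullable(Float64)",
    "tags": "Map(String, Nullable(String))",
}

# Matches Nullable(X) where X itself contains no parentheses.
NULLABLE = re.compile(r"Nullable\(([^()]*)\)")


def normalize(type_name: str) -> str:
    """Drop Nullable(...) wrappers and fold UInt64 into Int64."""
    unwrapped = NULLABLE.sub(r"\1", type_name)
    return unwrapped.replace("UInt64", "Int64")


# Columns whose types differ for any reason other than the two expected ones.
mismatches = sorted(
    column
    for column in PARQUET_SCHEMA
    if normalize(PARQUET_SCHEMA[column]) != normalize(ORC_SCHEMA[column])
)
print(mismatches)  # []
```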
The fastest way to trust a conversion is to run the same query against both files and compare. Same aggregate, same numbers:
clickhouse local -q "SELECT 'parquet' AS src, country, count() AS c, round(sum(amount),2) AS amount FROM file('events.parquet') GROUP BY country ORDER BY country"
clickhouse local -q "SELECT 'orc' AS src, country, count() AS c, round(sum(amount),2) AS amount FROM file('events.orc') GROUP BY country ORDER BY country"
parquet AU 3 48.33
parquet DE 3 39.24
parquet FR 3 42.27
parquet GB 4 56.36
parquet IN 3 45.3
parquet US 4 60.4
orc AU 3 48.33
orc DE 3 39.24
orc FR 3 42.27
orc GB 4 56.36
orc IN 3 45.3
orc US 4 60.4
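Comparing the two outputs by eye works for six rows; for larger results the comparison can be scripted. This is a rough sketch, assuming the command output is tab-separated and has been captured as text. The drop_src helper is hypothetical, and the sample strings hold the first two rows shown above:

```python
def drop_src(tsv: str) -> list[tuple[str, ...]]:
    """Parse tab-separated rows and drop the leading src label column."""
    return [tuple(line.split("\t")[1:]) for line in tsv.strip().splitlines()]


# First two rows of each output above, assumed to be tab-separated.
parquet_out = "parquet\tAU\t3\t48.33\nparquet\tDE\t3\t39.24\n"
orc_out = "orc\tAU\t3\t48.33\norc\tDE\t3\t39.24\n"

# With the label column removed, the two results should be identical.
print(drop_src(parquet_out) == drop_src(orc_out))  # True
```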
This is where a local command beats an upload-and-download converter site: you control the output, and you can script it. Pick the ORC compression method explicitly with a setting:
clickhouse local -q "SELECT * FROM file('events.parquet') INTO OUTFILE 'events_zstd.orc' TRUNCATE FORMAT ORC SETTINGS output_format_orc_compression_method='zstd'"
zstd gives a strong size-to-speed balance; lz4 is faster to write and read with slightly larger files; none skips compression when something downstream re-compresses anyway. Because both Parquet and ORC are columnar, the per-column encodings do most of the work, and the codec mainly trades file size against CPU.
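To try the codecs side by side, the statement can be generated once per codec. This is a sketch only: orc_convert_sql is a hypothetical helper, it is limited to the three codecs named above, and it assumes file paths that contain no single quotes:

```python
# The three codecs discussed above.
ALLOWED_CODECS = {"zstd", "lz4", "none"}


def orc_convert_sql(src: str, dst: str, codec: str = "zstd") -> str:
    """Build the Parquet-to-ORC statement with an explicit compression codec."""
    if codec not in ALLOWED_CODECS:
        raise ValueError(f"unsupported codec for this sketch: {codec!r}")
    # Paths are interpolated as-is, so they must not contain single quotes.
    return (
        f"SELECT * FROM file('{src}') "
        f"INTO OUTFILE '{dst}' TRUNCATE FORMAT ORC "
        f"SETTINGS output_format_orc_compression_method='{codec}'"
    )


# One statement per codec, each writing to its own output file.
for codec in sorted(ALLOWED_CODECS):
    print(orc_convert_sql("events.parquet", f"events_{codec}.orc", codec))
```

Each printed statement can be passed to clickhouse local -q or to chdb.query, and the resulting file sizes compared.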
There is no row limit, no file-size cap, and the data never leaves your machine. The same command runs in a shell script, a cron job, or a CI step.
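As an illustration of scripting it, the sketch below converts every Parquet file in a directory by calling the same one-liner through subprocess. It assumes clickhouse is on your PATH and that file names contain no single quotes; plan_conversions, convert, and convert_all are hypothetical names:

```python
import subprocess
from pathlib import Path


def plan_conversions(directory: Path) -> list[tuple[Path, Path]]:
    """Pair each .parquet file with the .orc path it would be written to."""
    return [
        (src, src.with_suffix(".orc"))
        for src in sorted(directory.glob("*.parquet"))
    ]


def convert(src: Path, dst: Path) -> None:
    """Run the one-liner from this page for a single file."""
    sql = f"SELECT * FROM file('{src}') INTO OUTFILE '{dst}' TRUNCATE FORMAT ORC"
    # check=True raises CalledProcessError if clickhouse exits non-zero.
    subprocess.run(["clickhouse", "local", "-q", sql], check=True)


def convert_all(directory: Path) -> None:
    """Convert every Parquet file found in the directory."""
    for src, dst in plan_conversions(directory):
        convert(src, dst)
```

Calling convert_all(Path("data")) would convert each file in a directory named data (a placeholder). Because TRUNCATE overwrites existing output, the loop is safe to re-run.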
On a 3,000,000-row Parquet file (~22 MB), the full Parquet-to-ORC conversion runs in:
clickhouse local -q "SELECT * FROM file('events_large.parquet') INTO OUTFILE 'events_large.orc' TRUNCATE FORMAT ORC"
~0.73 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The number can drift slightly under concurrent load, but the shape holds: the conversion is I/O-bound, not a heavyweight transform. The resulting ORC file is ~20 MB, in the same ballpark as the Parquet input, as expected for two compressed columnar formats over the same data.
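To reproduce a "best of three" measurement on your own hardware, a small timing harness is enough. This is a sketch rather than the script behind the number above; best_of is a hypothetical helper that times any callable:

```python
import time
from typing import Callable


def best_of(run: Callable[[], object], repeats: int = 3) -> float:
    """Call run() several times and return the fastest wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return min(timings)
```

Passing a callable that runs the conversion command, for example via subprocess.run, gives a figure comparable to the one quoted here. The first run also warms the OS page cache for the later ones.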
Going the other way is the same one-liner with the formats swapped. See how to convert ORC to Parquet.
The same conversion runs in-process with chDB, the embeddable ClickHouse engine for Python. Same SQL, written straight to a file:
import chdb
chdb.query("SELECT * FROM file('events.parquet') INTO OUTFILE 'events_chdb.orc' TRUNCATE FORMAT ORC")
df = chdb.query(
    "SELECT country, count() AS c, round(sum(amount), 2) AS amount "
    "FROM file('events_chdb.orc') GROUP BY country ORDER BY country",
    "DataFrame",
)
print(df.to_string(index=False))
country c amount
AU 3 48.33
DE 3 39.24
FR 3 42.27
GB 4 56.36
IN 3 45.30
US 4 60.40
No server, no temp files you have to manage, and the result lands in a real ORC file you can hand to any ORC reader.
The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample Parquet files (including the 3M-row file used for the timing above), run.sh with every command on this page, run.py plus run.ipynb for the chDB version, and expected_output.txt.
github.com/ClickHouse/examples/tree/main/local-analytics/convert-parquet-to-orc
The same SQL you ran here scales straight up: from a file on your laptop, to a ClickHouse server, to ClickHouse Cloud, with no rewrite.