How to convert Parquet to ORC

To convert Parquet to ORC, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then convert the file:

clickhouse local -q "SELECT * FROM file('events.parquet') INTO OUTFILE 'events.orc' TRUNCATE FORMAT ORC"

event_date	Nullable(Date32)
event_id	Nullable(Int64)
country	Nullable(String)
action	Nullable(String)
amount	Nullable(Float64)
tags	Map(String, Nullable(String))

The write command exits silently; the output above is what DESCRIBE file('events.orc') returns once the file is on disk. ClickHouse reads the schema straight from the Parquet footer and writes ORC in one pass, so the file is converted locally with no upload or import step first.

Convert it in one line

SELECT from the Parquet file, INTO OUTFILE as ORC. TRUNCATE overwrites the output if it already exists, so the command is safe to re-run.

clickhouse local -q "SELECT * FROM file('events.parquet') INTO OUTFILE 'events.orc' TRUNCATE FORMAT ORC"

The .orc extension is enough for ClickHouse to pick the writer, but the explicit FORMAT ORC keeps the intent obvious and works even when the extension is unusual.
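
For a directory of Parquet files, the one-liner is easy to wrap in a script. The following is a minimal Python sketch, assuming clickhouse is on your PATH; the helper names build_command and convert_all are hypothetical and exist only for illustration. The top-level line is a dry run that prints the command without executing it.

```python
import subprocess
from pathlib import Path


def build_command(src: Path) -> list[str]:
    """Return the clickhouse local argv that converts one Parquet file to ORC."""
    dst = src.with_suffix(".orc")
    # Same query as above. Paths are interpolated as-is, so this sketch
    # assumes file names that contain no single quotes.
    query = (
        f"SELECT * FROM file('{src}') "
        f"INTO OUTFILE '{dst}' TRUNCATE FORMAT ORC"
    )
    return ["clickhouse", "local", "-q", query]


def convert_all(directory: Path) -> None:
    """Convert every *.parquet file in a directory (hypothetical helper)."""
    for src in sorted(directory.glob("*.parquet")):
        subprocess.run(build_command(src), check=True)  # raises on failure


# Dry run: show the command for one file without executing anything.
print(build_command(Path("events.parquet")))
```

Calling convert_all(Path(".")) would then convert each file in the current directory in turn, stopping at the first failure because of check=True.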

The schema carries over

There is no DDL to write. ClickHouse reads the column names and types from the Parquet footer:

clickhouse local -q "DESCRIBE file('events.parquet')"

event_date	Date32
event_id	UInt64
country	String
action	String
amount	Float64
tags	Map(String, String)

After the conversion, the ORC file reports the same columns:

clickhouse local -q "DESCRIBE file('events.orc')"

event_date	Nullable(Date32)
event_id	Nullable(Int64)
country	Nullable(String)
action	Nullable(String)
amount	Nullable(Float64)
tags	Map(String, Nullable(String))

Two differences are worth knowing about, and neither loses data:

Columns come back Nullable. ORC carries a null bitmap per column, so on read-back every column is reported as Nullable. The values are identical; the type just gained an explicit "can be null" marker.
UInt64 is read back as Int64. ORC's integer types are signed, so a ClickHouse UInt64 maps to ORC's 64-bit signed integer. For values inside the signed range this is lossless. If you store IDs near the top of the unsigned range, keep them as String before converting, or be aware of the mapping.
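
The signed-range caveat comes down to a single comparison. A small Python sketch of the arithmetic; the function name fits_in_int64 is illustrative and not part of any ClickHouse API:

```python
INT64_MAX = 2**63 - 1   # largest value a 64-bit signed integer can hold
UINT64_MAX = 2**64 - 1  # largest value a 64-bit unsigned integer can hold


def fits_in_int64(value: int) -> bool:
    """True if an unsigned 64-bit value is also representable as signed 64-bit."""
    return 0 <= value <= INT64_MAX


print(fits_in_int64(1_000_000))   # True
print(fits_in_int64(INT64_MAX))   # True
print(fits_in_int64(UINT64_MAX))  # False
```

To check a real file before converting, a query such as SELECT max(event_id) FROM file('events.parquet') gives the value to compare against 9223372036854775807.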

The nested Map column survives the trip intact. ORC supports nested map and list types natively, which is the payoff of going columnar-to-columnar: you are not flattening structure into a single text column the way a CSV target would force you to.

Check the values match

The fastest way to trust a conversion is to run the same query against both files and compare. Same aggregate, same numbers:

clickhouse local -q "SELECT 'parquet' AS src, country, count() AS c, round(sum(amount),2) AS amount FROM file('events.parquet') GROUP BY country ORDER BY country"
clickhouse local -q "SELECT 'orc'     AS src, country, count() AS c, round(sum(amount),2) AS amount FROM file('events.orc')     GROUP BY country ORDER BY country"

parquet	AU	3	48.33
parquet	DE	3	39.24
parquet	FR	3	42.27
parquet	GB	4	56.36
parquet	IN	3	45.3
parquet	US	4	60.4
orc	AU	3	48.33
orc	DE	3	39.24
orc	FR	3	42.27
orc	GB	4	56.36
orc	IN	3	45.3
orc	US	4	60.4
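
Comparing by eye works for six rows; for larger results a script can do the check. A minimal Python sketch, assuming each query's tab-separated output has already been captured as text (for example via subprocess); the strip_src helper is hypothetical, and the sample strings are the first two rows of each result shown above.

```python
def strip_src(tsv: str) -> list[tuple[str, ...]]:
    """Parse tab-separated output and drop the leading 'src' label column."""
    rows = []
    for line in tsv.strip().splitlines():
        fields = line.split("\t")
        rows.append(tuple(fields[1:]))  # keep country, count, amount
    return rows


# First two rows of each result, pasted in as sample input.
parquet_out = "parquet\tAU\t3\t48.33\nparquet\tDE\t3\t39.24\n"
orc_out = "orc\tAU\t3\t48.33\norc\tDE\t3\t39.24\n"

assert strip_src(parquet_out) == strip_src(orc_out), "results differ"
print("match")
```

Dropping the src column first means the two results can be compared directly, since that label is the only field expected to differ.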

Options: choose the ORC compression codec

This is where a local command beats an upload-and-download converter site: you control the output, and you can script it. Pick the ORC compression method explicitly with a setting:

clickhouse local -q "SELECT * FROM file('events.parquet') INTO OUTFILE 'events_zstd.orc' TRUNCATE FORMAT ORC SETTINGS output_format_orc_compression_method='zstd'"

zstd gives a strong size-to-speed balance; lz4 is faster to write and read with slightly larger files; none skips compression when something downstream re-compresses anyway. Because both Parquet and ORC are columnar, the per-column encodings do most of the work, and the codec mainly trades file size against CPU.
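
To compare codecs on your own data, you can generate one conversion query per codec. A minimal Python sketch under stated assumptions: build_query is a hypothetical helper, and the output naming (events_zstd.orc, events_lz4.orc, events_none.orc) simply extends the file name used above. It only prints the queries; each one would be passed to clickhouse local -q.

```python
CODECS = ["zstd", "lz4", "none"]  # the three methods discussed above


def build_query(src: str, codec: str) -> str:
    """Return the conversion query for one codec, writing <name>_<codec>.orc."""
    dst = f"{src.removesuffix('.parquet')}_{codec}.orc"
    return (
        f"SELECT * FROM file('{src}') INTO OUTFILE '{dst}' TRUNCATE FORMAT ORC "
        f"SETTINGS output_format_orc_compression_method='{codec}'"
    )


for codec in CODECS:
    # Each printed string is a query for: clickhouse local -q "<query>"
    print(build_query("events.parquet", codec))
```

After running the three queries, comparing the resulting file sizes (for instance with os.path.getsize) shows how the size-versus-CPU trade-off plays out for your data.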

There is no row limit, no file-size cap, and the data never leaves your machine. The same command runs in a shell script, a cron job, or a CI step.

How fast is it?

On a 3,000,000-row Parquet file (~22 MB), the full Parquet-to-ORC conversion runs in:

clickhouse local -q "SELECT * FROM file('events_large.parquet') INTO OUTFILE 'events_large.orc' TRUNCATE FORMAT ORC"

~0.73 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The number can drift slightly under concurrent load, but the shape holds: the conversion is I/O-bound, not a heavyweight transform. The resulting ORC file is ~20 MB, in the same ballpark as the Parquet input, as expected for two compressed columnar formats over the same data.
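
A best-of-three measurement like the one above can be reproduced with a small harness. This is a sketch, not the script used for the published number; best_of is a hypothetical helper, and the workload here is a stand-in so the example runs without ClickHouse installed.

```python
import time
from typing import Callable


def best_of(fn: Callable[[], object], runs: int = 3) -> float:
    """Run fn several times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in workload so the sketch runs anywhere; swap in a call such as
# subprocess.run(["clickhouse", "local", "-q", "<conversion query>"], check=True)
elapsed = best_of(lambda: sum(range(100_000)))
print(f"best of 3: {elapsed:.4f} s")
```

Taking the minimum rather than the mean filters out runs slowed by unrelated background activity, which fits the warm-cache measurement described above.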

Reverse direction?

Going the other way is the same one-liner with the formats swapped. See how to convert ORC to Parquet.

Prefer Python?

The same conversion runs in-process with chDB, the embeddable ClickHouse engine for Python. Same SQL, written straight to a file:

import chdb

# Convert the Parquet file to ORC; TRUNCATE overwrites any existing output.
chdb.query("SELECT * FROM file('events.parquet') INTO OUTFILE 'events_chdb.orc' TRUNCATE FORMAT ORC")

# Read the ORC file back and return the aggregate as a pandas DataFrame.
df = chdb.query(
    "SELECT country, count() AS c, round(sum(amount), 2) AS amount "
    "FROM file('events_chdb.orc') GROUP BY country ORDER BY country",
    "DataFrame",
)
print(df.to_string(index=False))

country  c  amount
     AU  3   48.33
     DE  3   39.24
     FR  3   42.27
     GB  4   56.36
     IN  3   45.30
     US  4   60.40

No server, no temp files you have to manage, and the result lands in a real ORC file you can hand to any ORC reader.

Run it yourself

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample Parquet files (including the 3M-row file used for the timing above), run.sh with every command on this page, run.py plus run.ipynb for the chDB version, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-parquet-to-orc

The same SQL you ran here scales straight up: from a file on your laptop, to a ClickHouse server, to ClickHouse Cloud, with no rewrite.
