How to convert ORC to Parquet

Al Brown
Last updated: Jun 6, 2026

To convert ORC to Parquet, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then read the ORC file and write Parquet in one query:

clickhouse local -q "SELECT * FROM file('events.orc') INTO OUTFILE 'events.parquet' FORMAT Parquet"
20

Both ORC and Parquet embed their schemas in a file footer, so clickhouse local reads the column names and types straight from the ORC file and carries them into Parquet with no import step first. The 20 above is the row count of the Parquet file written, confirming the conversion round-tripped.

Both formats are columnar, so types survive

ORC and Parquet are close cousins: both store data column by column, with an embedded schema in a footer. That makes this conversion lossless for the common types. clickhouse local reads the ORC schema first; check what it found with DESCRIBE:

clickhouse local -q "DESCRIBE file('events.orc')"
event_date	Nullable(Date32)
event_id	Nullable(Int64)
country	Nullable(String)
event_type	Nullable(String)
amount	Nullable(Float64)
attrs	Map(String, Nullable(String))

Run the same DESCRIBE on the Parquet output and you get an identical schema. The attrs column is a nested Map, and it stays a nested Map on the Parquet side, because Parquet has a native map type too. There's no flattening and no JSON-stringifying:

clickhouse local -q "SELECT event_date, event_id, country, event_type, amount, attrs FROM file('events.parquet') ORDER BY event_id LIMIT 5"
   ┌─event_date─┬─event_id─┬─country─┬─event_type─┬─amount─┬─attrs─────────────────────────────┐
1. │ 2026-01-01 │        1 │ GB      │ click      │      5 │ {'device':'mobile','plan':'free'} │
2. │ 2026-01-02 │        2 │ US      │ view       │   6.01 │ {'device':'desktop','plan':'pro'} │
3. │ 2026-01-03 │        3 │ DE      │ signup     │   7.02 │ {'device':'mobile','plan':'free'} │
4. │ 2026-01-04 │        4 │ FR      │ purchase   │   8.03 │ {'device':'desktop','plan':'pro'} │
5. │ 2026-01-05 │        5 │ IN      │ click      │   9.04 │ {'device':'mobile','plan':'free'} │
   └────────────┴──────────┴─────────┴────────────┴────────┴───────────────────────────────────┘

One thing worth knowing: ClickHouse reads ORC date columns as Date32, which covers a wider date range than Date, and it preserves that type on write, so a date column lands as Date32 in the Parquet output rather than the narrower Date. If you want a specific target type, cast it in the SELECT (for example CAST(event_date AS Date) AS event_date).

The FORMAT Parquet clause is optional when the output file ends in .parquet, since the extension is enough to pick the format; keep it when the name is ambiguous.
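The cast described above can be folded into the conversion query without listing every column. The sketch below only builds the SQL string; `build_convert_query` and its `casts` argument are hypothetical names introduced for illustration, and it relies on ClickHouse's `SELECT * REPLACE (...)` modifier to swap one column's expression while keeping the rest. If the column holds NULLs, a target of Nullable(Date) may be the safer choice.

```python
def build_convert_query(src, dst, casts=None):
    """Return a ClickHouse SQL string that converts `src` to Parquet at `dst`.

    `casts` (hypothetical helper argument) maps column name -> target type.
    """
    select = "*"
    if casts:
        # Keep every column, but replace the listed ones with a CAST.
        replacements = ", ".join(
            f"CAST({col} AS {typ}) AS {col}" for col, typ in casts.items()
        )
        select = f"* REPLACE ({replacements})"
    return (
        f"SELECT {select} FROM file('{src}') "
        f"INTO OUTFILE '{dst}' TRUNCATE FORMAT Parquet"
    )


query = build_convert_query("events.orc", "events.parquet", {"event_date": "Date"})
print(query)
```

The resulting string can be passed to clickhouse local with -q, or to chdb.query in Python.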

Options an upload converter can't give you

This is where running the conversion locally pays off. Online "ORC to Parquet" converters take whatever defaults they ship with, and they need you to upload the file first. Locally you control the column-store knobs and your data never leaves the machine.

The one that matters most is the compression codec. Parquet compresses per column; the codec changes the output size significantly. Pass it as a setting:

clickhouse local -q "SELECT * FROM file('events_large.orc') INTO OUTFILE 'events_large.parquet' FORMAT Parquet SETTINGS output_format_parquet_compression_method='zstd'"

On the 3,000,000-row sample file, here is the ORC source next to the same data written to Parquet three ways (sizes in bytes):

23566016  events_large.orc
22729944  events_large_zstd.parquet
33072661  events_large_snappy.parquet
47724146  events_large_none.parquet

zstd (the default in this build) gives the smallest file, snappy trades size for faster decompression, none is fastest to write and largest. Other useful settings include output_format_parquet_row_group_size to tune row-group granularity for your read patterns. None of this is reachable from a one-click upload site.
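To put those byte counts in perspective, the short sketch below expresses each file as a multiple of the ORC source. The sizes are copied from the listing above; nothing here calls ClickHouse, and the variable names are only illustrative.

```python
# File sizes in bytes, copied from the listing on this page.
sizes = {
    "events_large.orc": 23566016,
    "events_large_zstd.parquet": 22729944,
    "events_large_snappy.parquet": 33072661,
    "events_large_none.parquet": 47724146,
}

baseline = sizes["events_large.orc"]

# Size of each file relative to the ORC source (1.00 means same size).
ratios = {name: size / baseline for name, size in sizes.items()}

for name, ratio in ratios.items():
    print(f"{name:<30} {sizes[name]:>10}  {ratio:.2f}x")
```

On this sample, uncompressed Parquet comes out at roughly twice the size of the ORC source, snappy at about 1.4 times, and zstd slightly below it.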

You can also read the Parquet footer back without touching the data, using the ParquetMetadata format:

clickhouse local -q "SELECT num_columns, num_rows, num_row_groups, format_version FROM file('events_large.parquet', ParquetMetadata) FORMAT Vertical"
num_columns:    7
num_rows:       3000000
num_row_groups: 3
format_version: 2

(num_columns is 7 because the nested Map is stored as separate key and value columns inside Parquet.)
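The arithmetic behind that count can be sketched in a few lines. This is a simplification: it assumes each scalar column becomes one Parquet leaf column and each flat Map becomes two (key and value). `leaf_columns` is a hypothetical helper, not part of ClickHouse or chDB, and it does not handle deeper nesting.

```python
# Schema as reported by DESCRIBE on events.orc earlier on this page.
schema = {
    "event_date": "Nullable(Date32)",
    "event_id": "Nullable(Int64)",
    "country": "Nullable(String)",
    "event_type": "Nullable(String)",
    "amount": "Nullable(Float64)",
    "attrs": "Map(String, Nullable(String))",
}


def leaf_columns(clickhouse_type: str) -> int:
    """Hypothetical helper: a flat Map counts as key + value, a scalar as one."""
    return 2 if clickhouse_type.startswith("Map(") else 1


total = sum(leaf_columns(t) for t in schema.values())
print(total)  # 5 scalar columns + 2 for the Map = 7
```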

How fast is it?

Conversion is I/O- and codec-bound, and ClickHouse parallelises both. Converting the 3,000,000-row events_large.orc (~22 MB) to Parquet completes in ~0.52 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The number includes reading and decoding the ORC file and re-encoding every column as compressed Parquet from scratch.
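A best-of-three figure like this can be reproduced with a small timing harness. The sketch below is one way such a measurement might be taken, not the script behind the number above. `best_of` and `convert` are hypothetical names, and `convert` shells out to the clickhouse local command used on this page, so it needs ClickHouse on your PATH.

```python
import subprocess
import time


def best_of(fn, runs=3):
    """Call fn() `runs` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def convert():
    """Run the conversion once via the CLI (requires clickhouse on PATH)."""
    subprocess.run(
        [
            "clickhouse", "local", "-q",
            "SELECT * FROM file('events_large.orc') "
            "INTO OUTFILE 'events_large.parquet' TRUNCATE FORMAT Parquet",
        ],
        check=True,
    )


# Stand-in workload so the sketch runs anywhere;
# pass `convert` instead to time the real conversion.
elapsed = best_of(lambda: sum(range(100_000)))
print(f"best of 3: {elapsed:.4f} s")
```

Taking the minimum rather than the mean keeps one-off noise, such as a cold cache on the first run, out of the reported figure.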

Convert ORC to Parquet in Python with chDB

If you work in Python, chDB is the same ClickHouse engine in-process. The conversion is the identical SQL, no server and no upload:

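import chdb

# Convert in-process; TRUNCATE overwrites events.parquet if it already exists
chdb.query(
    "SELECT * FROM file('events.orc') "
    "INTO OUTFILE 'events.parquet' TRUNCATE FORMAT Parquet"
)

# Verify: count the rows in the Parquet file that was just written
print(chdb.query("SELECT count() FROM file('events.parquet')", "CSV"), end="")
20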

The compression setting works the same way — append SETTINGS output_format_parquet_compression_method='zstd' to the conversion query. To pull the result into a DataFrame instead of a file, see how to read a Parquet file in Python with chDB.
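As a sketch of appending that setting, the snippet below builds one conversion query per codec. `conversion_query` and `run_all` are hypothetical helpers, and the output file names mirror the size listing earlier on this page, which is an assumption about how those files were produced. chDB is imported lazily inside `run_all`, so the query-building part runs without it.

```python
CODECS = ["zstd", "snappy", "none"]  # the three codecs compared on this page


def conversion_query(src: str, dst: str, codec: str) -> str:
    """Build the ORC-to-Parquet query with the compression setting appended."""
    return (
        f"SELECT * FROM file('{src}') "
        f"INTO OUTFILE '{dst}' TRUNCATE FORMAT Parquet "
        f"SETTINGS output_format_parquet_compression_method='{codec}'"
    )


# One query per codec, each writing to its own output file.
queries = {
    codec: conversion_query("events_large.orc", f"events_large_{codec}.parquet", codec)
    for codec in CODECS
}


def run_all():
    """Execute every query in-process; requires chDB to be installed."""
    import chdb  # lazy import: only needed when actually running the queries

    for sql in queries.values():
        chdb.query(sql)


for sql in queries.values():
    print(sql)
```

Calling run_all() in a directory that contains events_large.orc would write the three Parquet files, whose sizes can then be compared.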

Reverse direction

Going the other way is the mirror image: read the Parquet, write ORC. See how to convert Parquet to ORC. To learn about the formats themselves, see what is the ORC file format and what is a Parquet file.

Run it yourself

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample ORC files (including the 3M-row file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-orc-to-parquet

The same SQL that converts one file on your laptop runs unchanged against a directory of files, against remote object storage, and against a ClickHouse server or ClickHouse Cloud when the data outgrows your machine.
