To convert ORC to Parquet, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then read the ORC file and write Parquet in one query:
clickhouse local -q "SELECT * FROM file('events.orc') INTO OUTFILE 'events.parquet' FORMAT Parquet"
Both ORC and Parquet embed their schemas in a file footer, so clickhouse local reads the column names and types straight from the ORC file and carries them into Parquet with no import step first. Counting the rows of the Parquet file just written (SELECT count() FROM file('events.parquet')) returns 20 for this sample, confirming the conversion round-tripped.
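A row-count comparison is one quick way to check a conversion like this. The sketch below is a minimal version of that check in Python, assuming clickhouse is on your PATH and both files sit in the current directory; the helper names count_query and row_count are made up for illustration. A matching count is a sanity check, not proof that every value survived.

```python
import shutil
import subprocess
from pathlib import Path


def count_query(path: str) -> str:
    """Build the SQL that counts rows in a file (format inferred from the extension)."""
    return f"SELECT count() FROM file('{path}')"


def row_count(path: str) -> int:
    """Run the count through clickhouse local and parse the single number it prints."""
    result = subprocess.run(
        ["clickhouse", "local", "-q", count_query(path)],
        check=True, capture_output=True, text=True,
    )
    return int(result.stdout.strip())


# Only run when the binary and both files are actually present.
if shutil.which("clickhouse") and all(Path(p).exists() for p in ("events.orc", "events.parquet")):
    src, dst = row_count("events.orc"), row_count("events.parquet")
    print(f"ORC rows: {src}, Parquet rows: {dst}, match: {src == dst}")
```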
Both formats are columnar, so types survive
ORC and Parquet are close cousins: both store data column-by-column with an embedded schema in a footer. That makes this conversion lossless for the common types. clickhouse local reads the ORC schema first; check what it found with DESCRIBE:
clickhouse local -q "DESCRIBE file('events.orc')"
event_date Nullable(Date32)
event_id Nullable(Int64)
country Nullable(String)
event_type Nullable(String)
amount Nullable(Float64)
attrs Map(String, Nullable(String))
Run the same DESCRIBE on the Parquet output and you get an identical schema. The attrs column is a nested Map, and it stays a nested Map on the Parquet side, because Parquet has a native map type too. There's no flattening and no JSON-stringifying:
clickhouse local -q "SELECT event_date, event_id, country, event_type, amount, attrs FROM file('events.parquet') ORDER BY event_id LIMIT 5"
   ┌─event_date─┬─event_id─┬─country─┬─event_type─┬─amount─┬─attrs─────────────────────────────┐
1. │ 2026-01-01 │        1 │ GB      │ click      │      5 │ {'device':'mobile','plan':'free'} │
2. │ 2026-01-02 │        2 │ US      │ view       │   6.01 │ {'device':'desktop','plan':'pro'} │
3. │ 2026-01-03 │        3 │ DE      │ signup     │   7.02 │ {'device':'mobile','plan':'free'} │
4. │ 2026-01-04 │        4 │ FR      │ purchase   │   8.03 │ {'device':'desktop','plan':'pro'} │
5. │ 2026-01-05 │        5 │ IN      │ click      │   9.04 │ {'device':'mobile','plan':'free'} │
   └────────────┴──────────┴─────────┴────────────┴────────┴───────────────────────────────────┘
One thing worth knowing: ORC carries Date columns as Date32 (a wider date range), and ClickHouse preserves that on read, so a date column lands as Date32 in the Parquet output rather than the narrower Date. If you want a specific target type, cast it in the SELECT (for example CAST(event_date AS Date) AS event_date). The FORMAT Parquet clause is optional when the output file ends in .parquet, since the extension is enough to pick the format; keep it when the name is ambiguous.
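To make the cast concrete, here is a small sketch that builds a conversion query which keeps every column but re-types selected ones. It relies on ClickHouse's SELECT * REPLACE (...) column modifier. The helper name cast_select is hypothetical. If the column holds NULLs, Nullable(Date) is probably the safer target type, since casting a NULL to a non-Nullable type is expected to fail.

```python
def cast_select(source: str, target: str, casts: dict[str, str]) -> str:
    """Build a conversion query that keeps all columns but re-types those in `casts`."""
    replacements = ", ".join(
        f"CAST({column} AS {new_type}) AS {column}"
        for column, new_type in casts.items()
    )
    return (
        f"SELECT * REPLACE ({replacements}) "
        f"FROM file('{source}') INTO OUTFILE '{target}' FORMAT Parquet"
    )


# Narrow event_date from Date32 to Date, as in the example above.
print(cast_select("events.orc", "events.parquet", {"event_date": "Date"}))
```

Pass the printed string to clickhouse local -q (or chdb.query) to run it.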
This is where running the conversion locally pays off. Online "ORC to Parquet" converters take whatever defaults they ship with, and they need you to upload the file first. Locally you control the column-store knobs and your data never leaves the machine.
The one that matters most is the compression codec. Parquet compresses per column; the codec changes the output size significantly. Pass it as a setting:
clickhouse local -q "SELECT * FROM file('events_large.orc') INTO OUTFILE 'events_large.parquet' FORMAT Parquet SETTINGS output_format_parquet_compression_method='zstd'"
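To compare codecs on your own data, you can loop over them and write one output file per codec. This is a sketch, not the script used for this page: the <stem>_<codec>.parquet naming and the convert_command helper are assumptions, and TRUNCATE is added so reruns overwrite earlier output.

```python
import shutil
import subprocess
from pathlib import Path

CODECS = ["zstd", "snappy", "none"]  # the three codecs compared on this page


def convert_command(source: str, codec: str) -> list[str]:
    """Build a clickhouse local call that writes <stem>_<codec>.parquet."""
    target = f"{Path(source).stem}_{codec}.parquet"  # hypothetical naming scheme
    query = (
        f"SELECT * FROM file('{source}') "
        f"INTO OUTFILE '{target}' TRUNCATE FORMAT Parquet "
        f"SETTINGS output_format_parquet_compression_method='{codec}'"
    )
    return ["clickhouse", "local", "-q", query]


# Only run when the binary and the sample file are actually present.
if shutil.which("clickhouse") and Path("events_large.orc").exists():
    for codec in CODECS:
        subprocess.run(convert_command("events_large.orc", codec), check=True)
        out = Path(f"events_large_{codec}.parquet")
        print(out.stat().st_size, out.name)
```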
On the 3,000,000-row sample file, the ORC source next to the same data written as Parquet three ways (sizes in bytes):
23566016 events_large.orc
22729944 events_large_zstd.parquet
33072661 events_large_snappy.parquet
47724146 events_large_none.parquet
zstd (the default in this build) gives the smallest file, snappy trades size for faster decompression, and none is the fastest to write and the largest. Other useful settings include output_format_parquet_row_group_size, which tunes row-group granularity for your read patterns. None of this is reachable from a one-click upload site.
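The byte counts listed above are easier to compare as ratios. This short sketch does only that arithmetic, with the figures copied from the listing; nothing is measured here.

```python
# File sizes in bytes, copied from the listing above.
sizes = {
    "events_large.orc": 23566016,
    "events_large_zstd.parquet": 22729944,
    "events_large_snappy.parquet": 33072661,
    "events_large_none.parquet": 47724146,
}

baseline = sizes["events_large.orc"]

# Size of each file relative to the ORC source.
ratios = {name: size / baseline for name, size in sizes.items()}

for name, size in sizes.items():
    print(f"{name:30} {size / 1e6:6.1f} MB  {ratios[name]:.2f}x the ORC size")
```

On these figures, the zstd Parquet file comes out a few percent smaller than the ORC source, snappy about 40% larger, and the uncompressed file roughly twice the size.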
You can also read the Parquet footer back without touching the data, using the ParquetMetadata format:
clickhouse local -q "SELECT num_columns, num_rows, num_row_groups, format_version FROM file('events_large.parquet', ParquetMetadata) FORMAT Vertical"
num_columns: 7
num_rows: 3000000
num_row_groups: 3
format_version: 2
(num_columns is 7 because the nested Map is stored as separate key and value columns inside Parquet.)
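The count of 7 can be reproduced from the DESCRIBE output with a deliberately simplified model: one leaf column per scalar, two per Map (key and value). The leaf_columns helper is hypothetical and only covers the types in this example. Arrays, tuples, or nested maps would need a real recursive parser.

```python
# Column types as reported by DESCRIBE earlier on this page.
schema = {
    "event_date": "Nullable(Date32)",
    "event_id": "Nullable(Int64)",
    "country": "Nullable(String)",
    "event_type": "Nullable(String)",
    "amount": "Nullable(Float64)",
    "attrs": "Map(String, Nullable(String))",
}


def leaf_columns(type_name: str) -> int:
    """Simplified model: a Map is stored as a key column plus a value column."""
    return 2 if type_name.startswith("Map(") else 1


total = sum(leaf_columns(type_name) for type_name in schema.values())
print(total)  # 5 scalar columns + 2 for the Map = 7
```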
Conversion is I/O- and codec-bound, and ClickHouse parallelises both. Converting the 3,000,000-row events_large.orc (23,566,016 bytes, about 22.5 MiB) to Parquet completes in ~0.52 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The number includes reading and decoding the ORC file and re-encoding every column as compressed Parquet from scratch.
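To get a comparable number on your own machine, a best-of-three timing loop is enough. The sketch below is one way to do it and not necessarily the harness behind the figure above. It times the whole clickhouse local process, startup included, and the best_of and convert helpers are made-up names.

```python
import shutil
import subprocess
import time
from pathlib import Path


def best_of(fn, runs: int = 3) -> float:
    """Call fn `runs` times and return the fastest wall-clock duration in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def convert() -> None:
    """Run the conversion once; TRUNCATE lets repeated runs overwrite the output."""
    query = (
        "SELECT * FROM file('events_large.orc') "
        "INTO OUTFILE 'events_large.parquet' TRUNCATE FORMAT Parquet"
    )
    subprocess.run(["clickhouse", "local", "-q", query], check=True)


# Only run when the binary and the sample file are actually present.
if shutil.which("clickhouse") and Path("events_large.orc").exists():
    print(f"best of 3: {best_of(convert):.2f} s")
```

Taking the minimum rather than the mean means the first run, which warms the page cache, does not skew the result.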
If you work in Python, chDB is the same ClickHouse engine in-process. The conversion is the identical SQL, no server and no upload:
import chdb
chdb.query(
"SELECT * FROM file('events.orc') "
"INTO OUTFILE 'events.parquet' TRUNCATE FORMAT Parquet"
)
# verify
print(chdb.query("SELECT count() FROM file('events.parquet')", "CSV"), end="")
The compression setting works the same way — append SETTINGS output_format_parquet_compression_method='zstd' to the conversion query. To pull the result into a DataFrame instead of a file, see how to read a Parquet file in Python with chDB.
Going the other way is the mirror image: read the Parquet, write ORC. See how to convert Parquet to ORC. To learn the format itself, see what is the ORC file format and what is a Parquet file.
The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample ORC files (including the 3M-row file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.
github.com/ClickHouse/examples/tree/main/local-analytics/convert-orc-to-parquet
The same SQL that converts one file on your laptop runs unchanged against a directory of files, against remote object storage, and against a ClickHouse server or ClickHouse Cloud when the data outgrows your machine.
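For a directory of files, one option is to generate the same query once per input. This is a sketch under assumptions: the data directory and the conversion_queries helper are hypothetical, and each .parquet file is written next to its .orc source.

```python
from pathlib import Path


def conversion_queries(orc_files) -> list[str]:
    """Build one ORC-to-Parquet query per input path, writing a sibling .parquet file."""
    queries = []
    for orc in sorted(Path(p) for p in orc_files):
        parquet = orc.with_suffix(".parquet")
        queries.append(
            f"SELECT * FROM file('{orc.as_posix()}') "
            f"INTO OUTFILE '{parquet.as_posix()}' FORMAT Parquet"
        )
    return queries


data_dir = Path("data")  # hypothetical directory holding the .orc files
orc_files = data_dir.glob("*.orc") if data_dir.is_dir() else []
for query in conversion_queries(orc_files):
    print(query)  # pass each one to: clickhouse local -q "<query>"
```

The file() function also accepts glob patterns such as file('data/*.orc'), which reads every matching file as one table. Use that when you want a single combined Parquet file rather than one output per input.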