To convert a Parquet file to Arrow, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
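Then convert the file:
clickhouse local -q "SELECT * FROM file('events.parquet') INTO OUTFILE 'events.arrow' FORMAT Arrow"
Read the converted Arrow file back to confirm the round-trip: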
clickhouse local -q "SELECT * FROM file('events.arrow') LIMIT 5"
2026-01-01 1 GB click 5 1
2026-01-02 2 US view 6.01 2
2026-01-03 3 DE signup 7.02 3
2026-01-04 4 FR purchase 8.03 4
2026-01-05 5 IN click 9.04 5
The Parquet file is read in place with no import step: clickhouse local reads the schema from the Parquet footer and writes Arrow IPC bytes directly to disk, so a file larger than RAM converts without loading it all into memory at once.
Parquet already stores a typed schema, so there is nothing to infer from raw text and nothing to guess. The types ClickHouse reads from the Parquet footer are the types it writes into the Arrow file. Compare the two with DESCRIBE:
clickhouse local -q "DESCRIBE file('events.parquet')"
clickhouse local -q "DESCRIBE file('events.arrow')"
event_date Date32
event_id UInt64
country String
action String
amount Float64
sessions UInt8
Identical on both sides. Note the columns are not Nullable: Parquet recorded them as required, and Arrow keeps that. (A column that was nullable in the Parquet schema stays nullable in Arrow.) This is the difference from converting out of CSV, where everything starts as text and types are inferred. Parquet to Arrow is lossless for the common numeric, string, date, and timestamp types because both formats descend from the same columnar type system.
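The side-by-side DESCRIBE comparison above can also be scripted. The following is a minimal sketch, not part of the original example: the helper names parse_describe, schema_diff, and describe_file are hypothetical, and it assumes clickhouse is on your PATH and that DESCRIBE output is requested in TabSeparated form.

```python
import subprocess


def parse_describe(tsv_text):
    """Map column name -> type from DESCRIBE output in TabSeparated form."""
    schema = {}
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        # The first two fields are the column name and its type; the
        # remaining fields (defaults, comments, ...) are ignored here.
        schema[fields[0]] = fields[1]
    return schema


def schema_diff(left, right):
    """Return {column: (left_type, right_type)} for every column that differs."""
    diff = {}
    for name in left.keys() | right.keys():
        a, b = left.get(name), right.get(name)
        if a != b:
            diff[name] = (a, b)
    return diff


def describe_file(path):
    """Run DESCRIBE through clickhouse local (assumed to be on PATH)."""
    result = subprocess.run(
        ["clickhouse", "local", "-q",
         f"DESCRIBE file('{path}') FORMAT TabSeparated"],
        capture_output=True, text=True, check=True,
    )
    return parse_describe(result.stdout)
```

Calling schema_diff(describe_file('events.parquet'), describe_file('events.arrow')) should return an empty dict when the two schemas agree. The sketch compares types by column name only and does not check column order.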
Read the Arrow file straight back to confirm the values round-tripped:
clickhouse local -q "SELECT * FROM file('events.arrow') LIMIT 5"
2026-01-01 1 GB click 5 1
2026-01-02 2 US view 6.01 2
2026-01-03 3 DE signup 7.02 3
2026-01-04 4 FR purchase 8.03 4
2026-01-05 5 IN click 9.04 5
These are the options an upload-a-file converter site can't give you, because here you control the writer:
- .feather is the same format. Feather (v2) is the Arrow IPC file format. If a tool wants a .feather file, write the exact same bytes and name them .feather:
clickhouse local -q "SELECT * FROM file('events.parquet') INTO OUTFILE 'events.feather' FORMAT Arrow"
- Arrow IPC file vs stream. FORMAT Arrow writes the random-access file format (with a footer). For a pipe or socket, FORMAT ArrowStream writes the streaming variant instead. Reading the stream back needs the format named explicitly: file('events.arrows', 'ArrowStream').
- Arrow output is uncompressed by default. That is deliberate. Arrow is built for fast in-memory handoff, not for storage. On the 3,000,000-row sample, the Arrow file is larger than its Parquet source (68 MB vs 23 MB) because Parquet compresses on disk and uncompressed Arrow doesn't. If you want a smaller Arrow file, turn on block compression:
clickhouse local -q "
SELECT * FROM file('events_large.parquet')
INTO OUTFILE 'events_large.arrow'
FORMAT Arrow
SETTINGS output_format_arrow_compression_method='zstd'"
That takes the file from 68 MB to 39 MB and still reads as a normal Arrow file. lz4_frame is the other supported codec when you want decode speed over ratio.
If you do want long-term, well-compressed storage rather than an in-memory exchange buffer, keep the data as Parquet; see what is a Parquet file.
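When you are unsure which of these containers a file actually holds (for example, an .arrow, .feather, or .arrows file handed to you by another tool), the leading and trailing bytes usually tell you. The sketch below is a heuristic, not a validator. It relies on the magic strings defined by the formats themselves: Parquet files begin and end with PAR1, Arrow IPC files begin and end with ARROW1, and current Arrow streams open with a 0xFFFFFFFF continuation marker. The function names classify and sniff_format are hypothetical.

```python
import os

PARQUET_MAGIC = b"PAR1"
ARROW_FILE_MAGIC = b"ARROW1"
ARROW_STREAM_CONTINUATION = b"\xff\xff\xff\xff"


def classify(head, tail):
    """Guess the container format from the first and last bytes of a file."""
    if head.startswith(PARQUET_MAGIC) and tail.endswith(PARQUET_MAGIC):
        return "parquet"
    if head.startswith(ARROW_FILE_MAGIC) and tail.endswith(ARROW_FILE_MAGIC):
        return "arrow-file"  # also what a Feather v2 file looks like
    if head.startswith(ARROW_STREAM_CONTINUATION):
        # Heuristic: streams have no magic string, only this marker.
        return "arrow-stream"
    return "unknown"


def sniff_format(path):
    """Read up to 8 bytes from each end of the file and classify them."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(8)
        f.seek(max(size - 8, 0))
        tail = f.read(8)
    return classify(head, tail)
```

Under these assumptions, a file written with FORMAT Arrow should classify as "arrow-file" whether it is named .arrow or .feather, and one written with FORMAT ArrowStream as "arrow-stream".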
Parquet is the right format on disk: compressed, columnar, great for repeated scans. Arrow is the right format in memory: it's the zero-copy layout that pandas, pyarrow, and Polars read without re-parsing. Converting Parquet to Arrow is how you hand a dataset to an in-process analytics tool with no deserialization tax. You'd reach for the Arrow file when the consumer is a Python or R process that maps Arrow directly, and keep Parquet when the consumer reads from disk or object storage.
A conversion is only done when the row count and values match. Start with the row counts, which should be equal:
clickhouse local -q "
SELECT
(SELECT count() FROM file('events_large.parquet')) AS parquet_rows,
(SELECT count() FROM file('events_large.arrow')) AS arrow_rows"
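The query above compares row counts only. To compare values as well, one option is an order-independent checksum over every column, using ClickHouse's cityHash64 and groupBitXor functions. The following is a minimal sketch under stated assumptions: clickhouse is on your PATH, both files have identical column types (as the DESCRIBE output shows for the sample), and the helper names fingerprint_query, fingerprint, and same_contents are hypothetical.

```python
import subprocess


def fingerprint_query(path):
    """SQL returning the row count and an order-independent hash of all values."""
    # Escape backslashes and single quotes so the path is a valid SQL literal.
    escaped = path.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "SELECT count(), groupBitXor(cityHash64(*)) "
        f"FROM file('{escaped}')"
    )


def fingerprint(path):
    """Run the query with clickhouse local and return (rows, checksum)."""
    out = subprocess.run(
        ["clickhouse", "local", "-q", fingerprint_query(path)],
        capture_output=True, text=True, check=True,
    ).stdout
    rows, checksum = out.split()
    return int(rows), int(checksum)


def same_contents(left_path, right_path):
    """True when both files report the same row count and checksum."""
    return fingerprint(left_path) == fingerprint(right_path)
```

A call such as same_contents('events_large.parquet', 'events_large.arrow') would then cover both halves of the check. XOR cancels out pairs of identical rows, which is why the row count is compared alongside the checksum; treat a match as strong evidence, not proof.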
On the 3,000,000-row events_large.parquet (~23 MB Parquet in, ~68 MB Arrow out), the full conversion runs in:
clickhouse local -q "SELECT * FROM file('events_large.parquet') INTO OUTFILE 'events_large.arrow' TRUNCATE FORMAT Arrow"
~0.30 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). Because both sides are columnar, the work is mostly reading column chunks and re-laying them out as Arrow record batches; there is no row-by-row text parsing.
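A best-of-three measurement like the one above is straightforward to reproduce. This is a sketch of one way to do it, not the harness used for the published number: best_of and convert_large are hypothetical names, and it assumes clickhouse is on your PATH and events_large.parquet is in the working directory.

```python
import subprocess
import time


def best_of(run_once, repeats=3):
    """Call run_once() `repeats` times; return the fastest wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return min(timings)


def convert_large():
    """Run the conversion command from the text; TRUNCATE allows reruns."""
    subprocess.run(
        ["clickhouse", "local", "-q",
         "SELECT * FROM file('events_large.parquet') "
         "INTO OUTFILE 'events_large.arrow' TRUNCATE FORMAT Arrow"],
        check=True,
    )
```

Running print(f"{best_of(convert_large):.2f} s") reports the fastest of three runs. Taking the minimum means the first run, which may read from a cold page cache, does not dominate the result. Your number will depend on hardware and ClickHouse version.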
Going the other way is the same one line with the formats swapped. See how to convert Arrow to Parquet.
If you're already in a Python process, chDB runs the identical SQL in-process, with no separate binary. Use the same SELECT ... FORMAT Arrow and write the bytes to a file:
import chdb
arrow_bytes = chdb.query("SELECT * FROM file('events.parquet') FORMAT Arrow").bytes()
with open("events.arrow", "wb") as f:
    f.write(arrow_bytes)
# Or skip the file entirely and get a DataFrame, since Arrow is the handoff:
df = chdb.query("SELECT * FROM file('events.parquet')", "DataFrame")
print(df.shape)
For reading Arrow back into a DataFrame, see how to read an Arrow file in Python with chDB.
The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample Parquet files (including the 3M-row file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB path, and expected_output.txt.
github.com/ClickHouse/examples/tree/main/local-analytics/convert-parquet-to-arrow
The same SQL that converts a local file scales unchanged to a ClickHouse server or ClickHouse Cloud when the data outgrows your laptop.