To convert Arrow to Parquet, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then read the Arrow file and write Parquet out:
clickhouse local -q "SELECT * FROM file('events.arrow') INTO OUTFILE 'events.parquet' FORMAT Parquet"
The Arrow file is read in place with no import step. ClickHouse carries the schema embedded in the Arrow file straight into Parquet, so types are preserved exactly and the row count comes back as confirmation.
Arrow IPC ships in two framings, and they are not interchangeable. The file format (what people often call Feather, and what ClickHouse writes with FORMAT Arrow) opens and closes with the magic string ARROW1 and ends with a footer. The streaming format (FORMAT ArrowStream, and what pyarrow's ipc.new_stream writes) has neither. Both commonly use the .arrow extension, so you can't tell them apart by name.
clickhouse-local defaults to the file framing. Hand it a stream-framed .arrow and it fails fast:
-- this FAILS: a stream file is not an Arrow 'file' --
Code: 636. DB::Exception: The table structure cannot be extracted from a Arrow format file. Error:
Code: 1002. DB::Exception: Error while opening a table: Invalid: Not an Arrow file. (UNKNOWN_EXCEPTION)
The fix is to name the framing explicitly as the second argument to file():
clickhouse local -q "SELECT * FROM file('events_stream.arrow', 'ArrowStream') INTO OUTFILE 'events.parquet' FORMAT Parquet"
If a conversion errors with "Not an Arrow file", you most likely have a stream, not a file. Switch to ArrowStream and retry.
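Because the extension does not reveal the framing, one way to choose the format name up front is to look at the first bytes of the file. The sketch below is an illustration, not something ClickHouse ships. It relies on the Arrow IPC file format opening with the ASCII marker ARROW1, and it assumes any other input is a stream, so a file that is not Arrow at all would be misreported as ArrowStream. The helper names arrow_framing and conversion_query are hypothetical.

```python
ARROW_FILE_MAGIC = b"ARROW1"  # the IPC file framing starts (and ends) with this marker


def arrow_framing(path):
    """Return the ClickHouse format name that matches the file's framing.

    Assumption: anything that does not start with the file magic is a stream.
    """
    with open(path, "rb") as f:
        head = f.read(len(ARROW_FILE_MAGIC))
    return "Arrow" if head == ARROW_FILE_MAGIC else "ArrowStream"


def conversion_query(src, dst):
    """Build the conversion query with the framing named explicitly.

    Paths are interpolated as-is, so this sketch does not handle paths
    that contain single quotes.
    """
    return (
        f"SELECT * FROM file('{src}', '{arrow_framing(src)}') "
        f"INTO OUTFILE '{dst}' FORMAT Parquet"
    )
```

The returned string can be passed to clickhouse local -q unchanged.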
Arrow is typed and columnar, so there's no header-sniffing or text re-parsing the way there is with CSV. The schema is embedded in the file, and ClickHouse carries those types straight into Parquet. Check both ends with DESCRIBE:
clickhouse local -q "DESCRIBE file('events.arrow')"
clickhouse local -q "DESCRIBE file('events.parquet')"
event_date Date32
event_id UInt32
event_type String
country String
amount Float64
is_member UInt8
Identical on both sides. No Nullable wrapper gets bolted on, integers stay integers, dates stay dates. One detail worth knowing: ClickHouse maps a DateTime to a 32-bit epoch integer when it writes Arrow, so if your Arrow file came from ClickHouse a timestamp may already be an integer column. Date columns round-trip cleanly as shown above.
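If a timestamp does arrive as an integer column, the value is a Unix timestamp: seconds since 1970-01-01 00:00:00 UTC. A minimal sketch of reading one back in Python follows; the function name and the sample value are illustrative only.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Interpret a 32-bit epoch integer as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Sample value chosen for illustration, not taken from the example data.
print(epoch_to_utc(1700000000).isoformat())
# 2023-11-14T22:13:20+00:00
```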
You don't have to trust the conversion blind. The ParquetMetadata format reads the footer, so you can confirm the row count, row groups, and per-column compression:
clickhouse local -q "SELECT num_rows, num_columns, num_row_groups, format_version FROM file('events.parquet', ParquetMetadata) FORMAT Vertical"
Row 1:
──────
num_rows: 20
num_columns: 6
num_row_groups: 1
format_version: 2
ClickHouse compresses Parquet with ZSTD by default. You can confirm that, column by column:
clickhouse local -q "SELECT tupleElement(arrayJoin(columns), 'name') AS column, tupleElement(arrayJoin(columns), 'compression') AS compression FROM file('events.parquet', ParquetMetadata)"
event_date ZSTD
event_id ZSTD
event_type ZSTD
country ZSTD
amount ZSTD
is_member ZSTD
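ParquetMetadata decodes the footer for you. As a dependency-free sanity check, a sketch like the following only confirms that the output is framed as Parquet. It relies on the format starting and ending with the 4-byte marker PAR1, with the 4 bytes before the trailing marker holding the footer length as a little-endian integer. It does not decode the metadata itself, it assumes an unencrypted footer, and the function name is hypothetical.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"  # marks both the start and the end of a Parquet file


def parquet_footer_length(path):
    """Return the footer (metadata) length in bytes, or raise ValueError."""
    # Smallest possible layout: leading magic + 4-byte length + trailing magic.
    if os.path.getsize(path) < 12:
        raise ValueError(f"{path} is too small to be a Parquet file")
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, os.SEEK_END)  # last 8 bytes: footer length, then magic
        tail = f.read(8)
    if head != PARQUET_MAGIC or tail[4:] != PARQUET_MAGIC:
        raise ValueError(f"{path} is not framed as a Parquet file")
    return struct.unpack("<I", tail[:4])[0]
```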
Online "Arrow to Parquet" converters make you upload your data and hand back whatever defaults they picked. Running the conversion locally means you control it, and your data never leaves the machine.
Pick the codec. ZSTD is the default; pass output_format_parquet_compression_method to choose another (snappy, lz4, gzip, brotli, or none):
clickhouse local -q "SELECT * FROM file('events.arrow') INTO OUTFILE 'events_snappy.parquet' TRUNCATE FORMAT Parquet SETTINGS output_format_parquet_compression_method='snappy'"
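Tune the row group size with output_format_parquet_row_group_size. Because the source is just a table, you can also convert and transform in the same pass: filter rows, drop columns, or aggregate before writing, instead of converting first and cleaning up later. The query below prints the aggregate to the terminal; append the same INTO OUTFILE clause used above to write it as Parquet instead: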
clickhouse local -q "
SELECT event_type, count() AS events, round(sum(amount), 2) AS amount
FROM file('events.arrow')
GROUP BY event_type
ORDER BY amount DESC"
logout 4 0.46
refund 4 0.42
purchase 4 0.38
login 4 0.34
signup 4 0.3
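The codec and row group settings described above can share one SETTINGS clause, separated by a comma. The sketch below assembles such a query from Python, for example to pass to clickhouse local -q. The function name and the row group size of 100000 are hypothetical, and the accepted codec names are the ones listed on this page.

```python
# Codec names as listed in this article; zstd is the default.
PARQUET_CODECS = {"zstd", "snappy", "lz4", "gzip", "brotli", "none"}


def parquet_export_query(src, dst, codec="zstd", row_group_size=None):
    """Build a conversion query with explicit Parquet output settings."""
    if codec not in PARQUET_CODECS:
        raise ValueError(f"unsupported codec: {codec!r}")
    settings = [f"output_format_parquet_compression_method='{codec}'"]
    if row_group_size is not None:
        # int() rejects anything that is not a whole number of rows.
        settings.append(
            f"output_format_parquet_row_group_size={int(row_group_size)}"
        )
    return (
        f"SELECT * FROM file('{src}') INTO OUTFILE '{dst}' TRUNCATE "
        f"FORMAT Parquet SETTINGS {', '.join(settings)}"
    )


# Illustrative call: 100000 is an arbitrary example value.
print(parquet_export_query("events.arrow", "events.parquet", row_group_size=100000))
```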
On a 3,000,000-row Arrow IPC file (events_large.arrow, ~46 MB), the full conversion to Parquet runs in:
clickhouse local -q "SELECT * FROM file('events_large.arrow') INTO OUTFILE 'events_large.parquet' TRUNCATE FORMAT Parquet"
~0.24 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The ~46 MB Arrow input came out as a ~15 MB ZSTD Parquet file. Arrow and Parquet are both columnar, so the conversion is mostly re-encoding column chunks rather than reshaping data, which is why it's quick.
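To take a comparable best-of-N measurement on your own machine, a small harness like the one below is enough. It is a sketch, not necessarily how the figure above was measured: it times the whole process, startup included, and the function name best_of is hypothetical.

```python
import subprocess
import time


def best_of(cmd, runs=3):
    """Run cmd `runs` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command exits non-zero, so a failed
        # conversion is never counted as a fast one.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Example call (requires clickhouse on your PATH and the sample file):
# query = ("SELECT * FROM file('events_large.arrow') "
#          "INTO OUTFILE 'events_large.parquet' TRUNCATE FORMAT Parquet")
# print(f"{best_of(['clickhouse', 'local', '-q', query]):.2f} s")
```

Later runs read the input from the OS page cache, which is what makes the minimum a warm-cache number.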
If you're already in a Python session, chDB is the same engine in-process, with the same SQL and no separate binary. Write the Parquet with the identical SELECT ... INTO OUTFILE:
import chdb
chdb.query(
"SELECT * FROM file('events.arrow') "
"INTO OUTFILE 'events.parquet' TRUNCATE FORMAT Parquet"
)
print(chdb.query("SELECT count() FROM file('events.parquet')", "CSV"))
# 20
The ArrowStream rule is the same here: a stream-framed file needs file('events_stream.arrow', 'ArrowStream'). If you'd rather load the Arrow data into a DataFrame instead, see how to read an Arrow file in Python.
Going the other way is the same one-liner with the formats swapped. See how to convert Parquet to Arrow.
The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample Arrow files (including the ~46 MB file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.
github.com/ClickHouse/examples/tree/main/local-analytics/convert-arrow-to-parquet
Feather (version 2) and the Arrow IPC file format are the same format under two names, so the same conversion reads a Feather V2 file too. Once you have Parquet, see how to query a Parquet file, what a Parquet file is, or query Parquet on S3. clickhouse-local runs the same SQL unchanged across dozens of formats, and against a ClickHouse server or ClickHouse Cloud when the data outgrows your laptop, with no rewrite.