To convert an Arrow file to CSV, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then write the conversion with INTO OUTFILE:
clickhouse local -q "SELECT * FROM file('events.arrow') INTO OUTFILE 'events.csv' TRUNCATE FORMAT CSVWithNames"
"event_date","event_id","country","action","amount","tags"
"2026-01-01",1,"GB","click",5,"[0]"
"2026-01-02",2,"US","view",6.01,"[0,1]"
"2026-01-03",3,"DE","purchase",7.02,"[0,1,2]"
"2026-01-04",4,"FR","signup",8.03,"[0]"
FORMAT CSVWithNames writes a header row from the column names; TRUNCATE overwrites the output file if it already exists. Arrow files embed their own schema, so clickhouse local reads the column names and types straight out of the file with no import step.
Feather (version 2) is the Arrow IPC file format under another name, so a .feather file reads the same way as an .arrow file, with FORMAT Arrow.
Unlike CSV, an Arrow file embeds its schema, so there is nothing to infer or supply: clickhouse local reads the names and types directly. Confirm what it sees with DESCRIBE:
clickhouse local -q "DESCRIBE file('events.arrow')"
event_date Date32
event_id UInt64
country String
action String
amount Float64
tags Array(UInt16)
Those types come from the Arrow metadata, not from guessing. event_date stays a date, amount stays a float, event_id stays an unsigned integer. When the data lands in CSV the values are written as text, but the conversion started from real types rather than re-parsed strings.
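To see what "written as text" means for whoever reads the CSV next, here is a minimal sketch using only Python's standard library. It assumes the first two rows of the output shown earlier, pasted inline so it runs without the file; read_typed is a hypothetical helper name, and the per-column conversions are written by hand because the CSV no longer carries the schema.

```python
import csv
import io
from datetime import date

# First two data rows of the CSVWithNames output shown earlier on this page.
CSV_TEXT = '''"event_date","event_id","country","action","amount","tags"
"2026-01-01",1,"GB","click",5,"[0]"
"2026-01-02",2,"US","view",6.01,"[0,1]"
'''


def read_typed(text):
    """Parse the CSV and convert each column back to a Python type by hand."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "event_date": date.fromisoformat(raw["event_date"]),
            "event_id": int(raw["event_id"]),
            "country": raw["country"],
            "action": raw["action"],
            "amount": float(raw["amount"]),
            "tags": raw["tags"],  # left as the raw string here
        })
    return rows


# Every field a CSV reader hands back is a string, including the numbers.
raw_first = next(csv.DictReader(io.StringIO(CSV_TEXT)))
print(type(raw_first["event_id"]).__name__)  # str

typed = read_typed(CSV_TEXT)
print(typed[0]["event_date"], typed[1]["amount"])
```

The Arrow side needs none of this, which is the practical difference between a format that embeds its schema and one that doesn't.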
The one thing to watch: nested columns flatten
This is the gotcha specific to Arrow. Arrow is a rich columnar format and can hold nested types: lists, structs, maps. CSV is flat text with no nested types. So a nested column has to be flattened on the way out.
The sample file has a tags column of type Array(UInt16). In the Arrow file it's a real list per row.
In the CSV it becomes a single quoted string per cell: "[0]", "[0,1]", "[0,1,2]". The data is preserved, but the structure is now a string literal that a downstream CSV reader will treat as text, not as an array. If you need those values as separate columns or rows, reshape them in the same SQL before writing: for example, use arrayJoin(tags) to explode the list into one row per element, or pull individual fields out of a struct with dot access. Do the shaping while the data is still typed, then write CSV.
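If the CSV has already been written, a reader can still recover the list, with some care. The sketch below assumes the three sample rows shown earlier and uses only the standard library; explode_tags is a hypothetical helper. It relies on the fact that an array of integers happens to be written in a form that is also valid JSON, which would likely not hold for arrays of strings, so treat it as a workaround rather than a general parser.

```python
import csv
import io
import json

# Three data rows from the CSVWithNames output shown earlier on this page.
CSV_TEXT = '''"event_date","event_id","country","action","amount","tags"
"2026-01-01",1,"GB","click",5,"[0]"
"2026-01-02",2,"US","view",6.01,"[0,1]"
"2026-01-03",3,"DE","purchase",7.02,"[0,1,2]"
'''


def explode_tags(text):
    """Yield one (event_id, tag) pair per list element, one row per element."""
    for row in csv.DictReader(io.StringIO(text)):
        tags = json.loads(row["tags"])  # the cell "[0,1]" becomes [0, 1]
        for tag in tags:
            yield int(row["event_id"]), tag


pairs = list(explode_tags(CSV_TEXT))
print(pairs)  # [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
```

Doing the same reshaping with arrayJoin before the CSV is written avoids the string round trip entirely.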
CSVWithNames writes the column names as the first line. If the tool reading the CSV doesn't want a header, use the plain CSV format instead:
clickhouse local -q "SELECT * FROM file('events.arrow') INTO OUTFILE 'events_noheader.csv' TRUNCATE FORMAT CSV"
"2026-01-01",1,"GB","click",5,"[0]"
"2026-01-02",2,"US","view",6.01,"[0,1]"
"2026-01-03",3,"DE","purchase",7.02,"[0,1,2]"
Prefer Python? The same conversion runs in-process with chDB (covered below), which can also read the Arrow file straight into a DataFrame.
Online "Arrow to CSV" converters make you upload the file and hand back a fixed dump. Because this is SQL over the file, you can shape the output in the same command, and that is the real advantage over a plain converter.
Pick and rename columns, filter rows, aggregate, or change the delimiter, all during the conversion:
clickhouse local -q "
SELECT country, count() AS events, round(sum(amount),2) AS total
FROM file('events.arrow')
GROUP BY country
ORDER BY total DESC
INTO OUTFILE 'by_country.csv' TRUNCATE FORMAT CSVWithNames"
"country","events","total"
"US",4,60.4
"GB",4,56.36
"AU",3,48.33
"IN",3,45.3
"FR",3,42.27
"DE",3,39.24
Other knobs worth knowing:
- Semicolon or tab output: write FORMAT CSV with SETTINGS format_csv_delimiter=';', or just use FORMAT TSV.
- Custom NULL text, custom quoting, and date formats are all format_csv_* / date_time_output_format settings.
- The file stays local. Nothing is uploaded, which matters for anything sensitive.
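The header and delimiter knobs combine into one query string. Here is a minimal sketch of that in Python, assuming a hypothetical helper named conversion_sql and a hypothetical output file events_semicolon.csv. It only builds the SQL text and does no escaping of file names, so it suits trusted, simple paths only.

```python
def conversion_sql(src, dst, delimiter=",", header=True):
    """Build the Arrow-to-CSV query text (hypothetical helper, not a ClickHouse API)."""
    # Keep the sketch simple: one character, and nothing that would break the quoting.
    if len(delimiter) != 1 or delimiter in "'\\":
        raise ValueError("delimiter must be a single character other than ' or \\")
    fmt = "CSVWithNames" if header else "CSV"
    sql = f"SELECT * FROM file('{src}') INTO OUTFILE '{dst}' TRUNCATE FORMAT {fmt}"
    if delimiter != ",":
        # The setting name is the one listed above; it follows the FORMAT clause.
        sql += f" SETTINGS format_csv_delimiter='{delimiter}'"
    return sql


print(conversion_sql("events.arrow", "events_semicolon.csv", delimiter=";"))
```

Pass the resulting string to clickhouse local -q or to chdb.query. For tab-separated output, FORMAT TSV is the simpler route, as noted in the list.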
If you work in Python, chDB is the same engine in-process. The SQL is identical:
import chdb
chdb.query(
"SELECT * FROM file('events.arrow') "
"INTO OUTFILE 'events.csv' TRUNCATE FORMAT CSVWithNames"
)
Or skip the CSV file entirely and pull the Arrow data straight into a pandas DataFrame, shaping it in the same query:
df = chdb.query(
"SELECT country, count() AS events, round(sum(amount),2) AS total "
"FROM file('events.arrow') GROUP BY country ORDER BY total DESC",
"DataFrame",
)
country events total
US 4 60.40
GB 4 56.36
AU 3 48.33
IN 3 45.30
FR 3 42.27
DE 3 39.24
This is a fair alternative to reading the Arrow file with pandas or pyarrow directly; the difference is that you write SQL, and the same SQL runs on a single file or a server without a rewrite.
On a 3,000,000-row Arrow file (events_large.arrow, ~63 MB), the full conversion to a ~123 MB CSV is the same command:
clickhouse local -q "SELECT * FROM file('events_large.arrow') INTO OUTFILE 'events_large.csv' TRUNCATE FORMAT CSVWithNames"
It completes in ~0.23 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The number may move slightly under concurrent load. Reading Arrow is cheap because the layout is columnar and already typed; the work is mostly formatting the values as CSV text.
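To reproduce a best-of-three measurement on your own machine, a small harness is enough. This is a sketch, not the script used for the figure above: best_of and convert are hypothetical names, and the wall-clock time it reports includes process start-up, so the result may differ somewhat. The timing run is skipped when clickhouse or the sample file is missing.

```python
import os
import shutil
import subprocess
import time

SQL = (
    "SELECT * FROM file('events_large.arrow') "
    "INTO OUTFILE 'events_large.csv' TRUNCATE FORMAT CSVWithNames"
)


def best_of(runs, fn):
    """Call fn `runs` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def convert():
    """Run the conversion once through the clickhouse local command line."""
    subprocess.run(["clickhouse", "local", "-q", SQL], check=True)


if shutil.which("clickhouse") and os.path.exists("events_large.arrow"):
    # The first run also warms the OS page cache for the runs that follow.
    print(f"best of 3: {best_of(3, convert):.2f} s")
else:
    print("clickhouse or events_large.arrow not found; skipping the timing run")
```

Taking the minimum rather than the mean filters out runs that were slowed by other activity on the machine.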
Going the other way, from CSV to Arrow, is the same pattern with the formats swapped: SELECT * FROM file('events.csv') INTO OUTFILE 'events.arrow' FORMAT Arrow. See convert CSV to Arrow for that side, including the type-inference notes that don't apply here.
The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample Arrow files (including the ~63 MB file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.
github.com/ClickHouse/examples/tree/main/local-analytics/convert-arrow-to-csv
The same SQL scales from this one file to a ClickHouse server and ClickHouse Cloud when the data outgrows your laptop, with no rewrite.