How to convert Arrow to CSV

Al Brown
Last updated: Jun 15, 2026

To convert an Arrow file to CSV, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then write the conversion with INTO OUTFILE:

clickhouse local -q "SELECT * FROM file('events.arrow') INTO OUTFILE 'events.csv' TRUNCATE FORMAT CSVWithNames"

"event_date","event_id","country","action","amount","tags"
"2026-01-01",1,"GB","click",5,"[0]"
"2026-01-02",2,"US","view",6.01,"[0,1]"
"2026-01-03",3,"DE","purchase",7.02,"[0,1,2]"
"2026-01-04",4,"FR","signup",8.03,"[0]"

FORMAT CSVWithNames writes a header row from the column names; TRUNCATE overwrites the output file if it already exists. Arrow files embed their own schema, so clickhouse local reads the column names and types straight out of the file with no import step.

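Feather (version 2) is the Arrow IPC file format under another name, so a .feather file reads the same way as a .arrow file. If the format is not picked up from the file extension, name it explicitly as the second argument, for example file('events.feather', 'Arrow').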

The schema comes from the Arrow file

Unlike CSV, an Arrow file embeds its schema. You don't infer or supply anything; clickhouse local reads the names and types directly. Confirm what it sees with DESCRIBE:

clickhouse local -q "DESCRIBE file('events.arrow')"

event_date	Date32
event_id	UInt64
country	String
action	String
amount	Float64
tags	Array(UInt16)

Those types come from the Arrow metadata, not from guessing. event_date stays a date, amount stays a float, event_id stays an unsigned integer. When the data lands in CSV the values are written as text, but the conversion started from real types rather than re-parsed strings.
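If a script depends on particular column types, the DESCRIBE output can be checked before converting. The following is a minimal sketch, assuming the output is tab-separated with the column name and type in the first two fields, as in the listing above. `parse_describe` and `EXPECTED` are hypothetical names used for illustration, and the sample text is pasted from the listing rather than produced by a live run.

```python
# Sketch: turn DESCRIBE output into a {column: type} dict and compare it
# with the schema a script expects. Assumes tab-separated output with the
# name and type in the first two fields; any further fields are ignored.

DESCRIBE_OUTPUT = (
    "event_date\tDate32\n"
    "event_id\tUInt64\n"
    "country\tString\n"
    "action\tString\n"
    "amount\tFloat64\n"
    "tags\tArray(UInt16)\n"
)


def parse_describe(text):
    """Return {column_name: type_name} from DESCRIBE output."""
    schema = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, type_name = line.split("\t")[:2]
        schema[name] = type_name
    return schema


schema = parse_describe(DESCRIBE_OUTPUT)

# Hypothetical expectation: the columns a downstream job relies on.
EXPECTED = {"event_date": "Date32", "amount": "Float64"}
mismatches = {
    col: schema.get(col)
    for col, want in EXPECTED.items()
    if schema.get(col) != want
}
print(mismatches or "schema matches")
```

In a real script the text would come from running the DESCRIBE command and capturing its output, rather than from a pasted string.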

The one thing to watch: nested columns flatten

This is the gotcha specific to Arrow. Arrow is a rich columnar format and can hold nested types: lists, structs, maps. CSV is flat text with no nested types. So a nested column has to be flattened on the way out.

The sample file has a tags column of type Array(UInt16). In the Arrow file it's a real list per row:

1	[0]
2	[0,1]
3	[0,1,2]

In the CSV it becomes a single quoted string per cell: "[0]", "[0,1]", "[0,1,2]". The data is preserved, but the structure is now a string literal that a downstream CSV reader will treat as text, not as an array. If you need those values as separate columns or rows, reshape them in the same SQL before writing. For example, use arrayJoin(tags) to explode the list into one row per element, or pull individual fields out of a struct with dot access. Do the shaping while the data is still typed, then write CSV.
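To see what "treated as text" means on the reading side, here is a small sketch using only Python's standard library. It feeds the header and first three rows shown earlier to `csv.DictReader`, then recovers the list with `json.loads`. That last step rests on an assumption: a list of plain integers happens to be valid JSON. Lists of strings or nested structs may be written differently and would need their own handling.

```python
# Sketch: what a downstream CSV reader sees in the flattened "tags" column,
# and one way to turn the cell back into a list.
import csv
import io
import json

# Header plus the first three rows of events.csv, as shown earlier.
CSV_TEXT = '''"event_date","event_id","country","action","amount","tags"
"2026-01-01",1,"GB","click",5,"[0]"
"2026-01-02",2,"US","view",6.01,"[0,1]"
"2026-01-03",3,"DE","purchase",7.02,"[0,1,2]"
'''

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# The reader returns every cell as a string, including the list.
raw = rows[1]["tags"]
print(type(raw).__name__, raw)

# Assumption: the cell holds a list of plain integers, which is also valid
# JSON, so json.loads can parse it back into a Python list.
tags = [json.loads(row["tags"]) for row in rows]
print(tags)
```

Reshaping in SQL before writing avoids this parsing step entirely, which is why it is usually the better place to do it.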

Header or no header

CSVWithNames writes the column names as the first line. If the tool reading the CSV doesn't want a header, use the plain CSV format instead:

clickhouse local -q "SELECT * FROM file('events.arrow') INTO OUTFILE 'events_noheader.csv' TRUNCATE FORMAT CSV"

"2026-01-01",1,"GB","click",5,"[0]"
"2026-01-02",2,"US","view",6.01,"[0,1]"
"2026-01-03",3,"DE","purchase",7.02,"[0,1,2]"
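The header choice matters to whatever reads the file next. As an illustration, the sketch below reads one row in both forms with Python's standard `csv` module. With a header, `DictReader` takes the keys from the first line. Without one, the column names have to be supplied, otherwise the first data row is consumed as the header. The `COLUMNS` list is copied from the header shown earlier, and the sample strings stand in for the two output files.

```python
# Sketch: reading CSVWithNames output versus plain CSV output.
import csv
import io

# Column names taken from the header shown earlier; a headerless file does
# not carry them, so the reader has to be told.
COLUMNS = ["event_date", "event_id", "country", "action", "amount", "tags"]

WITH_HEADER = '''"event_date","event_id","country","action","amount","tags"
"2026-01-01",1,"GB","click",5,"[0]"
'''
NO_HEADER = '''"2026-01-01",1,"GB","click",5,"[0]"
'''

# CSVWithNames output: DictReader takes the keys from the first line.
with_names = list(csv.DictReader(io.StringIO(WITH_HEADER)))

# Plain CSV output: pass fieldnames explicitly, or the first data row
# would be consumed as the header.
without_names = list(csv.DictReader(io.StringIO(NO_HEADER), fieldnames=COLUMNS))

print(with_names[0] == without_names[0])
```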

Prefer Python? The same conversion runs in-process with chDB (covered below), which can also read the Arrow file straight into a DataFrame.

Options an upload converter can't give you

Online "Arrow to CSV" converters make you upload the file and hand back a fixed dump. Because this is SQL over the file, you can shape the output in the same command.

Pick and rename columns, filter rows, aggregate, or change the delimiter, all during the conversion:

clickhouse local -q "
SELECT country, count() AS events, round(sum(amount),2) AS total
FROM file('events.arrow')
GROUP BY country
ORDER BY total DESC
INTO OUTFILE 'by_country.csv' TRUNCATE FORMAT CSVWithNames"

"country","events","total"
"US",4,60.4
"GB",4,56.36
"AU",3,48.33
"IN",3,45.3
"FR",3,42.27
"DE",3,39.24

Other knobs worth knowing:

  • Semicolon or tab output: write FORMAT CSV with SETTINGS format_csv_delimiter=';', or for tabs just use FORMAT TSV (TSVWithNames keeps the header row).
  • Custom NULL text, custom quoting, and date formats are all format_csv_* / date_time_output_format settings.
  • The file stays local. Nothing is uploaded, which matters for anything sensitive.
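When the header and delimiter vary between runs, it can help to build the query in one place. The sketch below does that in Python and only runs the conversion if the `clickhouse` binary and the sample file are present. `build_convert_query` and the output name `events_semicolon.csv` are hypothetical, the helper does not escape quotes in file names, and placing the SETTINGS clause after FORMAT follows the wording of the delimiter bullet above rather than a tested command.

```python
# Sketch: build the clickhouse local query with optional header and
# delimiter. build_convert_query is a hypothetical helper, not part of any
# ClickHouse tooling, and it does not escape quotes in file names.
import os
import shutil
import subprocess


def build_convert_query(src, dst, header=True, delimiter=","):
    """Return the SQL for an Arrow-to-CSV conversion."""
    fmt = "CSVWithNames" if header else "CSV"
    query = f"SELECT * FROM file('{src}') INTO OUTFILE '{dst}' TRUNCATE FORMAT {fmt}"
    if delimiter != ",":
        # Assumption: a trailing SETTINGS clause is accepted after FORMAT.
        query += f" SETTINGS format_csv_delimiter = '{delimiter}'"
    return query


query = build_convert_query("events.arrow", "events_semicolon.csv", delimiter=";")
print(query)

# Only run the conversion when the binary and the sample file are present.
if shutil.which("clickhouse") and os.path.exists("events.arrow"):
    subprocess.run(["clickhouse", "local", "-q", query], check=True)
```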

chDB: the same conversion in Python

If you work in Python, chDB is the same engine in-process. The SQL is identical:

import chdb

chdb.query(
    "SELECT * FROM file('events.arrow') "
    "INTO OUTFILE 'events.csv' TRUNCATE FORMAT CSVWithNames"
)

Or skip the CSV file entirely and pull the Arrow data straight into a pandas DataFrame, shaping it in the same query:

import chdb

df = chdb.query(
    "SELECT country, count() AS events, round(sum(amount),2) AS total "
    "FROM file('events.arrow') GROUP BY country ORDER BY total DESC",
    "DataFrame",
)
print(df.to_string(index=False))  # print without the index column

country  events  total
     US       4  60.40
     GB       4  56.36
     AU       3  48.33
     IN       3  45.30
     FR       3  42.27
     DE       3  39.24

This is a fair alternative to reading the Arrow file with pandas or pyarrow directly; the difference is that you write SQL, and the same SQL runs on a single file or a server without a rewrite.

How fast is it?

On a 3,000,000-row Arrow file (events_large.arrow, ~63 MB), the full conversion writes a ~123 MB CSV:

clickhouse local -q "SELECT * FROM file('events_large.arrow') INTO OUTFILE 'events_large.csv' TRUNCATE FORMAT CSVWithNames"

It completes in ~0.23 seconds (best of three, with a warm OS page cache) on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The number may move slightly under concurrent load. Reading Arrow is cheap because the layout is columnar and already typed; the work is mostly formatting the values as CSV text.
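To take a comparable measurement on your own machine, a best-of-N timer is enough. The page does not say how the figure above was measured, so the sketch below is one possible approach, not the original method. It times the whole subprocess, including process startup, and skips the run when the `clickhouse` binary or `events_large.arrow` is missing. `best_of` and `convert` are hypothetical helper names.

```python
# Sketch: best-of-three wall-clock timing of the conversion command.
import os
import shutil
import subprocess
import time


def best_of(fn, runs=3):
    """Call fn `runs` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def convert():
    # Same command as above, run as a subprocess.
    subprocess.run(
        [
            "clickhouse", "local", "-q",
            "SELECT * FROM file('events_large.arrow') "
            "INTO OUTFILE 'events_large.csv' TRUNCATE FORMAT CSVWithNames",
        ],
        check=True,
    )


if shutil.which("clickhouse") and os.path.exists("events_large.arrow"):
    print(f"best of 3: {best_of(convert):.2f} s")
else:
    print("clickhouse or events_large.arrow not found; nothing timed")
```

Taking the minimum rather than the mean filters out runs slowed by other activity on the machine. After the first run the file is likely to be in the page cache, which is how the warm-cache condition arises.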

Reverse direction?

Going the other way, from CSV to Arrow, is the same pattern with the formats swapped: SELECT * FROM file('events.csv') INTO OUTFILE 'events.arrow' FORMAT Arrow. See convert CSV to Arrow for that side, including the type-inference notes that don't apply here.

Run it yourself

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample Arrow files (including the ~63 MB file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-arrow-to-csv

The same SQL scales from this one file to a ClickHouse server and ClickHouse Cloud when the data outgrows your laptop, with no rewrite.
