How to convert Avro to CSV

Al Brown
Last updated: Jun 15, 2026

To convert an Avro file to CSV, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then point it at the Avro file and write the result out as CSV:

1clickhouse local -q "SELECT * FROM file('events.avro') INTO OUTFILE 'events.csv' FORMAT CSVWithNames"
1"event_id","ts","event_type","country","amount","user_info","tags"
21,1767225600,"login","GB",0,"user0",1,"['a']"
32,1767229200,"purchase","US",1.01,"user1",2,"['a','b']"
43,1767232800,"logout","DE",2.02,"user2",3,"['a','b','c']"

Avro embeds its own schema, so clickhouse local reads the column names and types directly from the file with no import step. The file streams through, so inputs larger than RAM convert fine.

What the Avro schema carries #

CSV has no types; every cell is text. Avro has a real schema. Read it with DESCRIBE before you convert, so you know what has to flatten:

1clickhouse local -q "DESCRIBE file('events.avro')"
1event_id	Int64
2ts	Int32
3event_type	String
4country	String
5amount	Float64
6user_info	Tuple(`1` String, `2` Int32)
7tags	Array(String)

Two things to notice. user_info is a Tuple (a nested record), and tags is an Array. CSV is a flat grid of scalar cells, so neither has a clean home there. With a plain SELECT *, the tuple expands into two unlabelled values in the row while the header still names a single user_info column, and the array is dumped as the literal text ['a','b']. Your column count no longer matches your header. That's the trap with any "Avro to CSV" tool that just streams fields through.

ts is also worth a glance: it came back as Int32. Avro stores timestamps as plain integers, so unless the producer tagged a logical type, you get the raw Unix seconds, not a date.

Flatten nested Avro into clean CSV #

Replace SELECT * with an explicit projection. Pull each tuple element into its own named column, join the array into a single string, and cast the integer back to a DateTime:

1clickhouse local -q "
2SELECT
3  event_id,
4  toDateTime(ts)               AS ts,
5  event_type,
6  country,
7  amount,
8  user_info.1                  AS user_id,
9  user_info.2                  AS sessions,
10  arrayStringConcat(tags, '|') AS tags
11FROM file('events.avro')
12INTO OUTFILE 'events_flat.csv' TRUNCATE FORMAT CSVWithNames"

Now the header and the rows line up, and every cell is a flat value:

1"event_id","ts","event_type","country","amount","user_id","sessions","tags"
21,"2026-01-01 00:00:00","login","GB",0,"user0",1,"a"
32,"2026-01-01 01:00:00","purchase","US",1.01,"user1",2,"a|b"
43,"2026-01-01 02:00:00","logout","DE",2.02,"user2",3,"a|b|c"
54,"2026-01-01 03:00:00","signup","FR",3.03,"user3",4,"a"
65,"2026-01-01 04:00:00","login","GB",4.04,"user4",5,"a|b"

user_info.1 and user_info.2 address the tuple by position; alias them to readable names. arrayStringConcat(tags, '|') collapses the array into one delimited cell (pick a separator that doesn't clash with your commas). toDateTime(ts) turns the raw integer into a readable timestamp. This is the difference between a CSV you can hand to a spreadsheet and one with misaligned columns.

Options worth knowing #

These are the choices a drag-and-drop converter never gives you:

  • Header or no header. CSVWithNames writes the header row. Use CSV for a headerless file.
  • Delimiter. Switch to TSV by changing the format: FORMAT TSVWithNames. For a custom separator, see reading custom-delimited files.
  • Pick columns and rows. It's just SQL. Add a WHERE to filter, a column list to drop fields, or ORDER BY. You convert and clean in the same pass instead of exporting everything and trimming later.
  • NULLs. Empty Avro values land as empty CSV cells by default; set format_csv_null_representation if you need a sentinel like \N.
  • Bigger than memory. The conversion streams, so a multi-GB Avro file converts without loading it all at once.

chDB: the same conversion in Python #

If you'd rather stay in Python, chDB is the same engine as a library. The SQL is identical — INTO OUTFILE ... FORMAT CSVWithNames writes the file from inside the query:

1import chdb
2
3chdb.query("""
4SELECT
5  event_id,
6  toDateTime(ts)               AS ts,
7  event_type,
8  country,
9  amount,
10  user_info.1                  AS user_id,
11  user_info.2                  AS sessions,
12  arrayStringConcat(tags, '|') AS tags
13FROM file('data/events.avro')
14INTO OUTFILE 'data/events_flat_chdb.csv' TRUNCATE FORMAT CSVWithNames
15""")

It produces a byte-for-byte match of the CLI output:

1"event_id","ts","event_type","country","amount","user_id","sessions","tags"
21,"2026-01-01 00:00:00","login","GB",0,"user0",1,"a"
32,"2026-01-01 01:00:00","purchase","US",1.01,"user1",2,"a|b"
43,"2026-01-01 02:00:00","logout","DE",2.02,"user2",3,"a|b|c"

How fast is it? #

On a 3,000,000-row Avro file (events_large.avro, ~81 MB on disk, with the same nested Tuple and Array), the full flatten-and-write conversion to a ~203 MB CSV completes in:

~1.07 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The number may shift slightly under concurrent load. That's decoding every Avro record, flattening the nested fields, and serialising 3M rows of CSV text, in a single pass with no import step.

Reverse direction? #

Going the other way, building an Avro file from a CSV, is the same idea in mirror image. See convert CSV to Avro, where you do get to add real types and a schema on the way in.

Run it yourself #

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample Avro files (including the ~81 MB file used for the timing), run.sh with every command on this page, a run.py and notebook for the chDB path, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-avro-to-csv

Online converters make you upload the file, can't flatten nested records correctly, and stop at the file in front of you. The same one line here is scriptable, keeps your data local, handles inputs larger than RAM, and the exact SQL runs unchanged against a directory of files, against a ClickHouse server, or against ClickHouse Cloud when the data outgrows your laptop. Related: read an Avro file and run SQL on a CSV file.

Share this resource

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!
Loading form...