To convert a CSV file to Avro, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then write the result with FORMAT Avro:
clickhouse local -q "SELECT * FROM file('events.csv') INTO OUTFILE 'events.avro' FORMAT Avro"
clickhouse local reads the CSV header for column names, infers each column's type from the data, and builds the Avro record schema from those types. The CSV is read in place with no import step, and the data streams through in chunks rather than being loaded whole, so files larger than RAM convert without issue.
The interesting part of CSV to Avro is the schema. CSV has none; Avro requires one. clickhouse-local fills the gap by inferring types from the CSV, then translating them into Avro field types. Look at what it inferred from the CSV:
clickhouse local -q "DESCRIBE file('events.csv')"
event_date Nullable(Date)
event_id Nullable(Int64)
country Nullable(String)
action Nullable(String)
amount Nullable(Float64)
items Nullable(Int64)
Avro carries its schema in the file header as JSON. The header from events.avro shows the direct mapping:
{"type":"record","name":"row","fields":[{"name":"event_date","type":["null",{"type":"int","logicalType":"date"}]},{"name":"event_id","type":["null","long"]},{"name":"country","type":["null","string"]},{"name":"action","type":["null","string"]},{"name":"amount","type":["null","double"]},{"name":"items","type":["null","long"]}]}
Int64 became Avro long, Float64 became double, String became string, and Date became an int with logicalType: date (Avro stores dates as a day count). Because CSV inference marks every column Nullable, each Avro field is a union of ["null", T]. That is correct but verbose, and it forces every consumer to handle nulls.
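The mapping described above can be written down as a small lookup table. This is only an illustrative sketch, not ClickHouse's implementation: the helper names (`avro_type`, `avro_schema`, `to_json`) are hypothetical, and the table covers just the types that appear in the schemas on this page.

```python
import json

# Scalar mappings copied from the two schemas shown on this page.
# Other ClickHouse types are deliberately left out of this sketch.
SCALARS = {
    "Int64": "long",
    "Float64": "double",
    "String": "string",
    "Date": {"type": "int", "logicalType": "date"},
    "UInt32": "int",
    "UInt8": "int",
}


def avro_type(ch_type):
    """Translate one ClickHouse type name into an Avro type (hypothetical helper)."""
    if ch_type.startswith("Nullable(") and ch_type.endswith(")"):
        inner = ch_type[len("Nullable("):-1]
        # A Nullable column becomes a union with "null" listed first.
        return ["null", SCALARS[inner]]
    return SCALARS[ch_type]


def avro_schema(columns):
    """Build a record schema from (column name, ClickHouse type) pairs."""
    return {
        "type": "record",
        "name": "row",
        "fields": [{"name": name, "type": avro_type(t)} for name, t in columns],
    }


def to_json(schema):
    """Serialize compactly, matching the layout of the headers shown above."""
    return json.dumps(schema, separators=(",", ":"))


# The column types reported by DESCRIBE for events.csv.
INFERRED = [
    ("event_date", "Nullable(Date)"),
    ("event_id", "Nullable(Int64)"),
    ("country", "Nullable(String)"),
    ("action", "Nullable(String)"),
    ("amount", "Nullable(Float64)"),
    ("items", "Nullable(Int64)"),
]
```

Calling `to_json(avro_schema(INFERRED))` reproduces the header JSON shown above for the inferred columns, which makes the rule easy to see: every `Nullable(T)` wraps the scalar mapping of `T` in a `["null", ...]` union.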
Prefer Python? See how to read an Avro file in Python with chDB. The chDB conversion block is at the end of this page.
If you know the data has no missing values, declare the types yourself. Pass the format and an explicit, non-Nullable schema as the second and third arguments to file(). The Avro fields then drop the null unions:
clickhouse local -q "
SELECT * FROM file('events.csv', 'CSVWithNames',
'event_date Date, event_id UInt32, country String, action String, amount Float64, items UInt8')
INTO OUTFILE 'events_typed.avro' TRUNCATE FORMAT Avro"
The embedded schema is now plain scalars, no unions:
{"type":"record","name":"row","fields":[{"name":"event_date","type":{"type":"int","logicalType":"date"}},{"name":"event_id","type":"int"},{"name":"country","type":"string"},{"name":"action","type":"string"},{"name":"amount","type":"double"},{"name":"items","type":"int"}]}
This is the lever an upload-based converter never gives you: you control the exact Avro types from the source CSV, including the integer width and whether a field is nullable. UInt32 and UInt8 both collapse to Avro int, because Avro has no unsigned integers. Avro int is 32-bit signed, so UInt32 values above 2,147,483,647 do not fit; pick a signed type that covers your range if the distinction matters downstream.
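Before declaring a narrow integer type, it can help to confirm that the column's values actually fit. The sketch below is one way to do that with the standard library. It assumes the column holds plain integers with no blanks, and the function name `column_fits_avro_int` and the inline sample are illustrative. It only detects out-of-range values; it says nothing about what any converter does with them.

```python
import csv
import io

# Avro's int is a 32-bit signed integer.
AVRO_INT_MIN = -(2**31)
AVRO_INT_MAX = 2**31 - 1


def column_fits_avro_int(csv_text, column):
    """Return True if every value in `column` fits a 32-bit signed Avro int.

    Assumes a header row and an integer-only column with no empty cells.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return all(AVRO_INT_MIN <= int(row[column]) <= AVRO_INT_MAX for row in reader)


# Two rows in the shape of events.csv, inlined so the sketch runs on its own.
SAMPLE = (
    "event_date,event_id,country,action,amount,items\n"
    "2026-01-20,20,US,signup,24.19,5\n"
    "2026-01-19,19,GB,purchase,23.18,4\n"
)

# 2147483648 is a valid UInt32 but one past the largest Avro int.
TOO_WIDE = (
    "event_date,event_id,country,action,amount,items\n"
    "2026-01-20,2147483648,US,signup,24.19,5\n"
)
```

For a real file, pass `open("events.csv").read()` instead of the inline sample. If the check fails for a column, declaring it Int64 maps it to Avro long, as in the inferred schema earlier on this page.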
Read the Avro file straight back to confirm the data and types survived:
clickhouse local -q "SELECT * FROM file('events.avro') ORDER BY amount DESC LIMIT 5"
2026-01-20 20 US signup 24.19 5
2026-01-19 19 GB purchase 23.18 4
2026-01-18 18 AU view 22.17 3
2026-01-17 17 IN click 21.16 2
2026-01-16 16 FR signup 20.15 1
These are the choices that matter when the Avro file leaves your machine and lands in another system.
- Compression codec. Avro compresses each data block. clickhouse-local defaults to snappy (fast, widely supported). Switch to deflate for smaller files when read speed matters less:
  clickhouse local -q "SELECT * FROM file('events.csv') INTO OUTFILE 'events.avro' TRUNCATE FORMAT Avro SETTINGS output_format_avro_codec='deflate'"
- Nullability. Decide it deliberately. Inferred Nullable columns produce ["null", T] unions; an explicit non-Nullable schema produces plain types, as shown above.
- Type lossiness. A flat CSV maps cleanly to Avro. There is nothing nested to flatten here, unlike going the other direction from a deeply nested format. The only quiet cast is unsigned to signed integers.
- Compressed input. A .csv.gz or .csv.zst source is decompressed transparently. The same command works; the extension is detected automatically.
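Two of the choices above, codec and nullability, are recorded in the Avro file header, so they can be checked on the receiving side without any Avro library. The following is a minimal sketch of a header reader based on the Avro object container layout (magic bytes, then a metadata map with the `avro.schema` and `avro.codec` keys). The function names are illustrative, and the reader stops at the metadata map without touching the data blocks.

```python
import io
import json

MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def read_long(stream):
    """Decode one Avro long: a zigzag value stored as a base-128 varint."""
    shift, raw = 0, 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("truncated Avro header")
        raw |= (chunk[0] & 0x7F) << shift
        if chunk[0] < 0x80:  # high bit clear marks the last byte
            return (raw >> 1) ^ -(raw & 1)  # undo zigzag encoding
        shift += 7


def read_header(stream):
    """Return the header metadata map as {str: bytes}."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count ends the map
            return meta
        if count < 0:  # negative count: the block's byte size follows
            read_long(stream)
            count = -count
        for _ in range(count):
            key = stream.read(read_long(stream)).decode("utf-8")
            meta[key] = stream.read(read_long(stream))


def describe(meta):
    """Return (codec name, names of fields whose type is a union with null)."""
    schema = json.loads(meta["avro.schema"])
    # A missing avro.codec entry means the blocks are uncompressed ("null").
    codec = meta.get("avro.codec", b"null").decode("ascii")
    nullable = [
        field["name"]
        for field in schema["fields"]
        if isinstance(field["type"], list) and "null" in field["type"]
    ]
    return codec, nullable
```

To inspect a converted file, open it in binary mode and call `describe(read_header(f))`. The result is the codec name and the list of fields a consumer still has to null-check.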
Online CSV-to-Avro converters dominate search results for this task, and they are fine for a one-off small file. The reasons to convert locally instead: nothing leaves your machine, you control the exact Avro types, the command scripts into a pipeline, and it handles files bigger than memory.
On a 3,000,000-row, ~120 MB CSV (events_large.csv), the full conversion to Avro, covering both CSV parsing and Avro block encoding, was run with:
clickhouse local -q "SELECT * FROM file('events_large.csv') INTO OUTFILE 'events_large.avro' TRUNCATE FORMAT Avro"
It completes in ~0.73 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The number can move a little under concurrent load, but for a CSV this size the conversion is bound by parsing, not by Avro encoding. All 3,000,000 rows round-trip exactly when read back.
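A "best of three" figure like this can be reproduced with a small harness. The sketch below is one possible way to take the measurement, not necessarily how the number above was obtained. It assumes `clickhouse` is on your PATH and that events_large.csv is in the working directory. The names `best_of` and `convert` are illustrative.

```python
import subprocess
import time


def best_of(runs, action):
    """Call `action` `runs` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        timings.append(time.perf_counter() - start)
    return min(timings)


# The conversion query from this page; TRUNCATE lets repeated runs overwrite the output.
QUERY = (
    "SELECT * FROM file('events_large.csv') "
    "INTO OUTFILE 'events_large.avro' TRUNCATE FORMAT Avro"
)


def convert():
    """Run the conversion once (assumes the `clickhouse` binary is on PATH)."""
    subprocess.run(["clickhouse", "local", "-q", QUERY], check=True)
```

Calling `best_of(3, convert)` returns the fastest of three runs in seconds. Taking the minimum rather than the mean filters out runs slowed by other processes, and after the first run the input is normally served from the OS page cache. The measurement includes process start-up, so it can read slightly higher than a timer inside the engine would.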
If you work in Python, chDB is the same ClickHouse engine in-process. Run the same SELECT ... FORMAT Avro and write the bytes to a file:
import chdb

avro_bytes = chdb.query(
    "SELECT * FROM file('events.csv', 'CSVWithNames') FORMAT Avro"
).bytes()

with open("events.avro", "wb") as f:
    f.write(avro_bytes)
The schema is derived from the inferred types exactly as with the CLI. This is a drop-in alternative to writing Avro by hand with pyarrow or fastavro when your input is already a CSV.
Going the other way is the mirror command: swap the formats. See how to convert Avro to CSV. To learn the format itself, see what is an Avro file, and to query Avro in place without converting, see how to read an Avro file.
The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample CSVs (including the ~120 MB file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.
github.com/ClickHouse/examples/tree/main/local-analytics/convert-csv-to-avro
The same SQL that converts a file on your laptop runs unchanged against a ClickHouse server or ClickHouse Cloud when the data outgrows it.