To convert Avro to Parquet, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
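Convert the file with a single query:
clickhouse local -q "SELECT * FROM file('events.avro') INTO OUTFILE 'events.parquet' TRUNCATE FORMAT Parquet"
Then read the converted result back to confirm it worked: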
clickhouse local -q "SELECT * FROM file('events.parquet') ORDER BY event_id LIMIT 5"
2026-01-01 1 GB click 5 1
2026-01-02 2 US view 6.01 2
2026-01-03 3 DE signup 7.02 3
2026-01-04 4 FR purchase 8.03 4
2026-01-05 5 IN click 9.04 5
clickhouse local reads the schema embedded in the Avro file directly, so you never write a CREATE TABLE or define a structure argument. The Avro file is read in place and the Parquet file is written in one streaming pass, with no import step and no upload.
Avro and Parquet solve different problems. Avro is row-oriented: each record is stored together, which is good for streaming and writing one event at a time. Parquet is columnar: values for one column are stored together, which is what makes analytical scans and column pruning fast. Converting Avro to Parquet is a layout change from row to column, not just a rename.
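The layout difference can be sketched in a few lines of plain Python. This is only an illustration of row versus column storage, not how Avro or Parquet encode bytes on disk; the helper name to_columns is made up for the example, and the values are taken from the sample output above.

```python
# Three records shaped like the sample events (values from the sample output).
rows = [
    {"event_id": 1, "country": "GB", "action": "click", "amount": 5.0},
    {"event_id": 2, "country": "US", "action": "view", "amount": 6.01},
    {"event_id": 3, "country": "DE", "action": "signup", "amount": 7.02},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists (hypothetical helper)."""
    columns = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


columns = to_columns(rows)

# Row layout: summing one field still walks every full record.
total_from_rows = sum(record["amount"] for record in rows)

# Column layout: the same sum touches only the "amount" list.
# Skipping the other columns entirely is what column pruning means.
total_from_columns = sum(columns["amount"])

print(columns["country"])
print(total_from_rows == total_from_columns)
```

With a row layout, every query pays for every field. With a column layout, a query that needs only amount never reads country or action.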
The useful part is that Avro stores its schema inside the file, which is why no CREATE TABLE or structure argument is needed. Describe the converted file to check which types came through:
clickhouse local -q "DESCRIBE file('events.parquet')"
event_date Date32
event_id Int64
country String
action String
amount Float64
items Int32
Those types came from the Avro file and were carried straight into Parquet, and the five rows read back earlier confirm that the values round-tripped as well.
Online converters take a file, run it on someone else's machine, and hand you a single result. Running the conversion locally gives you control over the things that actually matter for Parquet, such as the compression codec and the row-group size, and your data never leaves the machine.
One unsigned-type caveat. Avro has no unsigned integer types. A column that started life as UInt8 in ClickHouse is stored as a regular signed integer in Avro, so it comes back as Int32 after a round trip through Avro (see items above). That is an Avro property, not a conversion bug. If you need the narrower or unsigned type back, cast it on the way out: SELECT * EXCEPT items, items::UInt8 AS items FROM file('events.avro').
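Before casting back to a narrower type, it can be worth checking that every value fits. The sketch below is plain Python and makes one assumption that should be verified on your ClickHouse version: that an out-of-range integer cast wraps around (keeps the low 8 bits) instead of raising an error. The function names are hypothetical.

```python
UINT8_MIN, UINT8_MAX = 0, 255


def fits_uint8(values):
    """True if every value can be stored in a UInt8 without loss."""
    return all(UINT8_MIN <= v <= UINT8_MAX for v in values)


def wrap_to_uint8(value):
    """Model of a wrapping cast: keep only the low 8 bits (assumed behaviour)."""
    return value % 256


items = [1, 2, 3, 4, 5]  # the items column from the sample rows

print(fits_uint8(items))   # safe to cast back to UInt8
print(wrap_to_uint8(300))  # what a wrapping cast would silently store
print(wrap_to_uint8(-1))
```

If the check fails, keeping the wider signed type is usually safer than a cast that changes values without warning.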
Choose the compression codec. Parquet compresses per column. clickhouse local defaults to zstd; set output_format_parquet_compression_method to pick another. On the 3,000,000-row sample file the difference is visible in the output size:
clickhouse local -q "
SELECT * FROM file('events_large.avro')
INTO OUTFILE 'events_large.parquet' TRUNCATE FORMAT Parquet
SETTINGS output_format_parquet_compression_method = 'zstd'"
none 45054899 bytes
snappy 32584563 bytes
zstd 22939538 bytes
zstd gives the smallest file here; snappy decompresses a little faster if read speed matters more than size. You can also set output_format_parquet_row_group_size to tune row-group granularity for your readers.
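To repeat the comparison for several codecs, one option is to generate the command per codec from Python. The sketch below only builds and prints the commands and does not run them; the per-codec output file names and both helper names are assumptions for illustration. The sizes are the ones reported above.

```python
import shlex


def convert_command(src, dst, codec):
    """Build the argv for one conversion, mirroring the command shown above."""
    query = (
        f"SELECT * FROM file('{src}') "
        f"INTO OUTFILE '{dst}' TRUNCATE FORMAT Parquet "
        f"SETTINGS output_format_parquet_compression_method = '{codec}'"
    )
    return ["clickhouse", "local", "-q", query]


def percent_of(baseline, size):
    """Size as a percentage of the baseline, rounded to one decimal place."""
    return round(100 * size / baseline, 1)


# Output sizes in bytes, as reported for the 3,000,000-row sample.
sizes = {"none": 45054899, "snappy": 32584563, "zstd": 22939538}

for codec, size in sizes.items():
    # Hypothetical naming scheme: one output file per codec.
    cmd = convert_command("events_large.avro", f"events_large.{codec}.parquet", codec)
    print(shlex.join(cmd))
    print(f"{codec}: {percent_of(sizes['none'], size)}% of the uncompressed size")
```

Each command list can be passed to subprocess.run(cmd, check=True) once clickhouse is on your PATH. On these numbers snappy lands at roughly 72% of the uncompressed size and zstd at roughly 51%.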
Inspect what you wrote. The ParquetMetadata format reads the footer without scanning the data, so you can confirm the row count, row groups, and format version:
clickhouse local -q "
SELECT num_rows, num_row_groups, format_version
FROM file('events.parquet', ParquetMetadata) FORMAT Vertical"
Row 1:
──────
num_rows: 20
num_row_groups: 1
format_version: 2
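Reading the footer is cheap because of how a Parquet file is laid out: an unencrypted file ends with the metadata, then a 4-byte little-endian metadata length, then the magic bytes PAR1. The sketch below locates the footer by reading only the last 8 bytes. It does not decode the metadata, and it is not how ClickHouse implements ParquetMetadata; the function name is hypothetical and the demo file is a stand-in, not a real Parquet file.

```python
import os
import struct
import tempfile

MAGIC = b"PAR1"


def footer_span(path):
    """Return (offset, length) of the footer metadata, reading only 8 bytes."""
    size = os.path.getsize(path)
    if size < 12:  # leading magic + length field + trailing magic
        raise ValueError("file too small to be Parquet")
    with open(path, "rb") as f:
        f.seek(size - 8)
        tail = f.read(8)
    if tail[4:] != MAGIC:
        raise ValueError("not a Parquet file: trailing magic missing")
    length = struct.unpack("<I", tail[:4])[0]
    return size - 8 - length, length


# Stand-in file with the same outer framing: magic, data, footer, length, magic.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.parquet")
    with open(demo, "wb") as f:
        f.write(MAGIC + b"x" * 1000 + b"META" * 5 + struct.pack("<I", 20) + MAGIC)
    print(footer_span(demo))
```

However large the data section is, only the tail of the file is touched, which is why a footer query returns quickly even on big files.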
On the 3,000,000-row sample (events_large.avro, 44 MB), converting the whole file to Parquet takes ~0.61 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The 44 MB row-oriented Avro file becomes a 22 MB columnar Parquet file. The conversion reads every row, re-lays it out by column, and compresses it, in one streaming pass.
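For a rough sense of throughput, the figures above can be turned into rates. This is only arithmetic on the numbers quoted in the text (3,000,000 rows, about 0.61 seconds, 44 MB in, 22 MB out), so it inherits their rounding.

```python
rows = 3_000_000
seconds = 0.61   # approximate, best of three
avro_mb = 44     # input size as quoted
parquet_mb = 22  # output size as quoted

rows_per_second = rows / seconds
input_mb_per_second = avro_mb / seconds
size_ratio = parquet_mb / avro_mb

print(f"{rows_per_second / 1e6:.1f} M rows/s")
print(f"{input_mb_per_second:.0f} MB/s of Avro input")
print(f"Parquet output is {size_ratio:.0%} of the Avro size")
```

That works out to roughly 4.9 million rows per second on that laptop; expect different numbers on other hardware or with a cold cache.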
If you work in Python, chDB is the same ClickHouse engine in-process. The conversion is the identical SQL, no server and no subprocess:
import chdb
chdb.query("""
SELECT * FROM file('events.avro')
INTO OUTFILE 'events.parquet' TRUNCATE FORMAT Parquet
""")
# read it back into a pandas DataFrame
df = chdb.query("SELECT * FROM file('events.parquet') ORDER BY event_id LIMIT 5", "DataFrame")
print(df)
event_date event_id country action amount items
0 2026-01-01 1 GB click 5.00 1
1 2026-01-02 2 US view 6.01 2
2 2026-01-03 3 DE signup 7.02 3
3 2026-01-04 4 FR purchase 8.03 4
4 2026-01-05 5 IN click 9.04 5
This is a clean alternative to reading the Avro into pandas with a separate Avro library and then writing Parquet through pyarrow: one engine, one schema inference step, and it streams files larger than memory.
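The same query extends to a folder of Avro files by looping in Python. The sketch below is one possible arrangement, not part of the published example: the helper names are hypothetical, each output is written next to its input, and paths are assumed to contain no single quotes (they are not escaped). chdb is imported lazily so the planning helpers can be tried without it installed.

```python
from pathlib import Path


def conversion_sql(src: Path, dst: Path) -> str:
    """The conversion query from above, parameterised by input and output path."""
    return (
        f"SELECT * FROM file('{src.as_posix()}') "
        f"INTO OUTFILE '{dst.as_posix()}' TRUNCATE FORMAT Parquet"
    )


def plan_conversions(directory):
    """Pair every .avro file in `directory` with a .parquet path next to it."""
    sources = sorted(Path(directory).glob("*.avro"))
    return [(src, src.with_suffix(".parquet")) for src in sources]


def convert_directory(directory):
    """Run one conversion per Avro file (requires chDB to be installed)."""
    import chdb  # lazy import: only needed when actually converting

    for src, dst in plan_conversions(directory):
        chdb.query(conversion_sql(src, dst))
        print(f"{src.name} -> {dst.name}")


# Show the query that would run for the sample file, without executing it.
print(conversion_sql(Path("events.avro"), Path("events.parquet")))
```

Calling convert_directory("data") would then convert every Avro file in a data folder, one query per file, so each Parquet file keeps the schema of its own source.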
The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample Avro files (including the 3M-row file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.
github.com/ClickHouse/examples/tree/main/local-analytics/convert-avro-to-parquet
The same SQL that converts one file on your laptop runs unchanged against a directory of files, a remote object store, a ClickHouse server, or ClickHouse Cloud when the data outgrows your machine.