How to convert Avro to JSON

Al Brown
Last updated: Jun 15, 2026

To convert Avro to JSON, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then point it at the file and write it out as JSON Lines:

1clickhouse local -q "SELECT * FROM file('events.avro') INTO OUTFILE 'events.jsonl' FORMAT JSONEachRow"
1{"event_id":1,"event_type":"login","country":"GB","amount":0,"ts":1767225600}
2{"event_id":2,"event_type":"purchase","country":"US","amount":0.01,"ts":1767225601}
3{"event_id":3,"event_type":"logout","country":"DE","amount":0.02,"ts":1767225602}
4{"event_id":4,"event_type":"signup","country":"FR","amount":0.03,"ts":1767225603}
5{"event_id":5,"event_type":"login","country":"IN","amount":0.04,"ts":1767225604}

Avro embeds its own schema in the file header, so clickhouse local reads the column names and types from the file itself with no structure argument. The file is converted in place, with no upload and no import step first.

The schema comes from the Avro file #

Avro is self-describing: the writer's schema is stored in the file header. clickhouse-local reads it, so you never supply column names or types by hand. Confirm what was read with DESCRIBE:

1clickhouse local -q "DESCRIBE file('events.avro')"
1event_id	Int64
2event_type	String
3country	String
4amount	Float64
5ts	Int32

Those types are then carried into the JSON output: event_id and ts are numbers, event_type and country are strings, amount is a float. No quoting guesswork the way a hand-rolled text converter would do.

The one type gotcha: timestamps arrive as integers #

Look again at ts above. It came back as Int32, and in the first JSON output it showed up as 1767225600, not a readable timestamp. That is not a bug. Avro stores timestamps as a logical type over an integer (seconds or milliseconds since the Unix epoch), and the integer is what lands in JSON by default.

If you want a human-readable timestamp in the JSON, cast it during the SELECT:

1clickhouse local -q "
2SELECT event_id, event_type, country, amount, ts::DateTime AS ts
3FROM file('events.avro')
4INTO OUTFILE 'events_typed.jsonl' TRUNCATE FORMAT JSONEachRow"
1{"event_id":1,"event_type":"login","country":"GB","amount":0,"ts":"2026-01-01 00:00:00"}
2{"event_id":2,"event_type":"purchase","country":"US","amount":0.01,"ts":"2026-01-01 00:00:01"}
3{"event_id":3,"event_type":"logout","country":"DE","amount":0.02,"ts":"2026-01-01 00:00:02"}
4{"event_id":4,"event_type":"signup","country":"FR","amount":0.03,"ts":"2026-01-01 00:00:03"}
5{"event_id":5,"event_type":"login","country":"IN","amount":0.04,"ts":"2026-01-01 00:00:04"}

ts::DateTime is shorthand for CAST(ts AS DateTime). Use ::DateTime64(3) if your Avro timestamps are millisecond precision. This is the kind of control you do not get from a generic upload converter, which can only guess.

Options worth knowing #

You rarely want a blind, whole-file dump. Because you convert with SQL, you can shape the output as you go:

  • Single JSON array instead of JSONL. Use FORMAT JSON. This wraps the rows in a data array and adds a meta block with the column types:
1clickhouse local -q "SELECT event_id, event_type, amount FROM file('events.avro') LIMIT 3 FORMAT JSON"
1{
2	"meta":
3	[
4		{ "name": "event_id", "type": "Int64" },
5		{ "name": "event_type", "type": "String" },
6		{ "name": "amount", "type": "Float64" }
7	],
8	"data":
9	[
10		{ "event_id": 1, "event_type": "login", "amount": 0 },
11		{ "event_id": 2, "event_type": "purchase", "amount": 0.01 },
12		{ "event_id": 3, "event_type": "logout", "amount": 0.02 }
13	],
14	"rows": 3
15}
  • Pick and rename columns. Select only the fields you need; the JSON keys follow your SELECT aliases.
  • Filter and aggregate before writing. Convert and reduce in one pass:
1clickhouse local -q "
2SELECT country, count() AS events, round(sum(amount), 2) AS total
3FROM file('events.avro')
4WHERE event_type = 'purchase'
5GROUP BY country
6ORDER BY total DESC
7FORMAT JSONEachRow"
1{"country":"AU","events":2,"total":0.22}
2{"country":"US","events":2,"total":0.14}
3{"country":"FR","events":1,"total":0.09}
  • Gzip the output in place. A .gz, .zst, .lz4, or .xz suffix on the output file is detected from the name and the JSON is compressed as it is written, no extra step:
1clickhouse local -q "SELECT event_id, event_type, ts::DateTime AS ts FROM file('events.avro') INTO OUTFILE 'events.jsonl.gz' TRUNCATE FORMAT JSONEachRow"

Upload-required converter sites can do none of this. The wins here are concrete: nothing leaves your machine, the types come straight from the Avro schema, the command is scriptable in a pipeline, and files larger than RAM stream through instead of failing to upload.

How fast is it? #

On a 3,000,000-row events_large.avro, converting every row to JSON Lines (with the ts::DateTime cast) completes in:

1clickhouse local -q "
2SELECT event_id, event_type, country, amount, ts::DateTime AS ts
3FROM file('events_large.avro')
4INTO OUTFILE 'events_large.jsonl' TRUNCATE FORMAT JSONEachRow"

~0.55 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). That decodes the Avro, casts the timestamps, and serializes 3,000,000 JSON objects. The measurement may move slightly under concurrent load, but the order of magnitude holds.

The same conversion in Python with chDB #

If you live in Python, chDB is the same ClickHouse engine in-process. The SQL is identical, including the ::DateTime cast; you just write the bytes to a file:

1import chdb
2
3sql = """
4SELECT event_id, event_type, country, amount, ts::DateTime AS ts
5FROM file('events.avro')
6FORMAT JSONEachRow
7"""
8jsonl = chdb.query(sql).bytes()
9with open("events_chdb.jsonl", "wb") as f:
10    f.write(jsonl)
1{"event_id":1,"event_type":"login","country":"GB","amount":0,"ts":"2026-01-01 00:00:00"}
2{"event_id":2,"event_type":"purchase","country":"US","amount":0.01,"ts":"2026-01-01 00:00:01"}
3{"event_id":3,"event_type":"logout","country":"DE","amount":0.02,"ts":"2026-01-01 00:00:02"}

Same engine, same result, no subprocess.

Other Avro conversions #

JSON is text and convenient, but it is larger and slower to re-read than a columnar format. If the destination is analytics rather than a JSON consumer, convert Avro to Parquet instead with convert Avro to Parquet, or to a flat table with convert Avro to CSV. To query the Avro file in place without converting it at all, see how to read an Avro file and what is an Avro file.

Run it yourself #

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample Avro files (including the 3,000,000-row file used for the timing above), run.sh with every command on this page, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-avro-to-json

The same SQL that converts one file on your laptop runs unchanged against a directory of files, a remote object store, or a ClickHouse server and ClickHouse Cloud when the data outgrows your machine.

Share this resource

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!
Loading form...