How to convert ORC to CSV

Al Brown
Last updated: Jun 6, 2026

To convert an ORC file to CSV, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then write the converted file with INTO OUTFILE:

1clickhouse local -q "SELECT * FROM file('events.orc') INTO OUTFILE 'events.csv' FORMAT CSVWithNames"

clickhouse local reads the schema from the ORC footer directly, so nothing has to be declared before the conversion starts. The file is read in place with no import step, and because the read streams row by row it handles inputs larger than RAM.

CSVWithNames writes a header row. Use CSV if you want the data with no header.

What ClickHouse reads from the ORC file #

ORC is a self-describing columnar format: the schema lives in the file footer, so nothing has to be declared. Check what was detected with DESCRIBE:

1clickhouse local -q "DESCRIBE file('events.orc')"
1event_time	Nullable(DateTime64(9))
2event_id	Nullable(Int64)
3country	Nullable(String)
4action	Nullable(String)
5amount	Nullable(Float64)
6tags	Map(String, Nullable(String))

The types are real ORC types, not guesses from sampling text. That matters for the conversion: an Int64 stays numeric, a DateTime64 stays a timestamp, and the values are written to CSV without re-inference.

The gotcha: nested columns have no CSV shape #

CSV is flat. ORC is not. The example above has a tags column typed Map(String, String), and there is no native way to represent a map in a CSV cell. If you run the plain SELECT *, ClickHouse serializes the map as a quoted string literal:

1"event_time","event_id","country","action","amount","tags"
2"2026-01-01 00:00:00.000000000",1,"GB","click",5,"{'utm_source':'ads','device':'mobile'}"
3"2026-01-01 00:02:17.000000000",2,"US","view",6.01,"{'utm_source':'organic','device':'desktop'}"

That round-trips back into ClickHouse, but most CSV consumers (spreadsheets, pandas.read_csv, a downstream loader) will treat {'utm_source':'ads','device':'mobile'} as one opaque string. The fix is to decide the shape yourself in the SELECT. Flatten the map keys into their own scalar columns:

1clickhouse local -q "
2SELECT
3  event_time,
4  event_id,
5  country,
6  action,
7  amount,
8  tags['utm_source'] AS utm_source,
9  tags['device']     AS device
10FROM file('events.orc')
11INTO OUTFILE 'events_flat.csv' FORMAT CSVWithNames"
1"event_time","event_id","country","action","amount","utm_source","device"
2"2026-01-01 00:00:00.000000000",1,"GB","click",5,"ads","mobile"
3"2026-01-01 00:02:17.000000000",2,"US","view",6.01,"organic","desktop"
4"2026-01-01 00:04:34.000000000",3,"DE","purchase",7.02,"email","mobile"

Now every column is a plain CSV value. The same approach applies to ORC List and Struct columns: pick the fields you want, or serialize the whole thing with toJSONString() if you'd rather keep it as one JSON string column.

Options worth knowing #

This is where a one-line SELECT beats a generic upload-and-download converter: the conversion is a query, so you shape the output before it's written.

  • Tidy the timestamps. ORC DateTime64(9) carries nanosecond precision, so it lands in CSV as 2026-01-01 00:00:00.000000000. If you don't need sub-second precision, format it: formatDateTime(event_time, '%Y-%m-%d %H:%i:%S').
  • Keep nested data as JSON. toJSONString(tags) writes a valid JSON string per row instead of flattening, which is handy if a downstream tool parses JSON.
  • Filter and project while you convert. Add a WHERE, pick columns, ORDER BY; you don't have to convert the whole file. Converting only WHERE action = 'purchase' produces a smaller CSV directly.
  • Change the delimiter. Want tab- or pipe-separated output instead? Swap the format to TSVWithNames, or use CustomSeparated with a delimiter setting.
1clickhouse local -q "
2SELECT
3  formatDateTime(event_time, '%Y-%m-%d %H:%i:%S') AS event_time,
4  event_id, country, action, amount,
5  toJSONString(tags) AS tags_json
6FROM file('events.orc')
7ORDER BY event_id LIMIT 3
8FORMAT CSVWithNames"
1"event_time","event_id","country","action","amount","tags_json"
2"2026-01-01 00:00:00",1,"GB","click",5,"{""utm_source"":""ads"",""device"":""mobile""}"
3"2026-01-01 00:02:17",2,"US","view",6.01,"{""utm_source"":""organic"",""device"":""desktop""}"
4"2026-01-01 00:04:34",3,"DE","purchase",7.02,"{""utm_source"":""email"",""device"":""mobile""}"

How fast is it? #

Converting a 3,000,000-row ORC file (events_large.orc) to a 221 MB CSV, flattening the map as above:

1clickhouse local -q "
2SELECT event_time, event_id, country, action, amount,
3  tags['utm_source'] AS utm_source, tags['device'] AS device
4FROM file('events_large.orc')
5INTO OUTFILE 'events_large.csv' FORMAT CSVWithNames"

~0.47 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The number may shift slightly under concurrent load, but the order of magnitude holds: this is a streaming columnar read writing text, not a load-then-export round trip. Because the read streams, the input ORC can be larger than memory.

Reverse direction #

Going the other way? See convert CSV to ORC — same one-line pattern, swapping the formats.

Prefer Python? (chDB) #

chDB is the same ClickHouse engine embedded in Python, so the conversion is the identical SQL. INTO OUTFILE writes the CSV from inside the process:

1import chdb
2
3chdb.query("""
4SELECT
5  event_time, event_id, country, action, amount,
6  tags['utm_source'] AS utm_source,
7  tags['device']     AS device
8FROM file('events.orc')
9INTO OUTFILE 'events_chdb.csv' TRUNCATE FORMAT CSVWithNames
10""")
1"event_time","event_id","country","action","amount","utm_source","device"
2"2026-01-01 00:00:00.000000000",1,"GB","click",5,"ads","mobile"
3"2026-01-01 00:02:17.000000000",2,"US","view",6.01,"organic","desktop"

If you'd rather read the ORC straight into a DataFrame instead of writing a file, see how to read an ORC file in Python with chDB.

Run it yourself #

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample ORC files (including the 3M-row file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-orc-to-csv

The same SQL scales without a rewrite: from a file on your laptop, to a ClickHouse server, to ClickHouse Cloud when the data outgrows one machine. For background on the source format, see the ORC file format and how to query an ORC file; for the target side, how to run SQL on a CSV file.

Share this resource

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!
Loading form...