To convert an ORC file to CSV, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then write the converted file with INTO OUTFILE:
clickhouse local -q "SELECT * FROM file('events.orc') INTO OUTFILE 'events.csv' FORMAT CSVWithNames"
clickhouse local reads the schema from the ORC footer directly, so nothing has to be declared before the conversion starts. The file is read in place with no import step, and because the read streams row by row it handles inputs larger than RAM.
CSVWithNames writes a header row. Use CSV if you want the data with no header.
ORC is a self-describing columnar format: the schema lives in the file footer, so nothing has to be declared. Check what was detected with DESCRIBE:
clickhouse local -q "DESCRIBE file('events.orc')"
event_time Nullable(DateTime64(9))
event_id Nullable(Int64)
country Nullable(String)
action Nullable(String)
amount Nullable(Float64)
tags Map(String, Nullable(String))
The types are real ORC types, not guesses from sampling text. That matters for the conversion: an Int64 stays numeric, a DateTime64 stays a timestamp, and the values are written to CSV without re-inference.
The gotcha: nested columns have no CSV shape
CSV is flat. ORC is not. The example above has a tags column typed Map(String, String), and there is no native way to represent a map in a CSV cell. If you run the plain SELECT *, ClickHouse serializes the map as a quoted string literal:
"event_time","event_id","country","action","amount","tags"
"2026-01-01 00:00:00.000000000",1,"GB","click",5,"{'utm_source':'ads','device':'mobile'}"
"2026-01-01 00:02:17.000000000",2,"US","view",6.01,"{'utm_source':'organic','device':'desktop'}"
That round-trips back into ClickHouse, but most CSV consumers (spreadsheets, pandas.read_csv, a downstream loader) will treat {'utm_source':'ads','device':'mobile'} as one opaque string. The fix is to decide the shape yourself in the SELECT. Flatten the map keys into their own scalar columns:
clickhouse local -q "
SELECT
event_time,
event_id,
country,
action,
amount,
tags['utm_source'] AS utm_source,
tags['device'] AS device
FROM file('events.orc')
INTO OUTFILE 'events_flat.csv' FORMAT CSVWithNames"
"event_time","event_id","country","action","amount","utm_source","device"
"2026-01-01 00:00:00.000000000",1,"GB","click",5,"ads","mobile"
"2026-01-01 00:02:17.000000000",2,"US","view",6.01,"organic","desktop"
"2026-01-01 00:04:34.000000000",3,"DE","purchase",7.02,"email","mobile"
Now every column is a plain CSV value. The same approach applies to ORC List and Struct columns: pick the fields you want, or serialize the whole thing with toJSONString() if you'd rather keep it as one JSON string column.
This is where a one-line SELECT beats a generic upload-and-download converter: the conversion is a query, so you shape the output before it's written.
- Tidy the timestamps. ORC
DateTime64(9) carries nanosecond precision, so it lands in CSV as 2026-01-01 00:00:00.000000000. If you don't need sub-second precision, format it: formatDateTime(event_time, '%Y-%m-%d %H:%i:%S').
- Keep nested data as JSON.
toJSONString(tags) writes a valid JSON string per row instead of flattening, which is handy if a downstream tool parses JSON.
- Filter and project while you convert. Add a
WHERE, pick columns, ORDER BY; you don't have to convert the whole file. Converting only WHERE action = 'purchase' produces a smaller CSV directly.
- Change the delimiter. Want tab- or pipe-separated output instead? Swap the format to
TSVWithNames, or use CustomSeparated with a delimiter setting.
clickhouse local -q "
SELECT
formatDateTime(event_time, '%Y-%m-%d %H:%i:%S') AS event_time,
event_id, country, action, amount,
toJSONString(tags) AS tags_json
FROM file('events.orc')
ORDER BY event_id LIMIT 3
FORMAT CSVWithNames"
"event_time","event_id","country","action","amount","tags_json"
"2026-01-01 00:00:00",1,"GB","click",5,"{""utm_source"":""ads"",""device"":""mobile""}"
"2026-01-01 00:02:17",2,"US","view",6.01,"{""utm_source"":""organic"",""device"":""desktop""}"
"2026-01-01 00:04:34",3,"DE","purchase",7.02,"{""utm_source"":""email"",""device"":""mobile""}"
Converting a 3,000,000-row ORC file (events_large.orc) to a 221 MB CSV, flattening the map as above:
clickhouse local -q "
SELECT event_time, event_id, country, action, amount,
tags['utm_source'] AS utm_source, tags['device'] AS device
FROM file('events_large.orc')
INTO OUTFILE 'events_large.csv' FORMAT CSVWithNames"
~0.47 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The number may shift slightly under concurrent load, but the order of magnitude holds: this is a streaming columnar read writing text, not a load-then-export round trip. Because the read streams, the input ORC can be larger than memory.
Going the other way? See convert CSV to ORC — same one-line pattern, swapping the formats.
chDB is the same ClickHouse engine embedded in Python, so the conversion is the identical SQL. INTO OUTFILE writes the CSV from inside the process:
import chdb
chdb.query("""
SELECT
event_time, event_id, country, action, amount,
tags['utm_source'] AS utm_source,
tags['device'] AS device
FROM file('events.orc')
INTO OUTFILE 'events_chdb.csv' TRUNCATE FORMAT CSVWithNames
""")
"event_time","event_id","country","action","amount","utm_source","device"
"2026-01-01 00:00:00.000000000",1,"GB","click",5,"ads","mobile"
"2026-01-01 00:02:17.000000000",2,"US","view",6.01,"organic","desktop"
If you'd rather read the ORC straight into a DataFrame instead of writing a file, see how to read an ORC file in Python with chDB.
The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample ORC files (including the 3M-row file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.
github.com/ClickHouse/examples/tree/main/local-analytics/convert-orc-to-csv
The same SQL scales without a rewrite: from a file on your laptop, to a ClickHouse server, to ClickHouse Cloud when the data outgrows one machine. For background on the source format, see the ORC file format and how to query an ORC file; for the target side, how to run SQL on a CSV file.