How to convert ORC to CSV

Al Brown
Last updated: Jun 6, 2026

To convert an ORC file to CSV, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then write the converted file with INTO OUTFILE:

clickhouse local -q "SELECT * FROM file('events.orc') INTO OUTFILE 'events.csv' FORMAT CSVWithNames"

clickhouse local reads the schema from the ORC footer directly, so nothing has to be declared before the conversion starts. The file is read in place with no import step, and because the read streams row by row it handles inputs larger than RAM.

CSVWithNames writes a header row. Use CSV if you want the data with no header.

What ClickHouse reads from the ORC file

ORC is a self-describing columnar format: the schema lives in the file footer, so nothing has to be declared. Check what was detected with DESCRIBE:

clickhouse local -q "DESCRIBE file('events.orc')"
event_time	Nullable(DateTime64(9))
event_id	Nullable(Int64)
country	Nullable(String)
action	Nullable(String)
amount	Nullable(Float64)
tags	Map(String, Nullable(String))

The types are real ORC types, not guesses from sampling text. That matters for the conversion: an Int64 stays numeric, a DateTime64 stays a timestamp, and the values are written to CSV without re-inference.

The gotcha: nested columns have no CSV shape

CSV is flat. ORC is not. The example above has a tags column typed Map(String, String), and there is no native way to represent a map in a CSV cell. If you run the plain SELECT *, ClickHouse serializes the map as a quoted string literal:

"event_time","event_id","country","action","amount","tags"
"2026-01-01 00:00:00.000000000",1,"GB","click",5,"{'utm_source':'ads','device':'mobile'}"
"2026-01-01 00:02:17.000000000",2,"US","view",6.01,"{'utm_source':'organic','device':'desktop'}"

That round-trips back into ClickHouse, but most CSV consumers (spreadsheets, pandas.read_csv, a downstream loader) will treat {'utm_source':'ads','device':'mobile'} as one opaque string. The fix is to decide the shape yourself in the SELECT. Flatten the map keys into their own scalar columns:

clickhouse local -q "
SELECT
  event_time,
  event_id,
  country,
  action,
  amount,
  tags['utm_source'] AS utm_source,
  tags['device']     AS device
FROM file('events.orc')
INTO OUTFILE 'events_flat.csv' FORMAT CSVWithNames"
"event_time","event_id","country","action","amount","utm_source","device"
"2026-01-01 00:00:00.000000000",1,"GB","click",5,"ads","mobile"
"2026-01-01 00:02:17.000000000",2,"US","view",6.01,"organic","desktop"
"2026-01-01 00:04:34.000000000",3,"DE","purchase",7.02,"email","mobile"

Now every column is a plain CSV value. The same approach applies to ORC List and Struct columns: pick the fields you want, or serialize the whole thing with toJSONString() if you'd rather keep it as one JSON string column.

Options worth knowing

This is where a one-line SELECT beats a generic upload-and-download converter: the conversion is a query, so you shape the output before it's written.

  • Tidy the timestamps. ORC DateTime64(9) carries nanosecond precision, so it lands in CSV as 2026-01-01 00:00:00.000000000. If you don't need sub-second precision, format it: formatDateTime(event_time, '%Y-%m-%d %H:%i:%S').
  • Keep nested data as JSON. toJSONString(tags) writes a valid JSON string per row instead of flattening, which is handy if a downstream tool parses JSON.
  • Filter and project while you convert. Add a WHERE, pick columns, ORDER BY; you don't have to convert the whole file. Converting only WHERE action = 'purchase' produces a smaller CSV directly.
  • Change the delimiter. Want tab- or pipe-separated output instead? Swap the format to TSVWithNames, or use CustomSeparated with a delimiter setting.
clickhouse local -q "
SELECT
  formatDateTime(event_time, '%Y-%m-%d %H:%i:%S') AS event_time,
  event_id, country, action, amount,
  toJSONString(tags) AS tags_json
FROM file('events.orc')
ORDER BY event_id LIMIT 3
FORMAT CSVWithNames"
"event_time","event_id","country","action","amount","tags_json"
"2026-01-01 00:00:00",1,"GB","click",5,"{""utm_source"":""ads"",""device"":""mobile""}"
"2026-01-01 00:02:17",2,"US","view",6.01,"{""utm_source"":""organic"",""device"":""desktop""}"
"2026-01-01 00:04:34",3,"DE","purchase",7.02,"{""utm_source"":""email"",""device"":""mobile""}"

How fast is it?

Converting a 3,000,000-row ORC file (events_large.orc) to a 221 MB CSV, flattening the map as above:

clickhouse local -q "
SELECT event_time, event_id, country, action, amount,
  tags['utm_source'] AS utm_source, tags['device'] AS device
FROM file('events_large.orc')
INTO OUTFILE 'events_large.csv' FORMAT CSVWithNames"

~0.47 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The number may shift slightly under concurrent load, but the order of magnitude holds: this is a streaming columnar read writing text, not a load-then-export round trip. Because the read streams, the input ORC can be larger than memory.

Reverse direction

Going the other way? See convert CSV to ORC — same one-line pattern, swapping the formats.

Prefer Python? (chDB)

chDB is the same ClickHouse engine embedded in Python, so the conversion is the identical SQL. INTO OUTFILE writes the CSV from inside the process:

import chdb

chdb.query("""
SELECT
  event_time, event_id, country, action, amount,
  tags['utm_source'] AS utm_source,
  tags['device']     AS device
FROM file('events.orc')
INTO OUTFILE 'events_chdb.csv' TRUNCATE FORMAT CSVWithNames
""")
"event_time","event_id","country","action","amount","utm_source","device"
"2026-01-01 00:00:00.000000000",1,"GB","click",5,"ads","mobile"
"2026-01-01 00:02:17.000000000",2,"US","view",6.01,"organic","desktop"

If you'd rather read the ORC straight into a DataFrame instead of writing a file, see how to read an ORC file in Python with chDB.

Run it yourself

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample ORC files (including the 3M-row file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-orc-to-csv

The same SQL scales without a rewrite: from a file on your laptop, to a ClickHouse server, to ClickHouse Cloud when the data outgrows one machine. For background on the source format, see the ORC file format and how to query an ORC file; for the target side, how to run SQL on a CSV file.


Share this resource

  • Y Combinator icon
  • X icon
  • Bluesky icon
  • Facebook icon
  • LinkedIn icon

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!

More like this

Scale vector search in Postgres with pgvector: avoid HNSW RAM limits, fix filtering recall, and know when to go hybrid. Read now.

Continue reading ->

How to query a REST API in Python

Al Brown • Last updated: Jun 15, 2026

Read a JSON API response into a DataFrame with chDB. Use the pandas API you already know to filter and aggregate the response, running on ClickHouse's engine with no server to start.

Continue reading ->

How to convert Parquet to ORC

Al Brown • Last updated: Jun 6, 2026

Convert a Parquet file to ORC with one clickhouse-local command. The schema is read from the Parquet footer and the types carry into ORC, with no server and no upload.

Continue reading ->