To convert a CSV file to ORC, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then point it at the CSV and write the result into ORC:
clickhouse local -q "SELECT * FROM file('events.csv') INTO OUTFILE 'events.orc' TRUNCATE FORMAT ORC"
2026-01-01 1 GB click 5 1
2026-01-02 2 US view 6.01 2
2026-01-03 3 DE signup 7.02 3
2026-01-04 4 FR purchase 8.03 4
2026-01-05 5 IN click 9.04 5
ClickHouse reads the CSV header for column names, infers types from the data, and streams rows into a typed columnar ORC file. The source file is read in place with no import step, and because it streams, it handles files larger than RAM.
The result is a real columnar ORC file with the column names from the header and the types ClickHouse inferred from the data. Check both ends with DESCRIBE:
clickhouse local -q "DESCRIBE file('events.csv')" # source
clickhouse local -q "DESCRIBE file('events.orc')" # result
-- CSV (source):
event_date Nullable(Date)
event_id Nullable(Int64)
country Nullable(String)
action Nullable(String)
amount Nullable(Float64)
quantity Nullable(Int64)
-- ORC (result):
event_date Nullable(Date32)
event_id Nullable(Int64)
country Nullable(String)
action Nullable(String)
amount Nullable(Float64)
quantity Nullable(Int64)
Read the ORC straight back, or aggregate on it without a load step:
clickhouse local -q "SELECT * FROM file('events.orc') LIMIT 5"
2026-01-01 1 GB click 5 1
2026-01-02 2 US view 6.01 2
2026-01-03 3 DE signup 7.02 3
2026-01-04 4 FR purchase 8.03 4
2026-01-05 5 IN click 9.04 5
This is the information you don't get from a drag-and-drop converter site: the type mapping is explicit and it round-trips. ORC is a typed columnar format, so the inferred CSV types are stored as real ORC types rather than as text. Two mappings are worth knowing up front, because they hold even when you pin the types yourself:
- ORC has no 32-bit
Date type, so a ClickHouse Date is written as Date32. The values are identical; only the on-disk width differs.
- ORC integers are signed. An unsigned ClickHouse type such as
UInt32 is stored as the matching signed Int32.
You can see this by pinning the schema on the way in. Even when the source columns are declared Date, UInt32, and UInt8, the ORC result reports Date32, Int32, and Int8:
clickhouse local -q "
SELECT * FROM file('events.csv', 'CSVWithNames',
'event_date Date, event_id UInt32, country String, action String, amount Float64, quantity UInt8')
INTO OUTFILE 'events_typed.orc' TRUNCATE FORMAT ORC"
clickhouse local -q "DESCRIBE file('events_typed.orc')"
event_date Nullable(Date32)
event_id Nullable(Int32)
country Nullable(String)
action Nullable(String)
amount Nullable(Float64)
quantity Nullable(Int8)
Pinning the schema is still worth doing when type inference guesses wrong on the CSV — for example when an ID or a ZIP code should stay a string rather than become an integer. Pass the format and an explicit structure as the second and third arguments to file().
ORC compresses each column block on write. ClickHouse defaults to LZ4; you can switch the codec with one setting:
clickhouse local -q "
SELECT * FROM file('events_large.csv')
INTO OUTFILE 'events_large.orc' TRUNCATE FORMAT ORC
SETTINGS output_format_orc_compression_method='zstd'"
On the 3,000,000-row sample CSV, the codec choice maps to file size like this:
events_large.csv 123.28 MiB
events_large_none.orc 54.54 MiB
events_large.orc (lz4 default) 19.50 MiB
events_large_zstd.orc 19.50 MiB
Even uncompressed, ORC is less than half the size of the CSV because it stores typed columns instead of text. With LZ4 or zstd it lands near 20 MiB, roughly a 6x reduction over the source CSV, and zstd will usually edge ahead of LZ4 on larger, less repetitive data at a small write-time cost. none is useful when another layer already compresses the bytes. Other useful knobs: output_format_orc_row_index_stride controls the row-group stride that lets readers skip blocks, and output_format_orc_string_as_string keeps ORC string columns as string rather than binary.
If you work in a notebook, chDB is ClickHouse as an in-process Python module. The conversion is the same SQL, no server and no separate binary:
import chdb
chdb.query("SELECT * FROM file('events.csv') INTO OUTFILE 'events.orc' TRUNCATE FORMAT ORC")
print(chdb.query("SELECT count() FROM file('events.orc')", "CSV"), end="")
The INTO OUTFILE ... FORMAT ORC clause writes the file directly from the engine, so you skip the pandas-then-to_orc round trip and the conversion handles inputs larger than memory. To read ORC back into a DataFrame, see how to read an ORC file in Python.
Converting events_large.csv (3,000,000 rows, ~123 MiB) to ORC, including parsing the CSV text from scratch every run:
clickhouse local -q "SELECT * FROM file('events_large.csv') INTO OUTFILE 'events_large.orc' TRUNCATE FORMAT ORC"
~0.50 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The number can wobble slightly under concurrent load. That is a one-time cost: you convert once, then every later query reads the typed columnar ORC instead of re-parsing CSV text.
CSV is convenient to produce but slow to scan repeatedly, because every query re-parses the text and re-infers the types. ORC stores the data column by column, already typed and compressed, so analytical scans read less and skip what they don't need. Parquet is the other common columnar choice and is often a better default for cross-tool interchange; see convert CSV to Parquet and Parquet vs CSV. Reach for ORC when your stack expects it, for example a Hive or Spark pipeline standardized on ORC.
Going the other way? See how to convert ORC to CSV.
The same SQL that converts a file on your laptop runs unchanged against a ClickHouse server and ClickHouse Cloud when the data outgrows one machine — no rewrite.
The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample CSVs (including the ~123 MiB file used for the timing above), run.sh with every command on this page, a chDB run.py / run.ipynb, and expected_output.txt.
github.com/ClickHouse/examples/tree/main/local-analytics/convert-csv-to-orc
Related: run SQL on a CSV file, read an ORC file with SQL, and what is the ORC file format.