How to read a semicolon-separated file

Al Brown
Last updated: Jun 8, 2026

To read a semicolon-separated file, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then set the field delimiter to ; and query:

clickhouse local -q "
SELECT * FROM file('orders.csv', CustomSeparatedWithNames)
LIMIT 10
SETTINGS format_custom_field_delimiter = ';', format_custom_escaping_rule = 'CSV'"
┌─order_date─┬─order_id─┬─country─┬─product───┬─revenue─┬─quantity─┐
 1. │ 2026-01-01 │        1 │ GB      │ widget    │       5 │        1 │
 2. │ 2026-01-02 │        2 │ US      │ gadget    │    6.01 │        2 │
 3. │ 2026-01-03 │        3 │ DE      │ gizmo     │    7.02 │        3 │
 4. │ 2026-01-04 │        4 │ FR      │ doohickey │    8.03 │        4 │
 5. │ 2026-01-05 │        5 │ IN      │ widget    │    9.04 │        5 │
    └────────────┴──────────┴─────────┴───────────┴─────────┴──────────┘

CustomSeparatedWithNames reads the header row for column names and infers types from the data; the ; delimiter is just a setting. The file is read in place with no import step first. Semicolon files are the common European export from spreadsheets and ERP systems, where the comma is already the decimal separator. That detail has a sting in the tail; see the gotcha section below.

See the schema without declaring one

CustomSeparatedWithNames takes the column names from the header and infers each type from the data. DESCRIBE shows what it found:

clickhouse local -q "
DESCRIBE file('orders.csv', CustomSeparatedWithNames)
SETTINGS format_custom_field_delimiter = ';', format_custom_escaping_rule = 'CSV'"
┌─name───────┬─type──────────────┐
1. │ order_date │ Nullable(Date)    │
2. │ order_id   │ Nullable(Int64)   │
3. │ country    │ Nullable(String)  │
4. │ product    │ Nullable(String)  │
5. │ revenue    │ Nullable(Float64) │
6. │ quantity   │ Nullable(Int64)   │
   └────────────┴───────────────────┘

The header gave the names; the data gave the types. The format_custom_escaping_rule = 'CSV' setting means values follow CSV quoting rules, so a field that itself contains a ; can be wrapped in quotes and still parse. If your file has no header, use the CustomSeparated format instead of CustomSeparatedWithNames and pass an explicit schema as the third argument to file().

Aggregate directly on the file

Because the file is a table, the full SQL surface works on it (WHERE, GROUP BY, JOIN, window functions) with no load step in between:

clickhouse local -q "
SELECT country, count() AS orders, round(sum(revenue), 2) AS revenue, round(avg(quantity), 2) AS avg_qty
FROM file('orders.csv', CustomSeparatedWithNames)
GROUP BY country ORDER BY revenue DESC
SETTINGS format_custom_field_delimiter = ';', format_custom_escaping_rule = 'CSV'
FORMAT PrettyCompact"
┌─country─┬─orders─┬─revenue─┬─avg_qty─┐
1. │ US      │      4 │    60.4 │     3.5 │
2. │ GB      │      4 │   56.36 │     2.5 │
3. │ AU      │      3 │   48.33 │       2 │
4. │ IN      │      3 │    45.3 │    2.67 │
5. │ FR      │      3 │   42.27 │    3.33 │
6. │ DE      │      3 │   39.24 │       4 │
   └─────────┴────────┴─────────┴─────────┘

The decimal-comma gotcha

This is the catch with European semicolon files, and it is exactly why the delimiter is ; rather than , in the first place: the comma is the decimal separator. A price written 1234,50 means one thousand two hundred thirty-four and a half. ClickHouse infers types per the C locale, where the decimal point is ., so it sees 1234,50 as text and types the whole column as String:

clickhouse local -q "
DESCRIBE file('orders_eu.csv', CustomSeparatedWithNames)
SETTINGS format_custom_field_delimiter = ';', format_custom_escaping_rule = 'CSV'"
┌─name───────┬─type─────────────┐
1. │ order_date │ Nullable(Date)   │
2. │ order_id   │ Nullable(Int64)  │
3. │ country    │ Nullable(String) │
4. │ product    │ Nullable(String) │
5. │ revenue    │ Nullable(String) │   <- decimal commas, read as text
6. │ quantity   │ Nullable(Int64)  │
   └────────────┴──────────────────┘

The fix is one expression: read the column as text, swap the comma for a point, and cast to a number. replaceOne(revenue, ',', '.')::Float64 does it inline:

clickhouse local -q "
SELECT country, replaceOne(revenue, ',', '.')::Float64 AS revenue_eur
FROM file('orders_eu.csv', CustomSeparatedWithNames)
ORDER BY revenue_eur DESC
SETTINGS format_custom_field_delimiter = ';', format_custom_escaping_rule = 'CSV'
FORMAT PrettyCompact"
┌─country─┬─revenue_eur─┐
1. │ DE      │      1234.5 │
2. │ FR      │        1000 │
3. │ DE      │       89.99 │
4. │ NL      │        7.02 │
5. │ FR      │        6.01 │
   └─────────┴─────────────┘

Now revenue_eur is a real Float64 and aggregates correctly:

clickhouse local -q "
SELECT round(sum(replaceOne(revenue, ',', '.')::Float64), 2) AS total_eur
FROM file('orders_eu.csv', CustomSeparatedWithNames)
SETTINGS format_custom_field_delimiter = ';', format_custom_escaping_rule = 'CSV'"
2337.52

If thousands are also grouped with a separator (for example 1.234,50), strip the grouping dot first: replaceOne(replaceAll(revenue, '.', ''), ',', '.')::Float64. Do the conversion once at read time and the rest of your SQL stays clean.

The same SQL runs unchanged across dozens of file formats and remote sources, and against a ClickHouse server or ClickHouse Cloud when the data outgrows your machine.

Compressed files work too

You do not unzip anything first. clickhouse local detects a .gz, .zst, .xz or similar suffix from the file name and decompresses on the fly, so a orders.csv.gz semicolon file is read exactly like the plain one — same file() call, same ; settings.

How fast is it?

The import step you skipped was the slow part. On a 3,000,000-row, ~110 MB semicolon file (orders_large.csv), a full GROUP BY country with sum and avg over every row runs in:

clickhouse local -q "
SELECT country, count() AS orders, round(sum(revenue),2) AS revenue, round(avg(quantity),3) AS avg_qty
FROM file('orders_large.csv', CustomSeparatedWithNames)
GROUP BY country ORDER BY revenue DESC
SETTINGS format_custom_field_delimiter = ';', format_custom_escaping_rule = 'CSV'"

~0.59 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM; clickhouse local 26.6.1.117). That includes parsing the delimited text from scratch on every run; there is no cached table. Convert the same data to Parquet once and repeat scans get faster again, because Parquet is columnar and typed.

The same SQL scales unchanged

The query you just ran on a laptop file is the same SQL you would run on a ClickHouse server or in ClickHouse Cloud. Nothing about SELECT ... WHERE ... GROUP BY changes. You swap file('orders.csv', ...) for a table name and the rest stays put. Prototype against a file on your machine, ship the identical logic to production.

Run it yourself

The complete, runnable example lives in the ClickHouse examples repo. It has generate.sh (builds the demo file, the European decimal-comma file, and the ~110 MB perf file), run.sh (every command above), and expected_output.txt:

github.com/ClickHouse/examples → local-analytics/clickhouse-local-semicolon

git clone https://github.com/ClickHouse/examples
cd examples/local-analytics/clickhouse-local-semicolon
./generate.sh && ./run.sh

Share this resource

  • Y Combinator icon
  • X icon
  • Bluesky icon
  • Facebook icon
  • LinkedIn icon

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!

More like this

Scale vector search in Postgres with pgvector: avoid HNSW RAM limits, fix filtering recall, and know when to go hybrid. Read now.

Continue reading ->

How to query a REST API in Python

Al Brown • Last updated: Jun 15, 2026

Read a JSON API response into a DataFrame with chDB. Use the pandas API you already know to filter and aggregate the response, running on ClickHouse's engine with no server to start.

Continue reading ->

How to convert Parquet to ORC

Al Brown • Last updated: Jun 6, 2026

Convert a Parquet file to ORC with one clickhouse-local command. The schema is read from the Parquet footer and the types carry into ORC, with no server and no upload.

Continue reading ->