To read a semicolon-separated file, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then set the field delimiter to ; and query:
clickhouse local -q "
SELECT * FROM file('orders.csv', CustomSeparatedWithNames)
LIMIT 10
SETTINGS format_custom_field_delimiter = ';', format_custom_escaping_rule = 'CSV'"
┌─order_date─┬─order_id─┬─country─┬─product───┬─revenue─┬─quantity─┐
1. │ 2026-01-01 │ 1 │ GB │ widget │ 5 │ 1 │
2. │ 2026-01-02 │ 2 │ US │ gadget │ 6.01 │ 2 │
3. │ 2026-01-03 │ 3 │ DE │ gizmo │ 7.02 │ 3 │
4. │ 2026-01-04 │ 4 │ FR │ doohickey │ 8.03 │ 4 │
5. │ 2026-01-05 │ 5 │ IN │ widget │ 9.04 │ 5 │
└────────────┴──────────┴─────────┴───────────┴─────────┴──────────┘
CustomSeparatedWithNames reads the header row for column names and infers types from the data; the ; delimiter is just a setting. The file is read in place with no import step first. Semicolon files are the common European export from spreadsheets and ERP systems, where the comma is already the decimal separator. That detail has a sting in the tail; see the gotcha section below.
CustomSeparatedWithNames takes the column names from the header and infers each type from the data. DESCRIBE shows what it found:
clickhouse local -q "
DESCRIBE file('orders.csv', CustomSeparatedWithNames)
SETTINGS format_custom_field_delimiter = ';', format_custom_escaping_rule = 'CSV'"
┌─name───────┬─type──────────────┐
1. │ order_date │ Nullable(Date) │
2. │ order_id │ Nullable(Int64) │
3. │ country │ Nullable(String) │
4. │ product │ Nullable(String) │
5. │ revenue │ Nullable(Float64) │
6. │ quantity │ Nullable(Int64) │
└────────────┴───────────────────┘
The header gave the names; the data gave the types. The format_custom_escaping_rule = 'CSV' setting means values follow CSV quoting rules, so a field that itself contains a ; can be wrapped in quotes and still parse. If your file has no header, use the CustomSeparated format instead of CustomSeparatedWithNames and pass an explicit schema as the third argument to file().
Because the file is a table, the full SQL surface works on it (WHERE, GROUP BY, JOIN, window functions) with no load step in between:
clickhouse local -q "
SELECT country, count() AS orders, round(sum(revenue), 2) AS revenue, round(avg(quantity), 2) AS avg_qty
FROM file('orders.csv', CustomSeparatedWithNames)
GROUP BY country ORDER BY revenue DESC
SETTINGS format_custom_field_delimiter = ';', format_custom_escaping_rule = 'CSV'
FORMAT PrettyCompact"
┌─country─┬─orders─┬─revenue─┬─avg_qty─┐
1. │ US │ 4 │ 60.4 │ 3.5 │
2. │ GB │ 4 │ 56.36 │ 2.5 │
3. │ AU │ 3 │ 48.33 │ 2 │
4. │ IN │ 3 │ 45.3 │ 2.67 │
5. │ FR │ 3 │ 42.27 │ 3.33 │
6. │ DE │ 3 │ 39.24 │ 4 │
└─────────┴────────┴─────────┴─────────┘
This is the catch with European semicolon files, and it is exactly why the delimiter is ; rather than , in the first place: the comma is the decimal separator. A price written 1234,50 means one thousand two hundred thirty-four and a half. ClickHouse infers types per the C locale, where the decimal point is ., so it sees 1234,50 as text and types the whole column as String:
clickhouse local -q "
DESCRIBE file('orders_eu.csv', CustomSeparatedWithNames)
SETTINGS format_custom_field_delimiter = ';', format_custom_escaping_rule = 'CSV'"
┌─name───────┬─type─────────────┐
1. │ order_date │ Nullable(Date) │
2. │ order_id │ Nullable(Int64) │
3. │ country │ Nullable(String) │
4. │ product │ Nullable(String) │
5. │ revenue │ Nullable(String) │ <- decimal commas, read as text
6. │ quantity │ Nullable(Int64) │
└────────────┴──────────────────┘
The fix is one expression: read the column as text, swap the comma for a point, and cast to a number. replaceOne(revenue, ',', '.')::Float64 does it inline:
clickhouse local -q "
SELECT country, replaceOne(revenue, ',', '.')::Float64 AS revenue_eur
FROM file('orders_eu.csv', CustomSeparatedWithNames)
ORDER BY revenue_eur DESC
SETTINGS format_custom_field_delimiter = ';', format_custom_escaping_rule = 'CSV'
FORMAT PrettyCompact"
┌─country─┬─revenue_eur─┐
1. │ DE │ 1234.5 │
2. │ FR │ 1000 │
3. │ DE │ 89.99 │
4. │ NL │ 7.02 │
5. │ FR │ 6.01 │
└─────────┴─────────────┘
Now revenue_eur is a real Float64 and aggregates correctly:
clickhouse local -q "
SELECT round(sum(replaceOne(revenue, ',', '.')::Float64), 2) AS total_eur
FROM file('orders_eu.csv', CustomSeparatedWithNames)
SETTINGS format_custom_field_delimiter = ';', format_custom_escaping_rule = 'CSV'"
If thousands are also grouped with a separator (for example 1.234,50), strip the grouping dot first: replaceOne(replaceAll(revenue, '.', ''), ',', '.')::Float64. Do the conversion once at read time and the rest of your SQL stays clean.
The same SQL runs unchanged across dozens of file formats and remote sources, and against a ClickHouse server or ClickHouse Cloud when the data outgrows your machine.
You do not unzip anything first. clickhouse local detects a .gz, .zst, .xz or similar suffix from the file name and decompresses on the fly, so a orders.csv.gz semicolon file is read exactly like the plain one — same file() call, same ; settings.
The import step you skipped was the slow part. On a 3,000,000-row, ~110 MB semicolon file (orders_large.csv), a full GROUP BY country with sum and avg over every row runs in:
clickhouse local -q "
SELECT country, count() AS orders, round(sum(revenue),2) AS revenue, round(avg(quantity),3) AS avg_qty
FROM file('orders_large.csv', CustomSeparatedWithNames)
GROUP BY country ORDER BY revenue DESC
SETTINGS format_custom_field_delimiter = ';', format_custom_escaping_rule = 'CSV'"
~0.59 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM; clickhouse local 26.6.1.117). That includes parsing the delimited text from scratch on every run; there is no cached table. Convert the same data to Parquet once and repeat scans get faster again, because Parquet is columnar and typed.
The query you just ran on a laptop file is the same SQL you would run on a ClickHouse server or in ClickHouse Cloud. Nothing about SELECT ... WHERE ... GROUP BY changes. You swap file('orders.csv', ...) for a table name and the rest stays put. Prototype against a file on your machine, ship the identical logic to production.
The complete, runnable example lives in the ClickHouse examples repo. It has generate.sh (builds the demo file, the European decimal-comma file, and the ~110 MB perf file), run.sh (every command above), and expected_output.txt:
github.com/ClickHouse/examples → local-analytics/clickhouse-local-semicolon
git clone https://github.com/ClickHouse/examples
cd examples/local-analytics/clickhouse-local-semicolon
./generate.sh && ./run.sh