How to read a file with a custom delimiter

Al Brown
Last updated: Jun 8, 2026

To read a file with a custom delimiter, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then use the CustomSeparated format and tell it the separator:

1clickhouse local -q "
2SELECT * FROM file('orders.txt', CustomSeparatedWithNames) LIMIT 5
3SETTINGS format_custom_field_delimiter='|~|', format_custom_escaping_rule='CSV'"
11	GB	widget	5	1
22	US	gadget	6.01	2
33	DE	gizmo	7.02	3
44	FR	doohickey	8.03	4
55	IN	widget	9.04	5

The file above separates fields with |~|, a multi-character sequence that no CSV or TSV reader would split on correctly. CustomSeparated handles it because you supply the delimiter yourself, and the file is read in place with no import step first.

The two settings that define the format #

CustomSeparated is configured by settings, not by a fixed dialect. The two you reach for most:

  • format_custom_field_delimiter sets what separates the columns. Default is \t. Set it to |~|, ::, ;, or any string.
  • format_custom_escaping_rule sets how a field's text is interpreted. CSV is the sane default: it honours quoting, so a value that itself contains the delimiter still parses when wrapped in quotes.

If the field delimiter is a single tab, you have a TSV, so use TSVWithNames. If it's a single comma, that's a CSV. CustomSeparated is for everything else.

Use CustomSeparatedWithNames when the first line holds column names. With names present, ClickHouse infers the column types from the data, so you don't declare a schema.

See the schema with DESCRIBE #

DESCRIBE prints the column names (from the header) and the types ClickHouse inferred (from the data), with no CREATE TABLE:

1clickhouse local -q "
2DESCRIBE file('orders.txt', CustomSeparatedWithNames)
3SETTINGS format_custom_field_delimiter='|~|', format_custom_escaping_rule='CSV'"
1order_id	Nullable(Int64)
2country	Nullable(String)
3product	Nullable(String)
4revenue	Nullable(Float64)
5quantity	Nullable(Int64)

The header gave the names; the values gave the types. Columns come back Nullable because inference allows for empty fields. If inference guesses wrong, say an order ID that should stay a string, pass the format and an explicit schema as the second and third arguments to file(), exactly as you would for a CSV.

Filter and group by, directly on the file #

Once the delimiter is set, the file is just a SQL source. The full ClickHouse dialect applies (WHERE, GROUP BY, aggregates, joins, window functions) with no load step in between:

1clickhouse local -q "
2SELECT country, count() AS orders, round(sum(revenue), 2) AS revenue
3FROM file('orders.txt', CustomSeparatedWithNames)
4GROUP BY country
5ORDER BY revenue DESC
6SETTINGS format_custom_field_delimiter='|~|', format_custom_escaping_rule='CSV'"
1AU	2	26.16
2IN	2	24.14
3FR	2	22.12
4DE	2	20.1
5US	2	18.08
6GB	2	16.06

Custom row delimiters too #

Some exports don't end rows with a bare newline. They append a marker such as ;; or a record separator byte. CustomSeparated covers that with format_custom_row_after_delimiter. Here a file uses :: between fields and ;;\n after each row, with no header, so the schema is given explicitly:

1clickhouse local -q "
2SELECT * FROM file('orders_pipe.txt', CustomSeparated,
3  'order_id UInt32, country String, revenue Float64')
4LIMIT 5
5SETTINGS format_custom_field_delimiter=' :: ',
6         format_custom_row_after_delimiter=' ;;\n',
7         format_custom_escaping_rule='CSV'"
11	GB	5
22	US	6.01
33	DE	7.02
44	FR	8.03
55	IN	9.04

There are matching settings for the rest of the layout when you need them: format_custom_row_before_delimiter, format_custom_row_between_delimiter, and format_custom_result_before_delimiter / format_custom_result_after_delimiter for a fixed header or footer wrapping the whole result. The common pipe and semicolon cases have shorthand guides: pipe-delimited and semicolon-separated files.

Compressed files work too #

You don't unzip anything first. clickhouse local detects the .gz extension and decompresses on the fly, so a custom-delimited .txt.gz is queried exactly like the plain file, with the same delimiter settings:

1clickhouse local -q "
2SELECT count() FROM file('orders.txt.gz', CustomSeparatedWithNames)
3SETTINGS format_custom_field_delimiter='|~|', format_custom_escaping_rule='CSV'"
112

The same applies to .zst, .xz, and other supported codecs. The compression is inferred from the file name.

How fast is it? #

The delimiter being unusual costs nothing at scale. On a 3,000,000-row, ~112 MB file (orders_large.txt, |~| delimited), a full GROUP BY country with sum over every row completes in:

1clickhouse local -q "
2SELECT country, count() AS orders, round(sum(revenue), 2) AS revenue
3FROM file('orders_large.txt', CustomSeparatedWithNames)
4GROUP BY country ORDER BY revenue DESC
5SETTINGS format_custom_field_delimiter='|~|', format_custom_escaping_rule='CSV'"

~0.55 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM; clickhouse local 26.6.1.117). That parses the delimited text from scratch on every run across all cores; there is no cached table. A column-aware split on a multi-character separator, done with a hand-written script, would be slower and more error-prone on quoted fields.

1US	300294	75094223.6
2FR	300166	75079479.91
3JP	300169	75064502.48
4CA	300164	75009787.27
5GB	300110	75002592.83

One binary reads this custom format alongside CSV, JSON, Parquet, ORC and many others, and the same SQL runs unchanged against a ClickHouse server or ClickHouse Cloud when the data outgrows your machine. You prototype against the file and ship the identical query.

Run it yourself #

The complete, runnable example lives here. It has generate.sh (creates the |~| file, a gzipped copy, a custom field-and-row-delimiter file, and the ~112 MB perf file), run.sh (every command above), and expected_output.txt:

github.com/ClickHouse/examples → local-analytics/clickhouse-local-custom-delimiter

1git clone https://github.com/ClickHouse/examples
2cd examples/local-analytics/clickhouse-local-custom-delimiter
3./generate.sh && ./run.sh
Share this resource

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!
Loading form...