How to read a pipe-delimited file

Al Brown
Last updated: Jun 8, 2026

To read a pipe-delimited file, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then point it at the file with CustomSeparatedWithNames and set the delimiter to a pipe:

1clickhouse local -q "
2SELECT * FROM file('orders.psv', 'CustomSeparatedWithNames') LIMIT 10
3SETTINGS format_custom_field_delimiter='|', format_custom_escaping_rule='CSV'"
12026-01-01	1	GB	widget	5	1
22026-01-02	2	US	gadget	6.01	2
32026-01-03	3	DE	gizmo	7.02	3
42026-01-04	4	FR	doohickey	8.03	4
52026-01-05	5	IN	widget	9.04	5

Two settings tell it the file is pipe-delimited: format_custom_field_delimiter='|' splits fields on the pipe, and format_custom_escaping_rule='CSV' unwraps any quoted cells. The file is read in place with no import step; the header row supplies the column names and the data supplies the types.

See the schema without declaring one #

CustomSeparatedWithNames reads the first row as column names; ClickHouse infers each type from the data. Check what it found with DESCRIBE:

1clickhouse local -q "
2DESCRIBE file('orders.psv', 'CustomSeparatedWithNames')
3SETTINGS format_custom_field_delimiter='|', format_custom_escaping_rule='CSV'"
1order_date	Nullable(Date)
2order_id	Nullable(Int64)
3country	Nullable(String)
4product	Nullable(String)
5revenue	Nullable(Float64)
6quantity	Nullable(Int64)

The header gave the names; the data gave the types. Columns come back Nullable because inference allows for empty values. No CREATE TABLE required.

The escaping rule is the gotcha #

A pipe-delimited file needs two settings, not one, and the second is easy to forget. format_custom_field_delimiter='|' splits fields on the pipe. format_custom_escaping_rule='CSV' tells ClickHouse how a single field is encoded: CSV means a field may be wrapped in double quotes that should be stripped, exactly like a normal CSV cell.

Drop the escaping rule and the default (Escaped, which uses backslash escaping) takes over. It does not unwrap the quotes around each field, and the inference collapses the whole line into a single String column named after the quoted header:

1clickhouse local -q "
2DESCRIBE file('orders.psv', 'CustomSeparatedWithNames')
3SETTINGS format_custom_field_delimiter='|'"
1"order_date"|"order_id"|"country"|"product"|"revenue"|"quantity"	Nullable(String)

That one column is the symptom to recognize. If a pipe-delimited read returns a single fat String column, or every value carries literal " characters, set format_custom_escaping_rule='CSV' and the columns split cleanly. If your file has no quoting at all, Escaped and CSV behave the same on plain values; CSV is the safe default because it handles both.

Aggregate directly on the file #

Once the columns are typed, the full SQL surface works on the file: WHERE, GROUP BY, aggregates, joins, window functions, with no load step in between.

1clickhouse local -q "
2SELECT
3  country,
4  count()                 AS orders,
5  round(sum(revenue), 2)  AS revenue,
6  round(avg(quantity), 2) AS avg_qty
7FROM file('orders.psv', 'CustomSeparatedWithNames')
8GROUP BY country
9ORDER BY revenue DESC
10SETTINGS format_custom_field_delimiter='|', format_custom_escaping_rule='CSV'"
1US	4	60.4	3.5
2GB	4	56.36	2.5
3AU	3	48.33	2
4IN	3	45.3	2.67
5FR	3	42.27	3.33
6DE	3	39.24	4

Typing the two SETTINGS on every query gets old. Set them once with SET and query normally for the rest of the session:

1clickhouse local -q "
2SET format_custom_field_delimiter='|', format_custom_escaping_rule='CSV';
3SELECT count() FROM file('orders.psv', 'CustomSeparatedWithNames');"
120

The same CustomSeparated settings handle any single-character delimiter, the same SQL runs unchanged across dozens of other formats and remote sources, and it scales to a ClickHouse server or ClickHouse Cloud when the data outgrows your machine.

No header row #

If the file has no header, use CustomSeparated (without WithNames) and supply the schema yourself. The same two delimiter settings still apply:

1clickhouse local -q "
2SELECT * FROM file('orders_nohdr.psv', 'CustomSeparated',
3  'order_date Date, order_id UInt32, country String, product String, revenue Float64, quantity UInt8')
4ORDER BY revenue DESC LIMIT 3
5SETTINGS format_custom_field_delimiter='|', format_custom_escaping_rule='CSV'"
12026-01-20	20	US	doohickey	24.19	5
22026-01-19	19	GB	gizmo	23.18	4
32026-01-18	18	AU	gadget	22.17	3

Gzipped pipe-delimited files work too #

You don't unzip anything first. clickhouse local detects the .gz suffix and decompresses on the fly, so a .psv.gz is queried exactly like a .psv:

1clickhouse local -q "
2SELECT country, count() FROM file('orders.psv.gz', 'CustomSeparatedWithNames')
3GROUP BY country ORDER BY country
4SETTINGS format_custom_field_delimiter='|', format_custom_escaping_rule='CSV'"
1AU	3
2DE	3
3FR	3
4GB	4
5IN	3
6US	4

The same applies to .psv.zst, .psv.xz, and other supported codecs. The compression is inferred from the file name. See querying compressed files for the full list.

How fast is it? #

The text parsing you avoided importing was the slow part. On a 3,000,000-row, ~126 MB pipe-delimited file (orders_large.psv), a full GROUP BY country with sum and avg over every row runs in:

1clickhouse local -q "
2SELECT country, count() AS orders, round(sum(revenue),2) AS revenue, round(avg(quantity),3) AS avg_qty
3FROM file('orders_large.psv', 'CustomSeparatedWithNames')
4GROUP BY country ORDER BY revenue DESC
5SETTINGS format_custom_field_delimiter='|', format_custom_escaping_rule='CSV'"

~0.60 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM; clickhouse local 26.6.1.117). That includes parsing the delimited text from scratch on every run; there is no cached table. Query the same data repeatedly and you should convert it once to a typed columnar format.

Convert it once for repeat queries #

Pipe-delimited text is slow to scan again and again because every query re-parses it. If you'll query the data more than once, write it to Parquet once and read that instead:

1clickhouse local -q "
2SELECT * FROM file('orders.psv', 'CustomSeparatedWithNames')
3INTO OUTFILE 'orders.parquet' TRUNCATE FORMAT Parquet
4SETTINGS format_custom_field_delimiter='|', format_custom_escaping_rule='CSV'"

The result is a real Parquet file carrying the columns and types ClickHouse inferred. See how to query a Parquet file and convert CSV to Parquet for the typing and compression options (the workflow is identical from a pipe-delimited source).

The same SQL scales unchanged #

The query you ran on a local .psv is the same SQL you would run on a ClickHouse server or in ClickHouse Cloud. You swap file('orders.psv', 'CustomSeparatedWithNames') for a table name and the SELECT ... GROUP BY stays put. You prototype against a file on your machine and ship the identical logic to production.

Run it yourself #

The complete, runnable example lives in the ClickHouse examples repo. It has generate.sh (builds the demo .psv, a headerless copy, and the ~126 MB perf file), run.sh (every command above), and expected_output.txt:

github.com/ClickHouse/examples → local-analytics/clickhouse-local-pipe-delimited

1git clone https://github.com/ClickHouse/examples
2cd examples/local-analytics/clickhouse-local-pipe-delimited
3./generate.sh && ./run.sh
Share this resource

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!
Loading form...