Parse url-encoded form data with SQL

Al Brown
Last updated: Jun 15, 2026

To parse url-encoded form data and query its fields with SQL, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then point it at the form body with the Form format:

1clickhouse local -q "SELECT * FROM file('payload.txt', Form)"
1Row 1:
2──────
3event:   signup
4user_id: 1001
5plan:    pro
6amount:  49.00
7country: GB

The Form format reads one application/x-www-form-urlencoded body and turns each key=value pair into a column, with the file read in place and no import step. One body becomes one queryable row.

Every field comes back as a String #

A form body carries no type information; it is all text after the =. So Form infers every column as Nullable(String). DESCRIBE confirms it:

1clickhouse local -q "DESCRIBE file('payload.txt', Form)"
1event	Nullable(String)
2user_id	Nullable(String)
3plan	Nullable(String)
4amount	Nullable(String)
5country	Nullable(String)

The keys became column names, which is what makes the format pleasant to work with. If you want to compute on a field (sum the amounts, filter by a numeric id), give it a real type by passing an explicit structure as the third argument to file():

1clickhouse local -q "
2SELECT event, user_id, plan, amount, country
3FROM file('payload.txt', Form, 'event String, user_id UInt32, plan String, amount Decimal(10,2), country String')
4FORMAT Vertical"
1Row 1:
2──────
3event:   signup
4user_id: 1001
5plan:    pro
6amount:  49
7country: GB

Now user_id is an integer and amount is a decimal, ready for arithmetic.

The gotcha: percent-encoding is decoded, but + is not #

This is the one thing to know about Form. Real form bodies escape special characters: a space becomes + or %20, an ampersand becomes %26, accented letters become percent sequences like %C3%A3. Form decodes the percent escapes for you, but it does not convert a literal + back into a space. Watch what comes out of a body that uses both:

1clickhouse local -q "SELECT * FROM file('encoded.txt', Form)"

The body was name=Jane+Doe&city=S%C3%A3o+Paulo&note=a%26b%3Dc&amount=19.99:

1Row 1:
2──────
3name:   Jane+Doe
4city:   São+Paulo
5note:   a&b=c
6amount: 19.99

%C3%A3 decoded to ã, %26 to & and %3D to = (so the & and = inside note did not split into new fields). But Jane+Doe kept its +. If your source encodes spaces as +, fix it in the query with replaceAll:

1clickhouse local -q "
2SELECT replaceAll(name, '+', ' ') AS name, replaceAll(city, '+', ' ') AS city, note, amount
3FROM file('encoded.txt', Form)"
1Row 1:
2──────
3name:   Jane Doe
4city:   São Paulo
5note:   a&b=c
6amount: 19.99

One row, cleaned, no scripting.

One row per file, so glob a folder of them #

Because a Form file is exactly one row, a directory of captured webhook bodies is a table waiting to happen. Point file() at a glob and ClickHouse reads each file as a row. The virtual _file column tells you which body each row came from:

1clickhouse local -q "
2SELECT _file AS src, event, user_id, plan, amount
3FROM file('hooks/*.txt', Form, 'event String, user_id UInt32, plan String, amount Decimal(10,2)')
4ORDER BY user_id
5FORMAT PrettyCompact"
1   ┌─src───────┬─event───┬─user_id─┬─plan─┬─amount─┐
21. │ hook1.txt │ signup  │    1001 │ pro  │     4932. │ hook2.txt │ upgrade │    1002 │ team │     9943. │ hook3.txt │ signup  │    1003free05   └───────────┴─────────┴─────────┴──────┴────────┘

From there the full SQL surface applies (WHERE, GROUP BY, aggregates, joins) across the whole folder at once:

1clickhouse local -q "
2SELECT event, count() AS n, sum(amount) AS revenue
3FROM file('hooks/*.txt', Form, 'event String, user_id UInt32, plan String, amount Decimal(10,2)')
4GROUP BY event
5ORDER BY revenue DESC
6FORMAT PrettyCompact"
1   ┌─event───┬─n─┬─revenue─┐
21. │ upgrade │ 1 │      99 │
32. │ signup  │ 2 │      49 │
4   └─────────┴───┴─────────┘

That turns a heap of raw POST bodies on disk into an answer. For more on the glob pattern across formats, see run SQL across multiple files.

How fast is it? #

The cost here is opening many small files, not crunching numbers. On a folder of 2,000 one-body form files, parsing every file and grouping by plan runs in:

1clickhouse local -q "
2SELECT plan, count() AS n, sum(amount) AS revenue
3FROM file('perf/*.txt', Form, 'event String, user_id UInt32, plan String, amount Decimal(10,2), country String')
4GROUP BY plan ORDER BY revenue DESC"

~0.18 seconds, best of three with the files warm in the OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM; clickhouse local 26.6.1.117). That includes opening all 2,000 files from scratch on every run — there is no cached table.

The same SQL scales unchanged #

The query you ran against a folder of form bodies is ordinary ClickHouse SQL. When those webhook captures grow past what a laptop folder can hold, you swap the file() glob for a table (or an s3() glob over a bucket) and the SELECT ... GROUP BY stays exactly the same. You prototype against files on your machine and ship the identical logic to a ClickHouse server or to ClickHouse Cloud.

Run it yourself #

The complete, runnable example lives here. It has generate.sh (creates the sample bodies and the 2,000-file perf set), run.sh (every command above), and expected_output.txt:

github.com/ClickHouse/examples → local-analytics/clickhouse-local-form

1git clone https://github.com/ClickHouse/examples
2cd examples/local-analytics/clickhouse-local-form
3./generate.sh && ./run.sh
Share this resource

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!
Loading form...