How to query a REST API with SQL

Al Brown
Last updated: Jun 15, 2026

To query a REST API with SQL, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then pass the endpoint to the url() table function and query the response directly:

1clickhouse local -q "SELECT * FROM url('https://api.example.com/events', JSONEachRow) LIMIT 5"
1   ┌─id─┬──────────────────ts─┬─country─┬─event────┬─amount─┐
21.12026-06-01 00:00:00 │ GB      │ click    │      532.22026-06-01 00:02:17 │ US      │ view6.0143.32026-06-01 00:04:34 │ DE      │ purchase │   7.0254.42026-06-01 00:06:51 │ FR      │ refund   │   8.0365.52026-06-01 00:09:08IN      │ click    │   9.047   └────┴─────────────────────┴─────────┴──────────┴────────┘

url() turns the HTTP response into a table you can filter, aggregate, and join in the same statement. The response is read in place with no download or import step: you name the format (JSONEachRow here) and ClickHouse infers the columns from the payload.

See the response schema without declaring one #

You don't write CREATE TABLE for an API response. ClickHouse reads the JSON and infers the columns. DESCRIBE shows you what it found before you write the real query:

1clickhouse local -q "DESCRIBE url('https://api.example.com/events', JSONEachRow) FORMAT PrettyCompact"
1   ┌─name────┬─type───────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
21. │ id      │ Nullable(Int64)    │              │                    │         │                  │                │
32. │ ts      │ Nullable(DateTime) │              │                    │         │                  │                │
43. │ country │ Nullable(String)   │              │                    │         │                  │                │
54. │ event   │ Nullable(String)   │              │                    │         │                  │                │
65. │ amount  │ Nullable(Float64)  │              │                    │         │                  │                │
7   └─────────┴────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

Timestamps come back as DateTime, integers and floats keep sensible widths, and columns are Nullable because inference allows for missing keys. If a field guesses wrong, pass an explicit structure as the third argument: url('...', JSONEachRow, 'id UInt32, ts DateTime, country String, event String, amount Float64').

Aggregate the response in one statement #

This is the point of querying the API with SQL rather than parsing it in code: the response is a table, so the full ClickHouse dialect works on it directly. No intermediate variables, no manual loops.

1clickhouse local -q "
2SELECT country, count() AS events, round(sum(amount), 2) AS total
3FROM url('https://api.example.com/events', JSONEachRow)
4GROUP BY country
5ORDER BY total DESC
6FORMAT PrettyCompact"
1   ┌─country─┬─events─┬───total─┐
21. │ NL      │     25 │ 2712.75 │
32. │ JP      │     25 │  2687.5 │
43. │ BR      │     25 │ 2662.25 │
54. │ IN      │     25 │    2637 │
65. │ FR      │     25 │ 2612.75 │
76. │ DE      │     25 │  2587.5 │
87. │ US      │     25 │ 2562.25 │
98. │ GB      │     25 │    2537 │
10   └─────────┴────────┴─────────┘

The whole thing is one round trip and one expression. Add a WHERE event = 'purchase', a window function, or a HAVING clause and nothing else changes.

Join a live API response to a local file #

Because both url() and file() produce tables, you can join the API response to data on your disk in the same query. Here a feed of events is enriched with a region lookup CSV that lives locally:

1clickhouse local -q "
2SELECT r.region AS region, count() AS events, round(sum(f.amount), 2) AS total
3FROM url('https://api.example.com/events', JSONEachRow) AS f
4JOIN file('regions.csv', CSVWithNames) AS r ON f.country = r.country
5GROUP BY region
6ORDER BY total DESC
7FORMAT PrettyCompact"
1   ┌─region─┬─events─┬──total─┐
21. │ EMEA   │    100 │  10450 │
32. │ APAC   │     50 │ 5324.5 │
43. │ AMER   │     50 │ 5224.5 │
5   └────────┴────────┴────────┘

That join would normally mean fetching the API, deserializing it, loading the lookup, and merging in application code. Here it is one statement that reads two sources and returns the answer.

Endpoints that need an auth header #

url() issues a plain GET. Many real APIs need an Authorization header or an API key. Pass headers with the headers() clause as the last argument (after the format and an explicit structure):

1clickhouse local -q "
2SELECT count()
3FROM url('https://api.example.com/events', JSONEachRow,
4         'id UInt32, country String, amount Float64',
5         headers('Authorization'='Bearer YOUR_TOKEN'))"

For anything more involved than a header (paginated cursors, signed requests, POST payloads), fetch the body with curl to a file first, then query that file with file(). The SQL surface is identical either way; only the fetch differs.

Compressed and array responses #

Two common API shapes are handled without extra work. If the endpoint returns a gzip-compressed body and a .json.gz path, ClickHouse decompresses it transparently from the extension. And if the API returns one top-level JSON array ([ {...}, {...} ]) rather than newline-delimited objects, JSONEachRow reads it anyway — it accepts both an array and line-delimited objects. For a body that is a single JSON object with a nested array under a key, see how to query nested JSON with SQL.

How fast is it? #

The work splits between the network fetch and the scan. On a 1,500,000-row JSON response (~133 MB) served over HTTP from localhost, the full GROUP BY country with sum over every row runs in:

1clickhouse local -q "
2SELECT country, count() AS events, round(sum(amount), 2) AS total
3FROM url('http://127.0.0.1:8731/feed_large.json', JSONEachRow)
4GROUP BY country ORDER BY total DESC"

~0.35 seconds, best of three with the response warm in the OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM; clickhouse local 26.6.1.117). That includes parsing the JSON from scratch on every run. Over a real network the wall-clock time is dominated by the endpoint's own latency, not the parse.

The same SQL scales unchanged #

The query you just ran against a localhost endpoint is the same SQL you would run against a public API, against a CSV on disk, or against a table in a ClickHouse server or ClickHouse Cloud. You swap url(...) for file(...), an s3() call, or a table name, and the SELECT ... GROUP BY stays put. Prototype against an API on your laptop, ship the identical logic to production.

Run it yourself #

The complete, runnable example lives here. generate.sh builds the sample JSON feed and the ~133 MB perf file; run.sh starts a throwaway local HTTP server, runs every command above against it, and shuts it down; expected_output.txt is the captured proof:

github.com/ClickHouse/examples → local-analytics/clickhouse-local-rest-api

1git clone https://github.com/ClickHouse/examples
2cd examples/local-analytics/clickhouse-local-rest-api
3./generate.sh && ./run.sh
Share this resource

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!
Loading form...