To query a REST API with SQL, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then pass the endpoint to the url() table function and query the response directly:
clickhouse local -q "SELECT * FROM url('https://api.example.com/events', JSONEachRow) LIMIT 5"
┌─id─┬──────────────────ts─┬─country─┬─event────┬─amount─┐
1. │ 1 │ 2026-06-01 00:00:00 │ GB │ click │ 5 │
2. │ 2 │ 2026-06-01 00:02:17 │ US │ view │ 6.01 │
3. │ 3 │ 2026-06-01 00:04:34 │ DE │ purchase │ 7.02 │
4. │ 4 │ 2026-06-01 00:06:51 │ FR │ refund │ 8.03 │
5. │ 5 │ 2026-06-01 00:09:08 │ IN │ click │ 9.04 │
└────┴─────────────────────┴─────────┴──────────┴────────┘
url() turns the HTTP response into a table you can filter, aggregate, and join in the same statement. The response is read in place with no download or import step: you name the format (JSONEachRow here) and ClickHouse infers the columns from the payload.
You don't write CREATE TABLE for an API response. ClickHouse reads the JSON and infers the columns. DESCRIBE shows you what it found before you write the real query:
clickhouse local -q "DESCRIBE url('https://api.example.com/events', JSONEachRow) FORMAT PrettyCompact"
┌─name────┬─type───────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
1. │ id │ Nullable(Int64) │ │ │ │ │ │
2. │ ts │ Nullable(DateTime) │ │ │ │ │ │
3. │ country │ Nullable(String) │ │ │ │ │ │
4. │ event │ Nullable(String) │ │ │ │ │ │
5. │ amount │ Nullable(Float64) │ │ │ │ │ │
└─────────┴────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
Timestamps come back as DateTime, integers and floats keep sensible widths, and columns are Nullable because inference allows for missing keys. If a field guesses wrong, pass an explicit structure as the third argument: url('...', JSONEachRow, 'id UInt32, ts DateTime, country String, event String, amount Float64').
This is the point of querying the API with SQL rather than parsing it in code: the response is a table, so the full ClickHouse dialect works on it directly. No intermediate variables, no manual loops.
clickhouse local -q "
SELECT country, count() AS events, round(sum(amount), 2) AS total
FROM url('https://api.example.com/events', JSONEachRow)
GROUP BY country
ORDER BY total DESC
FORMAT PrettyCompact"
┌─country─┬─events─┬───total─┐
1. │ NL │ 25 │ 2712.75 │
2. │ JP │ 25 │ 2687.5 │
3. │ BR │ 25 │ 2662.25 │
4. │ IN │ 25 │ 2637 │
5. │ FR │ 25 │ 2612.75 │
6. │ DE │ 25 │ 2587.5 │
7. │ US │ 25 │ 2562.25 │
8. │ GB │ 25 │ 2537 │
└─────────┴────────┴─────────┘
The whole thing is one round trip and one expression. Add a WHERE event = 'purchase', a window function, or a HAVING clause and nothing else changes.
Because both url() and file() produce tables, you can join the API response to data on your disk in the same query. Here a feed of events is enriched with a region lookup CSV that lives locally:
clickhouse local -q "
SELECT r.region AS region, count() AS events, round(sum(f.amount), 2) AS total
FROM url('https://api.example.com/events', JSONEachRow) AS f
JOIN file('regions.csv', CSVWithNames) AS r ON f.country = r.country
GROUP BY region
ORDER BY total DESC
FORMAT PrettyCompact"
┌─region─┬─events─┬──total─┐
1. │ EMEA │ 100 │ 10450 │
2. │ APAC │ 50 │ 5324.5 │
3. │ AMER │ 50 │ 5224.5 │
└────────┴────────┴────────┘
That join would normally mean fetching the API, deserializing it, loading the lookup, and merging in application code. Here it is one statement that reads two sources and returns the answer.
url() issues a plain GET. Many real APIs need an Authorization header or an API key. Pass headers with the headers() clause as the last argument (after the format and an explicit structure):
clickhouse local -q "
SELECT count()
FROM url('https://api.example.com/events', JSONEachRow,
'id UInt32, country String, amount Float64',
headers('Authorization'='Bearer YOUR_TOKEN'))"
For anything more involved than a header (paginated cursors, signed requests, POST payloads), fetch the body with curl to a file first, then query that file with file(). The SQL surface is identical either way; only the fetch differs.
Compressed and array responses
Two common API shapes are handled without extra work. If the endpoint returns a gzip-compressed body and a .json.gz path, ClickHouse decompresses it transparently from the extension. And if the API returns one top-level JSON array ([ {...}, {...} ]) rather than newline-delimited objects, JSONEachRow reads it anyway — it accepts both an array and line-delimited objects. For a body that is a single JSON object with a nested array under a key, see how to query nested JSON with SQL.
The work splits between the network fetch and the scan. On a 1,500,000-row JSON response (~133 MB) served over HTTP from localhost, the full GROUP BY country with sum over every row runs in:
clickhouse local -q "
SELECT country, count() AS events, round(sum(amount), 2) AS total
FROM url('http://127.0.0.1:8731/feed_large.json', JSONEachRow)
GROUP BY country ORDER BY total DESC"
~0.35 seconds, best of three with the response warm in the OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM; clickhouse local 26.6.1.117). That includes parsing the JSON from scratch on every run. Over a real network the wall-clock time is dominated by the endpoint's own latency, not the parse.
The query you just ran against a localhost endpoint is the same SQL you would run against a public API, against a CSV on disk, or against a table in a ClickHouse server or ClickHouse Cloud. You swap url(...) for file(...), an s3() call, or a table name, and the SELECT ... GROUP BY stays put. Prototype against an API on your laptop, ship the identical logic to production.
The complete, runnable example lives here. generate.sh builds the sample JSON feed and the ~133 MB perf file; run.sh starts a throwaway local HTTP server, runs every command above against it, and shuts it down; expected_output.txt is the captured proof:
github.com/ClickHouse/examples → local-analytics/clickhouse-local-rest-api
git clone https://github.com/ClickHouse/examples
cd examples/local-analytics/clickhouse-local-rest-api
./generate.sh && ./run.sh