To run SQL on a CSV file, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then query the file directly:
clickhouse local -q "SELECT * FROM file('orders.csv') LIMIT 10"
2026-01-01 1 GB widget 5 1
2026-01-02 2 US gadget 6.01 2
2026-01-03 3 DE gizmo 7.02 3
2026-01-04 4 FR doohickey 8.03 4
2026-01-05 5 IN widget 9.04 5
file('orders.csv') reads the header row for column names and infers the types from the data, so the file is queried in place with no import step first.
Prefer Python? See How to read a CSV file in Python for the same queries against a pandas DataFrame.
Header and type detection
By default file('orders.csv') uses the CSVWithNames format: the first row is treated as column names, and each column's type is inferred from the values. Check what was detected with DESCRIBE:
clickhouse local -q "DESCRIBE file('orders.csv')"
order_date Nullable(Date)
order_id Nullable(Int64)
country Nullable(String)
product Nullable(String)
revenue Nullable(Float64)
quantity Nullable(Int64)
The header gave the names; the data gave the types. Columns are Nullable because inference allows for empty values.
You don't usually need to override this, but you can when inference guesses wrong, for example when a ZIP code or an order ID should stay a string rather than become an integer. Pass the format and an explicit schema as the second and third arguments to file():
clickhouse local -q "
SELECT * FROM file('orders.csv', 'CSVWithNames',
'order_date Date, order_id UInt32, country String, product String, revenue Float64, quantity UInt8')
ORDER BY revenue DESC LIMIT 3"
2026-01-20 20 US doohickey 24.19 5
2026-01-19 19 GB gizmo 23.18 4
2026-01-18 18 AU gadget 22.17 3
If your CSV has no header row, use the CSV format instead of CSVWithNames and supply the schema yourself (or refer to columns positionally as c1, c2, ...).
Because the file is a table, the full SQL surface works on it (WHERE, GROUP BY, JOIN, window functions) with no load step in between:
clickhouse local -q "
SELECT
country,
count() AS orders,
round(sum(revenue), 2) AS revenue,
round(avg(quantity), 2) AS avg_qty
FROM file('orders.csv')
GROUP BY country
ORDER BY revenue DESC"
US 4 60.4 3.5
GB 4 56.36 2.5
AU 3 48.33 2
IN 3 45.3 2.67
FR 3 42.27 3.33
DE 3 39.24 4
This is the everyday workflow: open a terminal, run one query against the file, read the answer. The exact same SQL runs unchanged against dozens of file formats and remote sources, and against a ClickHouse server or ClickHouse Cloud when the data outgrows your laptop, with no rewrite.
CSV is convenient but slow to scan repeatedly: every query re-parses text and re-infers types. If you'll query the data more than once, convert it to Parquet once and read that instead. It's a single command: SELECT from the CSV, INTO OUTFILE as Parquet.
clickhouse local -q "SELECT * FROM file('orders.csv') INTO OUTFILE 'orders.parquet' TRUNCATE FORMAT Parquet"
The result is a real Parquet file with the columns and types ClickHouse inferred. Read it back with the same file() call, or query it directly. See how to query a Parquet file and the dedicated convert CSV to Parquet guide for the column-typing and compression options.
You don't need to unzip anything first. clickhouse-local detects the .gz extension and decompresses on the fly, so a .csv.gz file is queried exactly like a .csv:
clickhouse local -q "SELECT country, count() FROM file('orders.csv.gz') GROUP BY country ORDER BY country"
AU 3
DE 3
FR 3
GB 4
IN 3
US 4
The same applies to .csv.zst, .csv.xz, and other supported codecs. The compression is inferred from the file name.
Fast enough that the import step you skipped was the slow part. On an 8,000,000-row, ~338 MB CSV (orders_large.csv), a full GROUP BY country with sum and avg over every row completes in:
clickhouse local -q "
SELECT country, count() AS orders, round(sum(revenue), 2) AS revenue, round(avg(quantity), 3) AS avg_qty
FROM file('orders_large.csv')
GROUP BY country
ORDER BY revenue DESC"
about 0.53 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). That includes parsing the CSV text from scratch on every run; there's no cached table. Convert the same data to Parquet and the scan gets faster again, because Parquet is columnar and typed.
The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample CSVs (including the ~338 MB file used for the timing above), run.sh with every command on this page, and expected_output.txt.
github.com/ClickHouse/examples/tree/main/local-analytics/clickhouse-local-csv