clickhouse-local is a small, fast, standalone build of ClickHouse that runs from a single binary, letting you run full ClickHouse SQL over local and remote files (and external databases) without installing or starting a server. You point it at a file and query it in place:
```shell
clickhouse local -q "SELECT event, count() FROM file('events.parquet') GROUP BY event"
```

```text
click 250
login 250
logout 250
purchase 250
```

There's no import step, no table to create, and no server process. It starts, runs the query, prints the result, and exits.

## Install and invoke
clickhouse-local ships inside the single clickhouse binary. The quickest way to get it is with clickhousectl, the ClickHouse CLI for local and cloud:
```shell
curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH
```

That puts a self-contained clickhouse binary on your PATH. You then invoke local mode as clickhouse local:
```shell
clickhouse local -q "SELECT version()"
```

```text
26.6.1.117
```

There is no separate clickhouse-local binary to install; local is a subcommand of clickhouse. See the canonical reference in the docs for every flag.
## How it works
clickhouse-local is the same engine as ClickHouse server and ClickHouse Cloud, packaged to run as a one-shot process instead of a long-lived service. That has three consequences worth internalising.
No server, no state. There is no daemon to start, no port to bind, no data directory to manage. Each invocation spins up the engine, runs your query against whatever you point it at, and tears down. This makes it ideal for scripts, CI jobs, and ad-hoc work where standing up a database would be overkill.
The same SQL everywhere. clickhouse-local speaks the full ClickHouse SQL dialect: the same functions, the same aggregate combinators, the same window functions. A query you write against a Parquet file on your laptop runs against a table on a ClickHouse server or in ClickHouse Cloud with only its data source swapped, so you prototype locally and ship to production without rewriting the query logic.
Data stays in place via table functions. You don't load data into clickhouse-local; you reference it where it lives using a table function. file() reads a local file, s3() reads from object storage, url() reads over HTTP, and so on. For columnar formats such as Parquet, the engine reads only the columns (and, where statistics allow, the row groups) the query needs.
## Reads 70+ formats, schema auto-inferred
clickhouse-local reads more than 70 input formats, including Parquet, CSV, TSV, JSON and NDJSON, ORC, Avro, Arrow, MessagePack, npy, BSON, Protobuf, and raw log lines. For self-describing formats like Parquet, the schema is inferred automatically. DESCRIBE shows you what it found:
```shell
clickhouse local -q "DESCRIBE file('events.parquet')"
```

```text
id      UInt64
event   String
day     Date32
amount  UInt8
```

It read the Parquet metadata and mapped each column to a ClickHouse type, with no CREATE TABLE and no schema file. The same inference works for CSV, JSON, and the rest; for delimited text without a header it falls back to positional c1, c2, ... columns you can name in the query.
## Compression and globs come for free
Compression is handled transparently by file extension. A .gz, .zst, .lz4, or .bz2 file is decompressed on the fly, so you query it exactly like an uncompressed file:
```shell
clickhouse local -q "SELECT count() FROM file('logs/app-2026-06-02.log.gz', LineAsString)"
```

```text
300
```

And a glob pattern reads many files as one logical table. Here we scan every .log file in a directory and count HTTP status codes (widen the pattern to app-* and it picks up the .gz too, decompressing as it goes):
```shell
clickhouse local -q "
SELECT splitByChar(' ', line)[5] AS status, count() AS n
FROM file('logs/app-*.log', LineAsString)
GROUP BY status ORDER BY n DESC"
```

```text
200 300
500 150
404 150
```

## Query data where it lives: files, cloud, databases
The real reach of clickhouse-local is the set of table functions it exposes. Each one points the engine at a different source, and you can mix them in a single query.
| Source | Table function | How-to |
|---|---|---|
| Local Parquet file | file('data.parquet') | Query a Parquet file |
| CSV file | file('data.csv') | Run SQL on a CSV file |
| TSV file | file('data.tsv') | Query a TSV file |
| Log files / raw lines | file('*.log', LineAsString) | Analyze log files with SQL |
| Amazon S3 | s3('https://...') | Query Parquet on S3 |
| HTTP / remote URL | url('https://...') | (below) |
| Google Cloud Storage | gcs('https://...') | — |
| Azure Blob Storage | azureBlobStorage(...) | — |
| PostgreSQL | postgresql(...) | — |
| MySQL | mysql(...) | — |
| MongoDB | mongodb(...) | — |
| SQLite | sqlite(...) | — |
| Apache Iceberg | iceberg(...) | — |
| Delta Lake | deltaLake(...) | — |
Because these are just table functions, you can JOIN across sources in one query. For example, enrich rows from a CSV with rows from a Parquet file:
```shell
clickhouse local -q "
SELECT s.user AS user, count() AS sales, sum(s.spend) AS spend
FROM file('sales.csv') AS s
JOIN file('events.parquet') AS e ON toUInt64(s.id) = e.id
GROUP BY user ORDER BY spend DESC LIMIT 5"
```

```text
user_16 4 1092
user_15 4 1080
user_14 4 1068
user_13 4 1056
user_12 4 1044
```

Swap the file(...) calls for postgresql(...) and s3(...) and the same query joins a Postgres table to a Parquet file in object storage, with no download step and no staging table.
## Common use cases
A few one-liners cover most of what people reach for clickhouse-local to do. Each is copy-paste runnable.
Ad-hoc file analysis. Aggregate a file without loading it anywhere first:
```shell
clickhouse local -q "
SELECT event, count() AS n, sum(amount) AS total
FROM file('events.parquet') GROUP BY event ORDER BY n DESC"
```

Format conversion (CSV ↔ Parquet). Read one format, write another in a single statement:
```shell
clickhouse local -q "
SELECT * FROM file('sales.csv')
INTO OUTFILE 'sales.parquet' FORMAT Parquet"
```

Query remote data with no download. Run an aggregation directly against a file served over HTTP, and clickhouse-local streams only what it needs:
```shell
clickhouse local -q "
SELECT count() AS rows
FROM url('https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet')"
```

```text
1000000
```

Log analysis. Treat a directory of log files as a table (see the glob example above) and slice it with SQL.
Lightweight ETL. Read from a source, transform with SQL, and write the result to a new file: a whole pipeline step with no infrastructure.
## How fast is it?
Fast enough that the binary startup is usually the visible cost. Aggregating a locally generated 10-million-row Parquet file (a GROUP BY with a count and an avg) completes in about 0.16 seconds of wall-clock time end to end (process start, Parquet read, aggregation, and exit), reproducibly across runs on an Apple-silicon laptop. The query engine itself reports the aggregation at roughly 20 milliseconds; the rest is reading the file and starting up.
## Prefer Python?
If you live in a notebook or a Python data pipeline, the same engine is available in-process as chDB: import chdb, run the identical ClickHouse SQL, and get results back as a pandas DataFrame. chDB is the Python-native sibling of clickhouse-local, with the same formats, table functions, and SQL; see What is chDB? for an introduction.
## Where to go next
- How to query a Parquet file
- Run SQL on a CSV file
- Analyze log files with SQL
- Query Parquet on S3
- What is a Parquet file?
- Canonical clickhouse-local reference (docs)
Run it yourself: every command above is in a self-contained, runnable example at github.com/ClickHouse/examples/tree/main/local-analytics/clickhouse-local-intro. Run ./generate.sh then ./run.sh.