What is clickhouse-local? Run SQL on any file or data source, no server

Al Brown
Last updated: Jun 8, 2026

clickhouse-local is a small, fast, standalone mode of ClickHouse that runs from a single binary, letting you run full ClickHouse SQL over local and remote files (and external databases) without installing or starting a server. You point it at a file and query it in place:

clickhouse local -q "SELECT event, count() FROM file('events.parquet') GROUP BY event"

There's no import step, no table to create, and no server process. It starts, runs the query, prints the result, and exits.

click     250
login     250
logout    250
purchase  250

Install and invoke

clickhouse-local ships inside the single clickhouse binary. The quickest way to get it is with clickhousectl, the ClickHouse CLI for local and cloud:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

That puts a self-contained clickhouse binary on your PATH. You then invoke local mode as clickhouse local:

clickhouse local -q "SELECT version()"

26.6.1.117

There is no separate clickhouse-local binary to install; local is a subcommand of clickhouse. See the canonical reference in the docs for every flag.

How it works

clickhouse-local is the same engine as ClickHouse server and ClickHouse Cloud, packaged to run as a one-shot process instead of a long-lived service. That has three consequences worth internalising.

No server, no state. There is no daemon to start, no port to bind, no data directory to manage. Each invocation spins up the engine, runs your query against whatever you point it at, and tears down. This makes it ideal for scripts, CI jobs, and ad-hoc work where standing up a database would be overkill.
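
As one illustration of the scripting and CI use case, a job can treat a query result as a pass/fail signal. This is a minimal sketch, not a prescribed pattern: the file orders.csv, its columns, and the "no negative amounts" rule are invented for the example, and the query is skipped when no clickhouse binary is on the PATH.

```shell
# Hypothetical data-quality gate; file name, columns, and rule are illustrative.
workdir="$(mktemp -d)"
cat > "$workdir/orders.csv" <<'EOF'
id,amount
1,10
2,25
3,-4
EOF

status="skipped"
if command -v clickhouse >/dev/null 2>&1; then
  # One-shot process: start, count offending rows, exit.
  bad_rows="$(clickhouse local -q "SELECT count() FROM file('$workdir/orders.csv', 'CSVWithNames') WHERE amount < 0")"
  if [ "$bad_rows" -eq 0 ]; then
    status="pass"
  else
    status="fail ($bad_rows bad rows)"
  fi
fi
echo "data check: $status"
```

A real CI step would likely exit non-zero on failure instead of only printing the status.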

The same SQL everywhere. clickhouse-local speaks the full ClickHouse SQL dialect: the same functions, the same aggregate combinators, the same window functions. A query you write against a Parquet file on your laptop runs unchanged against a table on a ClickHouse server or in ClickHouse Cloud. You prototype locally and ship to production with no query rewrite.

Data stays in place via table functions. You don't load data into clickhouse-local; you reference it where it lives using a table function. file() reads a local file, s3() reads from object storage, url() reads over HTTP, and so on. For columnar formats such as Parquet, the engine reads only the columns the query needs.

Reads 70+ formats, schema auto-inferred

clickhouse-local reads more than 70 input formats, including Parquet, CSV, TSV, JSON and NDJSON, ORC, Avro, Arrow, MessagePack, npy, BSON, Protobuf, and raw log lines. For self-describing formats like Parquet, the schema is inferred automatically. DESCRIBE shows you what it found:

clickhouse local -q "DESCRIBE file('events.parquet')"

id      UInt64
event   String
day     Date32
amount  UInt8

It read the Parquet metadata and mapped each column to a ClickHouse type, with no CREATE TABLE and no schema file. The same inference works for CSV, JSON, and most other formats; for delimited text without a header it falls back to positional c1, c2, ... columns you can name in the query.
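
The headerless case can be sketched as follows. The file raw.csv and the names id, event, and amount are invented for illustration; the second query assumes you would rather declare the columns up front, using the optional format and structure arguments of file().

```shell
# Illustrative only: the file name and column names are invented for this sketch.
workdir="$(mktemp -d)"
printf '1,click,5\n2,login,0\n3,click,7\n' > "$workdir/raw.csv"

if command -v clickhouse >/dev/null 2>&1; then
  # No header row, so the inferred columns are named c1, c2, c3.
  clickhouse local -q "DESCRIBE file('$workdir/raw.csv')"

  # Option 1: rename the positional columns inside the query.
  clickhouse local -q "
    SELECT c2 AS event, sum(c3) AS total
    FROM file('$workdir/raw.csv')
    GROUP BY event ORDER BY event"

  # Option 2: pass an explicit format and structure to file().
  clickhouse local -q "
    SELECT event, sum(amount) AS total
    FROM file('$workdir/raw.csv', 'CSV', 'id UInt32, event String, amount UInt32')
    GROUP BY event ORDER BY event"
else
  echo "clickhouse not found on PATH; skipping queries"
fi
```

Declaring the structure also pins the column types, which can matter when inference would otherwise pick a wider or nullable type.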

Compression and globs come for free

Compression is handled transparently by file extension. A .gz, .zst, .lz4, or .bz2 file is decompressed on the fly, so you query it exactly like an uncompressed file:

clickhouse local -q "SELECT count() FROM file('logs/app-2026-06-02.log.gz', LineAsString)"

300

And a glob pattern reads many files as one logical table. Here we scan every .log file in a directory and count HTTP status codes (widen the pattern to app-* and it picks up the .gz too, decompressing as it goes):

clickhouse local -q "
SELECT splitByChar(' ', line)[5] AS status, count() AS n
FROM file('logs/app-*.log', LineAsString)
GROUP BY status ORDER BY n DESC"

200  300
500  150
404  150
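
To try the widened pattern without real logs, the sketch below fabricates two tiny log files, gzips one, and reads both through a single glob. The log layout (status code as the fifth space-separated field) is an assumption chosen to match the query above, and the _file virtual column is used only to show which file each row came from.

```shell
# Synthetic logs; the line layout is invented so that field 5 is the status code.
workdir="$(mktemp -d)"
mkdir -p "$workdir/logs"
printf '2026-06-01 12:00:01 GET /home 200\n2026-06-01 12:00:02 GET /cart 500\n' \
  > "$workdir/logs/app-2026-06-01.log"
printf '2026-06-02 09:30:00 GET /home 200\n2026-06-02 09:30:05 GET /missing 404\n' \
  > "$workdir/logs/app-2026-06-02.log"
# gzip replaces the second file with app-2026-06-02.log.gz
gzip "$workdir/logs/app-2026-06-02.log"

if command -v clickhouse >/dev/null 2>&1; then
  # app-* matches both the plain .log and the .log.gz file.
  clickhouse local -q "
    SELECT _file AS source, splitByChar(' ', line)[5] AS status, count() AS n
    FROM file('$workdir/logs/app-*', 'LineAsString')
    GROUP BY source, status
    ORDER BY source, status"
else
  echo "clickhouse not found on PATH; skipping query"
fi
```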

Query data where it lives: files, cloud, databases

The real reach of clickhouse-local is the set of table functions it exposes. Each one points the engine at a different source, and you can mix them in a single query.

| Source | Table function | How-to |
| --- | --- | --- |
| Local Parquet file | file('data.parquet') | Query a Parquet file |
| CSV file | file('data.csv') | Run SQL on a CSV file |
| TSV file | file('data.tsv') | Query a TSV file |
| Log files / raw lines | file('*.log', LineAsString) | Analyze log files with SQL |
| Amazon S3 | s3('https://...') | Query Parquet on S3 |
| HTTP / remote URL | url('https://...') | (below) |
| Google Cloud Storage | gcs('https://...') | |
| Azure Blob Storage | azureBlobStorage(...) | |
| PostgreSQL | postgresql(...) | |
| MySQL | mysql(...) | |
| MongoDB | mongodb(...) | |
| SQLite | sqlite(...) | |
| Apache Iceberg | iceberg(...) | |
| Delta Lake | deltaLake(...) | |

Because these are just table functions, you can JOIN across sources in one query. For example, enrich rows from a CSV with rows from a Parquet file:

clickhouse local -q "
SELECT s.user AS user, count() AS sales, sum(s.spend) AS spend
FROM file('sales.csv') AS s
JOIN file('events.parquet') AS e ON toUInt64(s.id) = e.id
GROUP BY user ORDER BY spend DESC LIMIT 5"

user_16  4  1092
user_15  4  1080
user_14  4  1068
user_13  4  1056
user_12  4  1044

Swap one file(...) for postgresql(...) or s3(...) and the same query joins a Postgres table to a Parquet file in object storage, with no download and no staging table.
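
A rough sketch of that swap for the Postgres case is shown here. Every connection detail is a placeholder: the address, database name shop, table name sales, and user reader are hypothetical, and the RUN_PG_EXAMPLE switch is just a guard invented for this example so the query only runs when a reachable Postgres and an events.parquet file actually exist. Otherwise the script prints the query it would have run.

```shell
# All connection details are placeholders; nothing here points at a real system.
PG_ADDR="${PG_ADDR:-localhost:5432}"
PG_DB="${PG_DB:-shop}"
PG_USER="${PG_USER:-reader}"
PG_PASSWORD="${PG_PASSWORD:-}"

# Same join as above, with the CSV side replaced by a Postgres table.
query="
SELECT s.user AS user, count() AS sales, sum(s.spend) AS spend
FROM postgresql('$PG_ADDR', '$PG_DB', 'sales', '$PG_USER', '$PG_PASSWORD') AS s
JOIN file('events.parquet') AS e ON toUInt64(s.id) = e.id
GROUP BY user ORDER BY spend DESC LIMIT 5"

# RUN_PG_EXAMPLE is a guard made up for this sketch, not a ClickHouse setting.
if [ "${RUN_PG_EXAMPLE:-0}" = "1" ] && command -v clickhouse >/dev/null 2>&1; then
  clickhouse local -q "$query"
else
  printf '%s\n' "$query"
fi
```

Note that a password passed inside the query text is visible in the process list, so a shared machine may call for a different way of supplying credentials.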

Common use cases

A few one-liners cover most of what people reach for clickhouse-local to do. Each is copy-paste runnable.

Ad-hoc file analysis. Aggregate a file without loading it anywhere first:

clickhouse local -q "
SELECT event, count() AS n, sum(amount) AS total
FROM file('events.parquet') GROUP BY event ORDER BY n DESC"

Format conversion (CSV ↔ Parquet). Read one format, write another in a single statement:

clickhouse local -q "
SELECT * FROM file('sales.csv')
INTO OUTFILE 'sales.parquet' FORMAT Parquet"

Query remote data with no download. Run an aggregation directly against a file served over HTTP, and clickhouse-local streams only what it needs:

clickhouse local -q "
SELECT count() AS rows
FROM url('https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet')"

1000000

Log analysis. Treat a directory of log files as a table (see the glob example above) and slice it with SQL.

Lightweight ETL. Read from a source, transform with SQL, and write the result to a new file: a whole pipeline step with no infrastructure.
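
One way such a step might look when it sits in the middle of a shell pipeline: read CSV on stdin, transform it, and write newline-delimited JSON. The input rows, column names, and output file name are made up for this sketch; it relies on clickhouse local exposing stdin as a table named table once --structure and --input-format are given.

```shell
# Illustrative pipeline step; data, column names, and file names are invented.
workdir="$(mktemp -d)"
printf '1,click,5\n2,login,0\n3,click,7\n' > "$workdir/raw.csv"

if command -v clickhouse >/dev/null 2>&1; then
  # stdin is queried as "table"; --structure declares its columns.
  clickhouse local \
    --structure "id UInt32, event String, amount UInt32" \
    --input-format CSV \
    --output-format JSONEachRow \
    -q "SELECT event, sum(amount) AS total FROM table WHERE amount > 0 GROUP BY event ORDER BY event" \
    < "$workdir/raw.csv" > "$workdir/totals.ndjson"
  cat "$workdir/totals.ndjson"
else
  echo "clickhouse not found on PATH; skipping transform"
fi
```

Because input and output are plain streams, the same step can be fed from curl or another command in place of a file on disk.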

How fast is it?

Fast enough that the binary startup is usually the visible cost. Aggregating a locally generated 10-million-row Parquet file (a GROUP BY with a count and an avg) completes in about 0.16 seconds of wall-clock time end to end (process start, Parquet read, aggregation, and exit), reproducibly across runs on an Apple-silicon laptop. The query engine itself reports the aggregation at roughly 20 milliseconds; the rest is reading the file and starting up.
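
A comparable measurement can be approximated with the sketch below. It is not the benchmark behind the numbers above: the synthetic schema (id, bucket, amount) and the ROWS variable are assumptions for illustration, and timings will vary with hardware and ClickHouse version.

```shell
# Rough timing sketch; the schema and row count are invented, not the article's dataset.
workdir="$(mktemp -d)"
ROWS="${ROWS:-10000000}"

if command -v clickhouse >/dev/null 2>&1; then
  # Generate a synthetic Parquet file with the requested number of rows.
  clickhouse local -q "
    SELECT number AS id, number % 4 AS bucket, rand() % 100 AS amount
    FROM numbers($ROWS)
    INTO OUTFILE '$workdir/bench.parquet' FORMAT Parquet"

  # Wall-clock time for the whole process: startup, Parquet read, aggregation, exit.
  time clickhouse local -q "
    SELECT bucket, count() AS n, avg(amount) AS avg_amount
    FROM file('$workdir/bench.parquet')
    GROUP BY bucket ORDER BY bucket"
else
  echo "clickhouse not found on PATH; skipping benchmark"
fi
```

Running the timed query several times gives a steadier figure, since the first run also pays for cold file-system caches.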

Prefer Python?

If you live in a notebook or a Python data pipeline, the same engine is available in-process as chDB: import chdb and run the identical ClickHouse SQL, getting a pandas DataFrame back. See what is chDB; it's the Python-native sibling of clickhouse-local with the same formats, table functions, and SQL.

Where to go next

Run it yourself: every command above is in a self-contained, runnable example at github.com/ClickHouse/examples/tree/main/local-analytics/clickhouse-local-intro. Run ./generate.sh then ./run.sh.

Prefer Python? → what is chDB
