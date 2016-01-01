clickhouse-local
When to use clickhouse-local vs. ClickHouse
clickhouse-local is an easy-to-use version of ClickHouse that is ideal for developers who need to perform fast processing on local and remote files using SQL without having to install a full database server. With clickhouse-local, developers can use SQL commands (using the ClickHouse SQL dialect) directly from the command line, providing a simple and efficient way to access ClickHouse features without the need for a full ClickHouse installation. One of the main benefits of clickhouse-local is that it is already included when installing clickhouse-client. This means that developers can get started with clickhouse-local quickly, without the need for a complex installation process.
While clickhouse-local is a great tool for development, testing, and file processing, it is not suitable for serving end users or applications. In these scenarios, or when you need to handle larger datasets, we recommend using open-source ClickHouse instead. ClickHouse is a powerful OLAP database designed to handle large-scale analytical workloads. It provides fast and efficient processing of complex queries on large datasets, making it ideal for production environments where high performance is critical. Additionally, ClickHouse offers features such as replication, sharding, and high availability, which are essential for scaling up to handle large datasets and serving applications.
Please read the docs below that show example use cases for clickhouse-local, such as querying a local file or reading a Parquet file in S3.
Download clickhouse-local
clickhouse-local is executed using the same clickhouse binary that runs the ClickHouse server and clickhouse-client. The easiest way to download the latest version is with the following command:
The binary you just downloaded can run all sorts of ClickHouse tools and utilities. If you want to run ClickHouse as a database server, check out the Quick Start.
Query data in a file using SQL
A common use of clickhouse-local is to run ad-hoc queries on files, without having to insert the data into a table first. clickhouse-local can stream the data from a file into a temporary table and execute your SQL.
If the file is sitting on the same machine as clickhouse-local, you can simply specify the file to load. The following reviews.tsv file contains a sampling of Amazon product reviews:
This command is a shortcut for:
ClickHouse knows the file uses a tab-separated format from the filename extension. If you need to explicitly specify the format, simply add one of the many ClickHouse input formats:
The file table function creates a table, and you can use DESCRIBE to see the inferred schema:
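As a minimal sketch of the file table function with an explicit format, followed by DESCRIBE: the commands below assume a clickhouse binary on PATH (invoked as clickhouse local) and use a small made-up sample.tsv rather than the real reviews.tsv. The column names product_id and star_rating are hypothetical.

```shell
# Scratch directory for a made-up sample; the real reviews.tsv is not used here.
workdir=$(mktemp -d)
printf 'product_id\tstar_rating\nA1\t5\nB2\t3\nC3\t4\n' > "$workdir/sample.tsv"

# Assumption: `clickhouse` is on PATH; the queries are skipped otherwise.
if command -v clickhouse >/dev/null 2>&1; then
    # The second argument to file() names the input format explicitly.
    clickhouse local --query "SELECT * FROM file('$workdir/sample.tsv', 'TSVWithNames')"
    # DESCRIBE prints the column names and types inferred from the file.
    clickhouse local --query "DESCRIBE file('$workdir/sample.tsv', 'TSVWithNames')"
fi
```

TSVWithNames is used here because the sample has a header row; a file without a header would use TSV (TabSeparated) instead.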
You can use globs in the file name (see glob substitutions).
Examples:
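For instance, a single `*` glob can match several files so that they are read as one table. This is a sketch only: the file names part_1.tsv and part_2.tsv and the id/name columns are hypothetical, and a clickhouse binary on PATH is assumed.

```shell
# Two made-up TSV files that share the same two-column layout.
workdir=$(mktemp -d)
printf '1\talpha\n2\tbeta\n' > "$workdir/part_1.tsv"
printf '3\tgamma\n4\tdelta\n' > "$workdir/part_2.tsv"

# Assumption: `clickhouse` is on PATH; the query is skipped otherwise.
if command -v clickhouse >/dev/null 2>&1; then
    # `*` matches both part files; format and structure are given explicitly.
    clickhouse local --query "SELECT count() FROM file('$workdir/part_*.tsv', 'TSV', 'id UInt32, name String')"
fi
```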
Let's find the product with the highest rating:
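One way to express this kind of query is with the argMax aggregate function, which returns the value of one column at the row where another column is largest. The sketch below runs it against a made-up sample file, not the real reviews.tsv; the column names are hypothetical and a clickhouse binary on PATH is assumed.

```shell
# Made-up sample with a header row; not the real Amazon reviews data.
workdir=$(mktemp -d)
printf 'product_id\tstar_rating\nA1\t5\nB2\t3\nC3\t4\n' > "$workdir/sample.tsv"

# Assumption: `clickhouse` is on PATH; the query is skipped otherwise.
if command -v clickhouse >/dev/null 2>&1; then
    # argMax(x, y) returns x from the row with the largest y.
    clickhouse local --query "SELECT argMax(product_id, star_rating), max(star_rating) FROM file('$workdir/sample.tsv', 'TSVWithNames')"
fi
```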
Query data in a Parquet file in AWS S3
If you have a file in S3, use clickhouse-local and the s3 table function to query the file in place (without inserting the data into a ClickHouse table). We have a file named house_0.parquet in a public bucket that contains home prices of property sold in the United Kingdom. Let's see how many rows it has:
The file has 2.7M rows:
It's always useful to see the schema that ClickHouse infers from the file:
Let's see what the most expensive neighborhoods are:
When you are ready to insert your files into ClickHouse, start up a ClickHouse server and insert the results of your file and s3 table functions into a MergeTree table. View the Quick Start for more details.
Format Conversions
You can use clickhouse-local to convert data between different formats. Example:
Formats are auto-detected from file extensions:
As a shortcut, you can write it using the --copy argument:
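A conversion of this kind might look like the following sketch, which turns a small JSON-lines file into CSV with explicit input and output formats. The file names data.jsonl and data.csv and the id/name columns are hypothetical, and a clickhouse binary on PATH is assumed.

```shell
# Made-up input: one JSON object per line.
workdir=$(mktemp -d)
printf '{"id": 1, "name": "alpha"}\n{"id": 2, "name": "beta"}\n' > "$workdir/data.jsonl"

# Assumption: `clickhouse` is on PATH; the conversion is skipped otherwise.
if command -v clickhouse >/dev/null 2>&1; then
    # Read JSONEachRow from stdin, write CSV to stdout.
    clickhouse local \
        --structure "id UInt32, name String" \
        --input-format JSONEachRow \
        --output-format CSV \
        --query "SELECT * FROM table" \
        < "$workdir/data.jsonl" > "$workdir/data.csv"
fi
```

The structure is passed explicitly here so the sketch does not depend on schema inference from standard input.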
Usage
By default, clickhouse-local has access to the data of a ClickHouse server on the same host, and it does not depend on the server's configuration. It also supports loading server configuration using the --config-file argument. For temporary data, a unique temporary data directory is created by default.
Basic usage (Linux):
Basic usage (Mac):
clickhouse-local is also supported on Windows through WSL2.
Arguments:
-S, --structure: table structure for input data.
--input-format: input format, TSV by default.
-F, --file: path to data, stdin by default.
-q, --query: queries to execute with ; as delimiter. --query can be specified multiple times, e.g. --query "SELECT 1" --query "SELECT 2". Cannot be used simultaneously with --queries-file.
--queries-file: file path with queries to execute. --queries-file can be specified multiple times, e.g. --queries-file queries1.sql --queries-file queries2.sql. Cannot be used simultaneously with --query.
-n, --multiquery: if specified, multiple queries separated by semicolons can be listed after the --query option. For convenience, it is also possible to omit --query and pass the queries directly after --multiquery.
-N, --table: table name where to put output data, table by default.
-f, --format, --output-format: output format, TSV by default.
-d, --database: default database, _local by default.
--stacktrace: whether to dump debug output in case of exception.
--echo: print query before execution.
--verbose: more details on query execution.
--logger.console: log to console.
--logger.log: log file name.
--logger.level: log level.
--ignore-error: do not stop processing if a query failed.
-c, --config-file: path to configuration file in the same format as for ClickHouse server; by default the configuration is empty.
--no-system-tables: do not attach system tables.
--help: argument reference for clickhouse-local.
-V, --version: print version information and exit.
Also, there are arguments for each ClickHouse configuration variable, which are more commonly used instead of --config-file.
Examples
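A minimal sketch of the basic pattern is to pipe rows into clickhouse-local on standard input, describe them with --structure and --input-format, and query the default table named table. It assumes a clickhouse binary on PATH (invoked as clickhouse local); the two CSV rows are made up for illustration.

```shell
# Two CSV rows used as standard input.
rows=$(printf '1,2\n3,4')

# Assumption: `clickhouse` is on PATH; the query is skipped otherwise.
if command -v clickhouse >/dev/null 2>&1; then
    # stdin is exposed as the default table named `table`.
    printf '%s\n' "$rows" | clickhouse local \
        --structure "a Int64, b Int64" \
        --input-format CSV \
        --query "SELECT * FROM table"
fi
```

With the default TSV output format, this should print the same two rows with tabs in place of commas.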
The previous example is the same as:
You don't have to use stdin or the --file argument; you can open any number of files using the file table function:
Now let's output memory usage for each Unix user:
Query:
Result: