ClickHouse Rust Client
The official Rust client for connecting to ClickHouse, originally developed by Paul Loyd. The client source code is available in the GitHub repository.
Overview
- Uses serde for encoding/decoding rows.
- Supports serde attributes: skip_serializing, skip_deserializing, rename.
- Uses RowBinary format over the HTTP transport.
  - There are plans to switch to Native over TCP.
- Supports TLS (via native-tls and rustls-tls features).
- Supports compression and decompression (LZ4).
- Provides APIs for selecting or inserting data, executing DDLs, and client-side batching.
- Provides convenient mocks for unit testing.
Installation
To use the crate, add the following to your Cargo.toml:
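Below is a minimal sketch of the dependency entries. The version number is illustrative only (an assumption), so take the current one from the crates.io page. The test-util feature is enabled only for dev-dependencies, as the feature list recommends.

```toml
[dependencies]
# Illustrative version; check crates.io for the latest release.
clickhouse = "0.12"

[dev-dependencies]
# Mocks are only needed in tests.
clickhouse = { version = "0.12", features = ["test-util"] }
```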
See also: crates.io page.
Cargo features
- lz4 (enabled by default): enables the Compression::Lz4 and Compression::Lz4Hc(_) variants. If enabled, Compression::Lz4 is used by default for all queries except for WATCH.
- native-tls: supports URLs with the HTTPS scheme via hyper-tls, which links against OpenSSL.
- rustls-tls: supports URLs with the HTTPS scheme via hyper-rustls, which does not link against OpenSSL.
- inserter: enables client.inserter().
- test-util: adds mocks. See the example. Use it only in dev-dependencies.
- watch: enables client.watch functionality. See the corresponding section for details.
- uuid: adds serde::uuid to work with the uuid crate.
- time: adds serde::time to work with the time crate.
When connecting to ClickHouse via an HTTPS URL, either the native-tls or rustls-tls feature should be enabled. If both are enabled, the rustls-tls feature will take precedence.
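Suppose you want to connect over HTTPS without linking against OpenSSL and also want client-side batching. One option is to enable the rustls-tls and inserter features on top of the defaults, which keep lz4. The version number below is illustrative, not a recommendation.

```toml
[dependencies]
# Default features (lz4) stay on; rustls-tls adds HTTPS, inserter adds client.inserter().
clickhouse = { version = "0.12", features = ["rustls-tls", "inserter"] }
```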
ClickHouse versions compatibility
The client is compatible with the LTS or newer versions of ClickHouse, as well as ClickHouse Cloud.
ClickHouse server older than v22.6 handles RowBinary incorrectly in some rare cases. You can use client v0.11+ and enable the wa-37420 feature to work around this problem. Note: this feature should not be used with newer ClickHouse versions.
Examples
We aim to cover various scenarios of client usage with the examples in the client repository. The overview is available in the examples README.
If something is unclear or missing from the examples or from the following documentation, feel free to contact us.
Usage
The ch2rs crate is useful for generating a row type from ClickHouse.
Creating a client instance
Reuse created clients or clone them in order to reuse the underlying hyper connection pool.
HTTPS or ClickHouse Cloud connection
HTTPS works with either the rustls-tls or native-tls cargo feature.
Then, create a client as usual. In this example, environment variables are used to store the connection details:
The URL should include both protocol and port, e.g. https://instance.clickhouse.cloud:8443.
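A missing port is an easy mistake, so you might validate the configured URL before building the client. The std-only sketch below checks for an http:// or https:// scheme and an explicit numeric port. The helper names and the CLICKHOUSE_URL variable name are assumptions for illustration, not part of the clickhouse crate. The check is deliberately simple and is not a full URL parser.

```rust
/// Returns true if `url` has an http:// or https:// scheme and an explicit
/// numeric port, e.g. "https://instance.clickhouse.cloud:8443".
/// Hypothetical helper; the clickhouse crate neither provides nor requires it.
fn has_scheme_and_port(url: &str) -> bool {
    // Strip the scheme; a missing or unknown scheme is rejected.
    let rest = match url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))
    {
        Some(rest) => rest,
        None => return false,
    };
    // The authority part ends at the first '/', if a path follows.
    let authority = rest.split('/').next().unwrap_or("");
    // Require "host:port" with a non-empty host and a port that fits in u16.
    match authority.rsplit_once(':') {
        Some((host, port)) => !host.is_empty() && port.parse::<u16>().is_ok(),
        None => false,
    }
}

/// Reads a required environment variable (the name is an assumption).
fn read_env_var(key: &str) -> Result<String, String> {
    std::env::var(key).map_err(|_| format!("{key} env variable should be set"))
}

fn main() {
    match read_env_var("CLICKHOUSE_URL") {
        Ok(url) if has_scheme_and_port(&url) => println!("URL looks complete: {url}"),
        Ok(url) => eprintln!("URL is missing the protocol or the port: {url}"),
        Err(err) => eprintln!("{err}"),
    }
}
```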
See also:
- HTTPS with ClickHouse Cloud example in the client repo. This should be applicable to on-premise HTTPS connections as well.
Selecting rows
- Placeholder ?fields is replaced with no, name (fields of Row).
- Placeholder ? is replaced with values in the subsequent bind() calls.
- Convenient fetch_one::<Row>() and fetch_all::<Row>() methods can be used to get the first row or all rows, respectively.
- sql::Identifier can be used to bind table names.
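The sketch below shows roughly what a query template turns into after both placeholders are expanded. It is a std-only illustration, not the crate's code. The real client escapes bound values and may quote or space column names differently. This hypothetical helper only splices in literals that are already rendered as SQL, and it does not handle a ? that appears inside a string literal.

```rust
/// Expands `?fields` into a column list, then replaces each remaining `?`,
/// left to right, with the next pre-rendered SQL literal.
/// Hypothetical helper for illustration only: no escaping is performed.
fn expand(template: &str, fields: &[&str], literals: &[&str]) -> Result<String, String> {
    // Step 1: `?fields` becomes the comma-separated field names of the row type.
    let with_fields = template.replace("?fields", &fields.join(", "));

    // Step 2: positional `?` placeholders consume bound values in order.
    let mut out = String::with_capacity(with_fields.len());
    let mut values = literals.iter();
    for ch in with_fields.chars() {
        if ch == '?' {
            let value = values.next().ok_or("not enough bound values")?;
            out.push_str(value);
        } else {
            out.push(ch);
        }
    }
    // Leftover values mean the template had fewer placeholders than bindings.
    if values.next().is_some() {
        return Err("too many bound values".to_string());
    }
    Ok(out)
}

fn main() {
    let sql = expand(
        "SELECT ?fields FROM some WHERE no BETWEEN ? AND ?",
        &["no", "name"],
        &["500", "504"],
    )
    .unwrap();
    println!("{sql}");
}
```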
NB: as the entire response is streamed, cursors can return an error even after producing some rows. If this happens in your use case, you could try query(...).with_option("wait_end_of_query", "1") in order to enable response buffering on the server side. The buffer_size option can be useful, too.
Use wait_end_of_query with caution when selecting rows, as it can lead to higher memory consumption on the server side and will likely decrease the overall performance.
Inserting rows
- If end() isn't called, the INSERT is aborted.
- Rows are sent progressively as a stream to spread the network load.
- ClickHouse inserts batches atomically only if all rows fit in the same partition and their number is less than max_insert_block_size.
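If each INSERT must be atomic, one option is to pre-split rows on the client so that every batch meets both conditions above. The following std-only sketch is a hypothetical helper, not part of the crate. It groups rows by a caller-supplied partition key and caps each batch at max_insert_block_size - 1 rows, following the "less than" wording above. It assumes you can compute the partition key the same way the table's PARTITION BY expression does. Each resulting batch would then be written with its own insert and end() call.

```rust
use std::collections::BTreeMap;

/// Groups rows by partition key and splits each group into batches of at
/// most `max_insert_block_size - 1` rows. Hypothetical helper, not crate API.
fn split_for_atomic_inserts<T, K, F>(rows: Vec<T>, partition_of: F, max_insert_block_size: usize) -> Vec<Vec<T>>
where
    K: Ord,
    F: Fn(&T) -> K,
{
    assert!(max_insert_block_size >= 2, "need room for at least one row per batch");
    // "less than max_insert_block_size" allows at most max - 1 rows.
    let batch_len = max_insert_block_size - 1;

    // Bucket rows per partition; BTreeMap keeps the output order deterministic.
    let mut by_partition: BTreeMap<K, Vec<T>> = BTreeMap::new();
    for row in rows {
        by_partition.entry(partition_of(&row)).or_default().push(row);
    }

    let mut batches = Vec::new();
    for (_, mut group) in by_partition {
        while group.len() > batch_len {
            // split_off keeps the first batch_len rows in `group` and returns the rest.
            let rest = group.split_off(batch_len);
            batches.push(group);
            group = rest;
        }
        if !group.is_empty() {
            batches.push(group);
        }
    }
    batches
}

fn main() {
    // (partition key, payload) pairs; the key could be, e.g., a toYYYYMM() value.
    let rows = vec![(202401, "a"), (202402, "b"), (202401, "c"), (202401, "d")];
    for batch in split_for_atomic_inserts(rows, |r| r.0, 3) {
        println!("{batch:?}");
    }
}
```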
Async insert (server-side batching)
You can use ClickHouse asynchronous inserts to avoid client-side batching of the incoming data. This can be done by simply providing the async_insert option to the insert method (or even to the Client instance itself, so that it will affect all the insert calls).
See also:
- Async insert example in the client repo.
Inserter feature (client-side batching)
Requires the inserter cargo feature.
- Inserter ends the active insert in commit() if any of the thresholds (max_bytes, max_rows, period) are reached.
- The interval between ending active INSERTs can be biased by using with_period_bias to avoid load spikes by parallel inserters.
- Inserter::time_left() can be used to detect when the current period ends. Call Inserter::commit() again to check limits if your stream emits items rarely.
- Time thresholds are implemented using the quanta crate to speed the inserter up. quanta is not used if test-util is enabled (thus, time can be managed by tokio::time::advance() in custom tests).
- All rows between commit() calls are inserted in the same INSERT statement.
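The threshold rules above fit in a few lines. This std-only sketch is a hypothetical model, not the crate's Inserter. Its names only mirror max_bytes, max_rows, period and time_left(). It treats "reached" as greater than or equal, which is an assumption, and it ignores with_period_bias. Elapsed time is passed in explicitly, which keeps the decision testable in the same spirit as driving time with tokio::time::advance().

```rust
use std::time::Duration;

/// Hypothetical model of the inserter thresholds (not the crate's type).
struct Thresholds {
    max_bytes: u64,
    max_rows: u64,
    period: Option<Duration>,
}

/// What has been written since the active INSERT was started.
struct Pending {
    bytes: u64,
    rows: u64,
    elapsed: Duration,
}

impl Thresholds {
    /// Decision made at commit() time: end the active INSERT as soon as any
    /// single threshold is reached ("reached" assumed to mean >=).
    fn should_end(&self, p: &Pending) -> bool {
        p.bytes >= self.max_bytes
            || p.rows >= self.max_rows
            || self.period.map_or(false, |period| p.elapsed >= period)
    }

    /// Counterpart of time_left(): time remaining in the current period, if any.
    fn time_left(&self, p: &Pending) -> Option<Duration> {
        self.period.map(|period| period.saturating_sub(p.elapsed))
    }
}

fn main() {
    // Arbitrary illustrative limits.
    let thresholds = Thresholds {
        max_bytes: 50_000_000,
        max_rows: 750_000,
        period: Some(Duration::from_secs(15)),
    };
    let pending = Pending { bytes: 1_024, rows: 10, elapsed: Duration::from_secs(16) };
    // The period has elapsed, so this commit would end the active INSERT.
    println!("end insert: {}", thresholds.should_end(&pending));
    println!("time left: {:?}", thresholds.time_left(&pending));
}
```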
Do not forget to flush if you want to terminate/finalize inserting:
Executing DDLs
With a single-node deployment, it is enough to execute DDLs like this:
However, on clustered deployments with a load balancer or ClickHouse Cloud, it is recommended to wait for the DDL to be applied on all the replicas, using the wait_end_of_query option. This can be done like this:
ClickHouse settings
You can apply various ClickHouse settings using the with_option method. For example:
Besides query, it works similarly with the insert and inserter methods; additionally, the same method can be called on the Client instance to set global settings for all queries.
Query ID
Using .with_option, you can set the query_id option to identify queries in the ClickHouse query log.
Besides query, it works similarly with the insert and inserter methods.
If you set query_id manually, make sure that it is unique. UUIDs are a good choice for this.
See also: query_id example in the client repo.
Session ID
Similarly to query_id, you can set the session_id to execute the statements in the same session.
session_id can be set either globally on the client level, or per query, insert, or inserter call.
With clustered deployments, due to the lack of "sticky sessions", you need to be connected to a particular cluster node in order to properly utilize this feature, because, for example, a round-robin load balancer does not guarantee that consecutive requests will be processed by the same ClickHouse node.
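A tiny std-only model of a round-robin load balancer shows why sticky routing matters. The type is hypothetical and is not part of the crate or of any real balancer. It picks the next node for every request regardless of session_id. Statements that depend on session state, such as a temporary table created earlier in the same session, can therefore land on a node that has never seen that session. The SQL strings are only labels here and are never executed.

```rust
/// Hypothetical round-robin balancer: every request goes to the next node,
/// with no knowledge of session_id.
struct RoundRobin<'a> {
    nodes: &'a [&'a str],
    next: usize,
}

impl<'a> RoundRobin<'a> {
    fn new(nodes: &'a [&'a str]) -> Self {
        assert!(!nodes.is_empty(), "need at least one node");
        Self { nodes, next: 0 }
    }

    /// Picks the node for the next request and advances the cursor.
    fn route(&mut self) -> &'a str {
        let node = self.nodes[self.next];
        self.next = (self.next + 1) % self.nodes.len();
        node
    }
}

fn main() {
    let nodes = ["node-1", "node-2"];
    let mut lb = RoundRobin::new(&nodes);
    // Three statements meant to share one session...
    for stmt in ["CREATE TEMPORARY TABLE t (x UInt8)", "INSERT INTO t VALUES (1)", "SELECT * FROM t"] {
        // ...are spread across nodes, so the session state is not shared.
        println!("{} -> {stmt}", lb.route());
    }
}
```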
See also: session_id example in the client repo.
Custom HTTP headers
If you are using proxy authentication or need to pass custom headers, you can do it like this:
See also: custom HTTP headers example in the client repo.
Custom HTTP client
This could be useful for tweaking the underlying HTTP connection pool settings.
This example relies on the legacy Hyper API and is subject to change in the future.
See also: custom HTTP client example in the client repo.
Data Types
See also the additional examples:
- (U)Int(8|16|32|64|128) maps to/from corresponding (u|i)(8|16|32|64|128) types or newtypes around them.
- (U)Int256 are not supported directly, but there is a workaround for it.
- Float(32|64) maps to/from corresponding f(32|64) or newtypes around them.
- Decimal(32|64|128) maps to/from corresponding i(32|64|128) or newtypes around them. It's more convenient to use fixnum or another implementation of signed fixed-point numbers.
- Boolean maps to/from bool or newtypes around it.
- String maps to/from any string or bytes types, e.g. &str, &[u8], String, Vec<u8> or SmartString. Newtypes are also supported. To store bytes, consider using serde_bytes, because it's more efficient.
- FixedString(N) is supported as an array of bytes, e.g. [u8; N].
- Enum(8|16) are supported using serde_repr.
- UUID maps to/from uuid::Uuid by using serde::uuid. Requires the uuid feature.
- IPv6 maps to/from std::net::Ipv6Addr.
- IPv4 maps to/from std::net::Ipv4Addr by using serde::ipv4.
- Date maps to/from u16 or a newtype around it and represents the number of days elapsed since 1970-01-01. Also, time::Date is supported by using serde::time::date, which requires the time feature.
- Date32 maps to/from i32 or a newtype around it and represents the number of days elapsed since 1970-01-01. Also, time::Date is supported by using serde::time::date32, which requires the time feature.
- DateTime maps to/from u32 or a newtype around it and represents the number of seconds elapsed since the UNIX epoch. Also, time::OffsetDateTime is supported by using serde::time::datetime, which requires the time feature.
- DateTime64(_) maps to/from i64 or a newtype around it and represents the time elapsed since the UNIX epoch. Also, time::OffsetDateTime is supported by using serde::time::datetime64::*, which requires the time feature.
- Tuple(A, B, ...) maps to/from (A, B, ...) or a newtype around it.
- Array(_) maps to/from any slice, e.g. Vec<_>, &[_]. Newtypes are also supported.
- Map(K, V) behaves like Array((K, V)).
- LowCardinality(_) is supported seamlessly.
- Nullable(_) maps to/from Option<_>. For clickhouse::serde::* helpers, add ::option.
- Nested is supported by providing multiple arrays with renaming.
- Geo types are supported.
  - Point behaves like a tuple (f64, f64), and the rest of the types are just slices of points.
- Variant, Dynamic, (new) JSON data types aren't supported yet.
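The Date, Date32, DateTime and DateTime64 mappings above are plain integers. If you use the raw integer representations instead of the time feature, the std-only sketch below shows how those values relate to calendar dates in UTC. It uses Howard Hinnant's well-known days-to-civil algorithm for the proleptic Gregorian calendar. The helper names are hypothetical and not part of the clickhouse crate. The DateTime64 helper assumes the column's precision (0..=9) is known from its type.

```rust
/// Converts days since 1970-01-01 to (year, month, day), proleptic Gregorian
/// calendar (Howard Hinnant's days-to-civil algorithm).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year eras, floor division
    let doe = z.rem_euclid(146_097); // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year, [0, 365]
    let mp = (5 * doy + 2) / 153; // March-based month, [0, 11]
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    // January and February belong to the next civil year.
    let year = yoe + era * 400 + i64::from(month <= 2);
    (year, month, day)
}

/// Date: u16 days since 1970-01-01.
fn date_to_ymd(raw: u16) -> (i64, u32, u32) {
    civil_from_days(i64::from(raw))
}

/// Date32: i32 days since 1970-01-01 (may be negative).
fn date32_to_ymd(raw: i32) -> (i64, u32, u32) {
    civil_from_days(i64::from(raw))
}

/// DateTime: u32 seconds since the UNIX epoch, split into a date and (h, m, s).
fn datetime_to_parts(raw: u32) -> ((i64, u32, u32), (u32, u32, u32)) {
    let secs = i64::from(raw);
    let date = civil_from_days(secs / 86_400);
    let tod = (secs % 86_400) as u32; // seconds since midnight
    (date, (tod / 3_600, tod % 3_600 / 60, tod % 60))
}

/// DateTime64(precision): i64 ticks of 10^-precision seconds since the epoch,
/// split into whole seconds and a non-negative nanosecond remainder.
fn split_datetime64(ticks: i64, precision: u32) -> (i64, u32) {
    assert!(precision <= 9, "DateTime64 precision must be in 0..=9");
    let per_second = 10_i64.pow(precision);
    let secs = ticks.div_euclid(per_second);
    let nanos = ticks.rem_euclid(per_second) * 10_i64.pow(9 - precision);
    (secs, nanos as u32)
}

fn main() {
    println!("{:?}", date_to_ymd(19_723));
    println!("{:?}", date32_to_ymd(-1));
    println!("{:?}", datetime_to_parts(1_700_000_000));
    println!("{:?}", split_datetime64(1_700_000_000_123, 3));
}
```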
Mocking
The crate provides utils for mocking the ClickHouse server and testing DDL, SELECT, INSERT and WATCH queries. The functionality can be enabled with the test-util feature. Use it only as a dev-dependency.
See the example.
Troubleshooting
CANNOT_READ_ALL_DATA
The most common cause of the CANNOT_READ_ALL_DATA error is that the row definition on the application side does not match that in ClickHouse.
Consider the following table:
Then, if EventLog is defined on the application side with mismatching types, e.g.:
When inserting the data, the following error can occur:
In this example, this is fixed by the correct definition of the EventLog struct:
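Why does a type mismatch surface as CANNOT_READ_ALL_DATA? RowBinary carries no column names or type tags, and integers are written as fixed-width little-endian bytes. The server therefore simply reads as many bytes as each column's type requires. The std-only sketch below models this with a hypothetical row whose first field is declared as u32 on the application side, while the table column is assumed to be UInt64. It is a simplified illustration, not the EventLog definitions referred to above and not the crate's encoder. When the client sends fewer bytes than the schema requires, the reader runs out of input. Other mismatches may instead produce wrong values or different errors.

```rust
/// Simplified RowBinary writer for a (u32, u64) row: raw little-endian bytes,
/// no tags or separators. Hypothetical model, not the crate's encoder.
fn write_row_u32_u64(buf: &mut Vec<u8>, a: u32, b: u64) {
    buf.extend_from_slice(&a.to_le_bytes());
    buf.extend_from_slice(&b.to_le_bytes());
}

/// Takes exactly N bytes, failing the way a reader does when the stream ends early.
fn take<const N: usize>(input: &mut &[u8]) -> Result<[u8; N], String> {
    if input.len() < N {
        return Err(format!("cannot read all data: need {} bytes, have {}", N, input.len()));
    }
    let (head, tail) = input.split_at(N);
    let mut out = [0u8; N];
    out.copy_from_slice(head);
    *input = tail;
    Ok(out)
}

/// Reader matching the application's struct: (u32, u64).
fn read_u32_u64(mut input: &[u8]) -> Result<(u32, u64), String> {
    let a = u32::from_le_bytes(take::<4>(&mut input)?);
    let b = u64::from_le_bytes(take::<8>(&mut input)?);
    Ok((a, b))
}

/// Reader matching the assumed table schema: (UInt64, UInt64).
fn read_u64_u64(mut input: &[u8]) -> Result<(u64, u64), String> {
    let a = u64::from_le_bytes(take::<8>(&mut input)?);
    let b = u64::from_le_bytes(take::<8>(&mut input)?);
    Ok((a, b))
}

fn main() {
    // Application side writes 4 + 8 = 12 bytes per row.
    let mut buf = Vec::new();
    write_row_u32_u64(&mut buf, 1, 2);
    println!("matching types: {:?}", read_u32_u64(&buf));
    // The server expects 8 + 8 = 16 bytes and runs out of data.
    println!("mismatched types: {:?}", read_u64_u64(&buf));
}
```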
Known limitations
- Variant, Dynamic, (new) JSON data types aren't supported yet.
- Server-side parameter binding is not supported yet; see this issue for tracking.
