Protobuf
Description
Protobuf - is a Protocol Buffers format.
This format requires an external format schema. The schema is cached between queries.
ClickHouse supports both proto2
and proto3
syntaxes. Repeated/optional/required fields are supported.
Example Usage
Basic Examples
Usage examples:
SELECT * FROM test.table FORMAT Protobuf SETTINGS format_schema = 'schemafile:MessageType'
cat protobuf_messages.bin | clickhouse-client --query "INSERT INTO test.table SETTINGS format_schema='schemafile:MessageType' FORMAT Protobuf"
where the file schemafile.proto
looks like this:
syntax = "proto3";
message MessageType {
string name = 1;
string surname = 2;
uint32 birthDate = 3;
repeated string phoneNumbers = 4;
};
To find the correspondence between table columns and fields of Protocol Buffers’ message type ClickHouse compares their names.
This comparison is case-insensitive and the characters _
(underscore) and .
(dot) are considered as equal.
If types of a column and a field of Protocol Buffers’ message are different the necessary conversion is applied.
Nested messages are supported. For example, for the field z
in the following message type
message MessageType {
message XType {
message YType {
int32 z;
};
repeated YType y;
};
XType x;
};
ClickHouse tries to find a column named x.y.z
(or x_y_z
or X.y_Z
and so on).
Nested messages are suitable to input or output a nested data structures.
Default values defined in a protobuf schema like this
syntax = "proto2";
message MessageType {
optional int32 result_per_page = 3 [default = 10];
}
are not applied; the table defaults are used instead of them.
ClickHouse inputs and outputs protobuf messages in the length-delimited
format.
It means before every message should be written its length as a varint.
See also how to read/write length-delimited protobuf messages in popular languages.
Using autogenerated schema
If you don't have an external Protobuf schema for your data, you can still output/input data in Protobuf format using autogenerated schema. For example:
SELECT * FROM test.hits format Protobuf SETTINGS format_protobuf_use_autogenerated_schema=1
In this case ClickHouse will autogenerate Protobuf schema according to the table structure using function structureToProtobufSchema and will use this schema to serialize data in Protobuf format.
You can also read Protobuf file with autogenerated schema (in this case the file must be created using the same schema):
$ cat hits.bin | clickhouse-client --query "INSERT INTO test.hits SETTINGS format_protobuf_use_autogenerated_schema=1 FORMAT Protobuf"
The setting format_protobuf_use_autogenerated_schema is enabled by default and applies if format_schema is not set.
You can also save autogenerated schema in the file during input/output using setting output_format_schema. For example:
SELECT * FROM test.hits format Protobuf SETTINGS format_protobuf_use_autogenerated_schema=1, output_format_schema='path/to/schema/schema.proto'
In this case autogenerated Protobuf schema will be saved in file path/to/schema/schema.capnp
.
Drop Protobuf cache
To reload Protobuf schema loaded from format_schema_path use SYSTEM DROP ... FORMAT CACHE statement.
SYSTEM DROP FORMAT SCHEMA CACHE FOR Protobuf