# AvroConfluent

| Input | Output | Alias |
|-------|--------|-------|
| ✔     | ✗      |       |
## Description

`AvroConfluent` supports decoding single-object, Avro-encoded Kafka messages serialized using the [Confluent Schema Registry](https://docs.confluent.io/current/schema-registry/index.html) (or API-compatible services).

Each Avro message embeds a schema ID that ClickHouse automatically resolves by querying the configured schema registry. Once resolved, schemas are cached for optimal performance.
## Data type mapping

The table below shows all data types supported by the Apache Avro format, and their corresponding ClickHouse data types in `INSERT` and `SELECT` queries.
| Avro data type (`INSERT`) | ClickHouse data type | Avro data type (`SELECT`) |
|---|---|---|
| `boolean`, `int`, `long`, `float`, `double` | Int(8\|16\|32), UInt(8\|16\|32) | `int` |
| `boolean`, `int`, `long`, `float`, `double` | Int64, UInt64 | `long` |
| `boolean`, `int`, `long`, `float`, `double` | Float32 | `float` |
| `boolean`, `int`, `long`, `float`, `double` | Float64 | `double` |
| `bytes`, `string`, `fixed`, `enum` | String | `bytes` or `string` \* |
| `bytes`, `string`, `fixed` | FixedString(N) | `fixed(N)` |
| `enum` | Enum(8\|16) | `enum` |
| `array(T)` | Array(T) | `array(T)` |
| `map(V, K)` | Map(V, K) | `map(string, K)` |
| `union(null, T)`, `union(T, null)` | Nullable(T) | `union(null, T)` |
| `union(T1, T2, …)` \*\* | Variant(T1, T2, …) | `union(T1, T2, …)` \*\* |
| `null` | Nullable(Nothing) | `null` |
| `int (date)` \*\*\* | Date, Date32 | `int (date)` \*\*\* |
| `long (timestamp-millis)` \*\*\* | DateTime64(3) | `long (timestamp-millis)` \*\*\* |
| `long (timestamp-micros)` \*\*\* | DateTime64(6) | `long (timestamp-micros)` \*\*\* |
| `bytes (decimal)` \*\*\* | DateTime64(N) | `bytes (decimal)` \*\*\* |
| `int` | IPv4 | `int` |
| `fixed(16)` | IPv6 | `fixed(16)` |
| `bytes (decimal)` \*\*\* | Decimal(P, S) | `bytes (decimal)` \*\*\* |
| `string (uuid)` \*\*\* | UUID | `string (uuid)` \*\*\* |
| `fixed(16)` | Int128/UInt128 | `fixed(16)` |
| `fixed(32)` | Int256/UInt256 | `fixed(32)` |
| `record` | Tuple | `record` |
\* `bytes` is the default, controlled by the setting `output_format_avro_string_column_pattern`.

\*\* The Variant type implicitly accepts `null` as a field value, so for example the Avro `union(T1, T2, null)` will be converted to `Variant(T1, T2)`. As a result, when producing Avro from ClickHouse, we always have to include the `null` type in the Avro `union` type set, since we don't know whether any value is actually `null` during schema inference.

\*\*\* Avro logical types.
Unsupported Avro logical data types:

- `time-millis`
- `time-micros`
- `duration`
## Format settings

| Setting | Description | Default |
|---|---|---|
| `input_format_avro_allow_missing_fields` | For `Avro`/`AvroConfluent`: whether to use a default value instead of throwing an error when a field is not found in the schema. | `0` |
| `input_format_avro_null_as_default` | For `Avro`/`AvroConfluent`: whether to use a default value instead of throwing an error when inserting a `null` value into a non-nullable column. | `0` |
| `format_avro_schema_registry_url` | For `AvroConfluent`: the Confluent Schema Registry URL. For basic authentication, URL-encoded credentials can be included directly in the URL path. | |
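As a sketch of how the input settings above are applied, the query below reads a local Avro file and substitutes column defaults for any fields missing from the Avro schema; the file name and column structure are illustrative assumptions, not part of the original text:

```sql
-- Hypothetical example: the file name and structure are placeholders.
-- With input_format_avro_allow_missing_fields = 1, columns absent from
-- the Avro schema are filled with their default values instead of
-- causing the query to fail.
SELECT *
FROM file('events.avro', 'Avro', 'id UInt64, name String')
SETTINGS input_format_avro_allow_missing_fields = 1;
```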
## Example

To read an Avro-encoded Kafka topic using the [Kafka table engine](https://clickhouse.com/docs/engines/table-engines/integrations/kafka), use the `format_avro_schema_registry_url` setting to provide the URL of the Confluent Schema Registry.

If your schema registry requires basic authentication (e.g., if you're using Confluent Cloud), you can provide URL-encoded credentials in the `format_avro_schema_registry_url` setting.
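A minimal sketch of such a table definition follows; the broker address, topic, consumer group, column names, and registry host are all illustrative assumptions:

```sql
-- Illustrative Kafka table reading AvroConfluent messages.
-- All hosts, names, and columns below are placeholder assumptions.
CREATE TABLE topic1_stream
(
    field1 String,
    field2 String
)
ENGINE = Kafka()
SETTINGS
    kafka_broker_list = 'kafka-broker:9092',
    kafka_topic_list = 'topic1',
    kafka_group_name = 'group1',
    kafka_format = 'AvroConfluent',
    format_avro_schema_registry_url = 'http://schema-registry-host:8081';
```

For basic authentication, the URL-encoded credentials go directly in the URL, e.g. `format_avro_schema_registry_url = 'https://<username>:<password>@schema-registry-host'`.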
## Troubleshooting

To monitor ingestion progress and debug errors with the Kafka consumer, you can query the `system.kafka_consumers` system table. If your deployment has multiple replicas (e.g., ClickHouse Cloud), you must use the `clusterAllReplicas` table function.

If you run into schema resolution issues, you can use kafkacat with clickhouse-local to troubleshoot:
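For example, a query along these lines surfaces consumer state across replicas; the cluster name `default` is an assumption and may differ in your deployment:

```sql
-- Inspect Kafka consumer state on every replica.
-- 'default' is an assumed cluster name; adjust for your deployment.
SELECT * FROM clusterAllReplicas('default', system.kafka_consumers)
FORMAT Vertical;
```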
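The pipeline might look like the sketch below, which pulls a few raw messages with kafkacat and decodes them locally, bypassing the Kafka table engine entirely; the broker, topic, registry URL, and column structure are illustrative assumptions:

```shell
# Fetch the first 3 raw messages from the topic and decode them with
# clickhouse-local. Broker, topic, registry URL, and the structure (-S)
# are placeholder assumptions; substitute your own values.
kafkacat -b kafka-broker:9092 -C -t topic1 -o beginning -f '%s' -c 3 \
  | clickhouse-local \
      --input-format AvroConfluent \
      --format_avro_schema_registry_url 'http://schema-registry-host:8081' \
      -S 'field1 Int64, field2 String' \
      --query 'SELECT * FROM table'
```

If decoding succeeds here but fails in the Kafka table engine, the issue is likely in the table definition rather than in schema resolution.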