Skip to main content
Skip to main content

Avro

Description

Apache Avro is a row-oriented data serialization framework developed within Apache’s Hadoop project. ClickHouse Avro format supports reading and writing Avro data files.

Data Types Matching

The table below shows supported data types and how they match ClickHouse data types in INSERT and SELECT queries.

Avro data type INSERTClickHouse data typeAvro data type SELECT
boolean, int, long, float, doubleInt(8\16\32), UInt(8\16\32)int
boolean, int, long, float, doubleInt64, UInt64long
boolean, int, long, float, doubleFloat32float
boolean, int, long, float, doubleFloat64double
bytes, string, fixed, enumStringbytes or string *
bytes, string, fixedFixedString(N)fixed(N)
enumEnum(8\16)enum
array(T)Array(T)array(T)
map(V, K)Map(V, K)map(string, K)
union(null, T), union(T, null)Nullable(T)union(null, T)
union(T1, T2, …) **Variant(T1, T2, …)union(T1, T2, …) **
nullNullable(Nothing)null
int (date) ***Date, Date32int (date) ***
long (timestamp-millis) ***DateTime64(3)long (timestamp-millis) ***
long (timestamp-micros) ***DateTime64(6)long (timestamp-micros) ***
bytes (decimal) ***DateTime64(N)bytes (decimal) ***
intIPv4int
fixed(16)IPv6fixed(16)
bytes (decimal) ***Decimal(P, S)bytes (decimal) ***
string (uuid) ***UUIDstring (uuid) ***
fixed(16)Int128/UInt128fixed(16)
fixed(32)Int256/UInt256fixed(32)
recordTuplerecord

* bytes is default, controlled by output_format_avro_string_column_pattern

** Variant type implicitly accepts null as a field value, so for example the Avro union(T1, T2, null) will be converted to Variant(T1, T2). As a result, when producing Avro from ClickHouse, we have to always include the null type to the Avro union type set as we don't know if any value is actually null during the schema inference.

*** Avro logical types

Unsupported Avro logical data types: time-millis, time-micros, duration

Example Usage

Inserting Data

To insert data from an Avro file into ClickHouse table:

$ cat file.avro | clickhouse-client --query="INSERT INTO {some_table} FORMAT Avro"

The root schema of input Avro file must be of record type.

To find the correspondence between table columns and fields of Avro schema ClickHouse compares their names. This comparison is case-sensitive. Unused fields are skipped.

Data types of ClickHouse table columns can differ from the corresponding fields of the Avro data inserted. When inserting data, ClickHouse interprets data types according to the table above and then casts the data to corresponding column type.

While importing data, when field is not found in schema and setting input_format_avro_allow_missing_fields is enabled, default value will be used instead of error.

Selecting Data

To select data from ClickHouse table into an Avro file:

$ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Avro" > file.avro

Column names must:

  • start with [A-Za-z_]
  • subsequently contain only [A-Za-z0-9_]

Output Avro file compression and sync interval can be configured with output_format_avro_codec and output_format_avro_sync_interval respectively.

Example Data

Using the ClickHouse DESCRIBE function, you can quickly view the inferred format of an Avro file like the following example. This example includes the URL of a publicly accessible Avro file in the ClickHouse S3 public bucket:

DESCRIBE url('https://clickhouse-public-datasets.s3.eu-central-1.amazonaws.com/hits.avro','Avro);
┌─name───────────────────────┬─type────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ WatchID │ Int64 │ │ │ │ │ │
│ JavaEnable │ Int32 │ │ │ │ │ │
│ Title │ String │ │ │ │ │ │
│ GoodEvent │ Int32 │ │ │ │ │ │
│ EventTime │ Int32 │ │ │ │ │ │
│ EventDate │ Date32 │ │ │ │ │ │
│ CounterID │ Int32 │ │ │ │ │ │
│ ClientIP │ Int32 │ │ │ │ │ │
│ ClientIP6 │ FixedString(16) │ │ │ │ │ │
│ RegionID │ Int32 │ │ │ │ │ │
...
│ IslandID │ FixedString(16) │ │ │ │ │ │
│ RequestNum │ Int32 │ │ │ │ │ │
│ RequestTry │ Int32 │ │ │ │ │ │
└────────────────────────────┴─────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

Format Settings