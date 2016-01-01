Working with Avro, Arrow, and ORC data in ClickHouse

Apache has released multiple data formats actively used in analytics environments, including the popular Avro, Arrow, and ORC. ClickHouse supports importing and exporting data in any of them.

ClickHouse supports reading and writing Apache Avro data files, which are widely used in Hadoop systems.

To import from an Avro file, use the Avro format in the INSERT statement:
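As a minimal sketch, the import can be driven from Python by building a `clickhouse-client` call. The table name `sometable` and the file name `data.avro` are placeholders, and the sketch assumes `clickhouse-client` is available on the PATH:

```python
import shlex


def avro_import_command(table, path):
    """Build a clickhouse-client call that loads an Avro file into a table."""
    # FROM INFILE is read by the client, so the path is resolved on the
    # machine where clickhouse-client runs.
    query = f"INSERT INTO {table} FROM INFILE '{path}' FORMAT Avro"
    return ["clickhouse-client", "--query", query]


# 'sometable' and 'data.avro' are placeholder names for this sketch.
command = avro_import_command("sometable", "data.avro")
print(shlex.join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` to actually execute the import.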

With the file() function, we can also explore Avro files before actually importing data:
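A short sketch of what such exploration queries might look like, generated from Python. The file name `data.avro` is a placeholder. Note that on a server, `file()` resolves paths relative to the configured user files directory:

```python
def avro_explore_queries(path, limit=5):
    """Return queries that inspect an Avro file without loading it into a table."""
    source = f"file('{path}', Avro)"
    return [
        # Column names and the ClickHouse types inferred from the Avro schema.
        f"DESCRIBE {source}",
        # A small sample of rows to eyeball the values.
        f"SELECT * FROM {source} LIMIT {limit}",
    ]


# 'data.avro' is a placeholder file name.
for query in avro_explore_queries("data.avro"):
    print(query)
```

Each query can be run with `clickhouse-client --query` or `clickhouse-local --query`.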

To export to an Avro file:
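One possible way to express the export, sketched in Python: the helper below appends the `INTO OUTFILE ... FORMAT Avro` clauses to an arbitrary SELECT. The names `sometable` and `export.avro` are placeholders:

```python
def avro_export_query(select, path):
    """Append INTO OUTFILE ... FORMAT Avro to a SELECT statement."""
    # Drop a trailing semicolon so the clauses attach to the SELECT itself.
    base = select.rstrip().rstrip(";")
    return f"{base} INTO OUTFILE '{path}' FORMAT Avro"


# 'sometable' and 'export.avro' are placeholder names.
query = avro_export_query("SELECT * FROM sometable;", "export.avro")
print(query)
```

The resulting query is meant to be run through `clickhouse-client`, which writes the file on the client side.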

Keep in mind how Avro and ClickHouse data types map to each other when importing or exporting Avro files. Use explicit type casting to convert values when loading data from Avro files:
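To illustrate, here is a sketch that builds a query showing a raw column next to its converted value. It assumes a hypothetical column named `date` that arrives from the Avro file as an integer, and uses the ClickHouse `toDate` function for the cast:

```python
def avro_cast_query(path, column, cast_function, limit=3):
    """Select a column from an Avro file next to its explicitly converted value."""
    source = f"file('{path}', Avro)"
    return f"SELECT {column}, {cast_function}({column}) FROM {source} LIMIT {limit}"


# 'data.avro' and the 'date' column are assumptions made for this example.
query = avro_cast_query("data.avro", "date", "toDate")
print(query)
```

Comparing both columns in the output is a quick way to confirm the cast does what we expect before using it in an INSERT ... SELECT.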

When Kafka messages use the Avro format, ClickHouse can read such streams using the AvroConfluent format and the Kafka engine:
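A hedged sketch of the table definition, generated from Python. The table name, columns, broker address, topic, consumer group, and registry URL below are all placeholders. Passing `format_avro_schema_registry_url` in the engine settings is one option; it can also be configured as a regular setting:

```python
def kafka_avro_table_ddl(table, columns, broker, topic, group, registry_url):
    """Return DDL for a Kafka engine table that decodes AvroConfluent messages."""
    column_list = ",\n".join(f"    {name} {ch_type}" for name, ch_type in columns)
    settings = {
        "kafka_broker_list": broker,
        "kafka_topic_list": topic,
        "kafka_group_name": group,
        "kafka_format": "AvroConfluent",
        # AvroConfluent messages carry only a schema id, so the format needs
        # the schema registry to look the schema up.
        "format_avro_schema_registry_url": registry_url,
    }
    setting_list = ",\n".join(f"    {key} = '{value}'" for key, value in settings.items())
    return (
        f"CREATE TABLE {table}\n(\n{column_list}\n)\n"
        f"ENGINE = Kafka()\nSETTINGS\n{setting_list}"
    )


# Every name and address below is a placeholder for this sketch.
ddl = kafka_avro_table_ddl(
    "some_topic_stream",
    [("field1", "UInt32"), ("field2", "String")],
    "localhost:9092",
    "some_topic",
    "some_group",
    "http://schema-registry-url",
)
print(ddl)
```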

Apache Arrow is a columnar format that ClickHouse also supports for import and export. To import data from an Arrow file, we use the Arrow format:
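A small sketch of running such an import from Python. The `runner` parameter is an addition of this example (it defaults to `subprocess.run`) so the function can be exercised without a ClickHouse installation; `sometable` and `data.arrow` are placeholder names:

```python
import subprocess


def import_arrow_file(table, path, runner=subprocess.run):
    """Load an Arrow file into a table by invoking clickhouse-client."""
    query = f"INSERT INTO {table} FROM INFILE '{path}' FORMAT Arrow"
    # check=True turns a non-zero exit code into CalledProcessError.
    return runner(["clickhouse-client", "--query", query], check=True)


# Example call, assuming clickhouse-client is on the PATH:
# import_arrow_file("sometable", "data.arrow")
```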

Exporting to an Arrow file works the same way:
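A sketch of the export side in Python, under the same assumptions (`clickhouse-client` on the PATH, placeholder names). Since `INTO OUTFILE` fails by default when the target file already exists, the helper checks for that up front:

```python
import os


def arrow_export_command(table, path):
    """Build a clickhouse-client call that writes a table to an Arrow file."""
    # INTO OUTFILE refuses to overwrite an existing file by default,
    # so fail early with a clearer message.
    if os.path.exists(path):
        raise FileExistsError(f"{path} already exists; remove it or pick another name")
    query = f"SELECT * FROM {table} INTO OUTFILE '{path}' FORMAT Arrow"
    return ["clickhouse-client", "--query", query]


# Example call with placeholder names:
# arrow_export_command("sometable", "export.arrow")
```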

Also, check how Arrow data types match ClickHouse types to see whether any values need to be converted manually.

The ArrowStream format can be used to work with Arrow streaming (used for in-memory processing). ClickHouse can read and write Arrow streams.

To demonstrate how ClickHouse can stream Arrow data, let's pipe it to the following Python script (it reads an input stream in the Arrow streaming format and outputs the result as a Pandas table):
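A minimal version of such a script might look like this. It assumes the `pyarrow` and `pandas` packages are installed; the function names are our own:

```python
import sys


def read_arrow_stream(stream):
    """Read a binary Arrow IPC stream and return it as a pandas DataFrame."""
    # Imported here so the module still loads where pyarrow is missing;
    # read_pandas() additionally needs pandas to be installed.
    import pyarrow as pa

    with pa.ipc.open_stream(stream) as reader:
        return reader.read_pandas()


def main():
    # sys.stdin.buffer gives the raw bytes arriving through the pipe.
    print(read_arrow_stream(sys.stdin.buffer))
```

To use it in a pipe, save it to a file (say `arrow.py`, a name chosen for this example) and add a `main()` call at the end.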

Now we can stream data from ClickHouse by piping its output to the script:
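As a sketch, the pipeline can be assembled in Python like this. The script name `arrow.py`, the table `some_data`, and its columns `path` and `hits` are assumptions made for the example:

```python
import shlex


def arrow_stream_pipeline(query, script):
    """Compose a shell pipeline that streams a query result into a Python script."""
    # FORMAT ArrowStream makes clickhouse-client write Arrow streaming data to stdout.
    client = ["clickhouse-client", "--query", f"{query} FORMAT ArrowStream"]
    consumer = ["python3", script]
    return f"{shlex.join(client)} | {shlex.join(consumer)}"


# 'some_data', its columns, and 'arrow.py' are placeholder names.
pipeline = arrow_stream_pipeline("SELECT path, hits FROM some_data LIMIT 3", "arrow.py")
print(pipeline)
```

The printed line can be pasted into a shell, or run with `subprocess.run(pipeline, shell=True, check=True)`.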

ClickHouse can read Arrow streams as well using the same ArrowStream format:
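A sketch of feeding an Arrow stream into ClickHouse from Python. Here `arrow-stream` stands for whatever command produces the stream, `sometable` is a placeholder table, and the `popen` parameter is an addition of this example that lets the function be tried without real processes:

```python
import subprocess


def insert_arrow_stream(table, producer_command, popen=subprocess.Popen):
    """Pipe the stdout of a producer command into an ArrowStream INSERT."""
    producer = popen(producer_command, stdout=subprocess.PIPE)
    consumer = popen(
        ["clickhouse-client", "--query", f"INSERT INTO {table} FORMAT ArrowStream"],
        stdin=producer.stdout,
    )
    # Close our copy of the pipe so the producer sees a broken pipe
    # if clickhouse-client exits early.
    producer.stdout.close()
    status = consumer.wait()
    producer.wait()
    return status


# Example call, assuming both commands exist on the PATH:
# insert_arrow_stream("sometable", ["arrow-stream"])
```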

Here, arrow-stream stands for any possible source of Arrow streaming data.

Apache ORC is a columnar storage format typically used in the Hadoop ecosystem. ClickHouse supports importing as well as exporting ORC data using the ORC format:
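A brief sketch of both directions, generated from Python. The names `sometable` and `data.orc` are placeholders, and both queries are meant to be run through `clickhouse-client`:

```python
def orc_round_trip_queries(table, path):
    """Return the export and import queries for moving a table through an ORC file."""
    return {
        "export": f"SELECT * FROM {table} INTO OUTFILE '{path}' FORMAT ORC",
        "import": f"INSERT INTO {table} FROM INFILE '{path}' FORMAT ORC",
    }


# 'sometable' and 'data.orc' are placeholder names.
queries = orc_round_trip_queries("sometable", "data.orc")
print(queries["export"])
print(queries["import"])
```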

Also, check how ORC data types match ClickHouse types, as well as the additional settings available to tune export and import.

ClickHouse supports many formats, both text and binary, to cover various scenarios and platforms. Explore more formats and ways to work with them in the following articles:

Also check out clickhouse-local, a portable full-featured tool for working with local and remote files without the need for a ClickHouse server.