ORC

Input	Output	Alias
✔	✔

Description

Apache ORC is a columnar storage format widely used in the Hadoop ecosystem.

The table below compares supported ORC data types and their corresponding ClickHouse data types in INSERT and SELECT queries.

Other types are not supported.
Arrays can be nested and can take a Nullable type as an argument. Tuple and Map types can also be nested.
The data types of ClickHouse table columns do not have to match the corresponding ORC data fields. When inserting data, ClickHouse interprets data types according to the table above and then casts the data to the data type set for the ClickHouse table column.
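As a rough illustration of the nesting and casting behavior described above, the sketch below inspects the types ClickHouse infers from an ORC file and then loads that file into a table whose column types are declared independently. The file name `data.orc`, the table `orc_demo`, and its columns are hypothetical. It assumes the statements are run from `clickhouse-client` and that the file is readable by the `file` table function.

```sql
-- Show the ClickHouse types inferred from the ORC schema (hypothetical file).
DESCRIBE file('data.orc', 'ORC');

-- Hypothetical target table; column types need not match the ORC fields exactly.
CREATE TABLE orc_demo
(
    id     UInt64,
    tags   Array(Nullable(String)),   -- array with Nullable elements
    scores Array(Array(Float64))      -- nested array
)
ENGINE = MergeTree
ORDER BY id;

-- Values are read as the inferred types, then cast to the column types above.
INSERT INTO orc_demo FROM INFILE 'data.orc' FORMAT ORC;
```

If a cast is not possible for some value, the insert is expected to fail rather than silently drop data, so checking the inferred schema first is a cheap safeguard.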

Using an ORC file named football.orc with the following data:

Insert the data:
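One way to do this from `clickhouse-client`, assuming a destination table named `football` already exists with columns compatible with the file (the table name is an assumption; it is not defined in this section):

```sql
-- Load the local ORC file into the (assumed) football table.
INSERT INTO football FROM INFILE 'football.orc' FORMAT ORC;
```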

Read data using the ORC format:
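A minimal sketch of the read path, again assuming a table named `football`. Because the output is binary, it is written to a file rather than to the terminal; the output file name is hypothetical.

```sql
-- Export all rows of the (assumed) football table as an ORC file.
SELECT *
FROM football
INTO OUTFILE 'football_export.orc'
FORMAT ORC;
```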

Tip

ORC is a binary format and is not human-readable when printed to the terminal. Use the INTO OUTFILE clause to write ORC output to a file.

Setting	Description	Default
`output_format_arrow_string_as_string`	Use Arrow String type instead of Binary for String columns.	`false`
`output_format_orc_compression_method`	Compression method used in output ORC format.	`none`
`input_format_arrow_case_insensitive_column_matching`	Ignore case when matching Arrow columns with ClickHouse columns.	`false`
`input_format_arrow_allow_missing_columns`	Allow missing columns while reading Arrow data.	`false`
`input_format_arrow_skip_columns_with_unsupported_types_in_schema_inference`	Allow skipping columns with unsupported types during schema inference for Arrow format.	`false`
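
The settings above can be changed per session before writing a file. The sketch below is illustrative only: it assumes a table named `football`, a hypothetical output file name, and that `'zstd'` is an accepted value for the compression setting in your ClickHouse version.

```sql
-- Session-level override of the ORC output compression (value is an assumption).
SET output_format_orc_compression_method = 'zstd';

-- Write String columns using the String type rather than Binary.
SET output_format_arrow_string_as_string = 1;

SELECT *
FROM football
INTO OUTFILE 'football_zstd.orc'
FORMAT ORC;
```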

To exchange data with Hadoop, you can use the HDFS table engine.
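
A minimal sketch of that approach, assuming a ClickHouse build with HDFS support. The NameNode address, path, table name, and columns are all hypothetical placeholders.

```sql
-- Table backed by an ORC file in HDFS (URI and schema are placeholders).
CREATE TABLE hdfs_orc_demo
(
    name  String,
    value UInt32
)
ENGINE = HDFS('hdfs://namenode:9000/path/data.orc', 'ORC');

-- Writes go to the HDFS file in ORC format; reads parse it back.
INSERT INTO hdfs_orc_demo VALUES ('one', 1), ('two', 2);
SELECT * FROM hdfs_orc_demo;
```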