Parquet

Apache Parquet is a columnar storage format widely used in the Hadoop ecosystem. ClickHouse supports both reading and writing this format.

The table below shows the supported data types and how they map to ClickHouse data types in INSERT and SELECT queries.

| Parquet data type (INSERT) | ClickHouse data type | Parquet data type (SELECT) |
|---|---|---|
| BOOL | Bool | BOOL |
| UINT8, BOOL | UInt8 | UINT8 |
| INT8 | Int8/Enum8 | INT8 |
| UINT16 | UInt16 | UINT16 |
| INT16 | Int16/Enum16 | INT16 |
| UINT32 | UInt32 | UINT32 |
| INT32 | Int32 | INT32 |
| UINT64 | UInt64 | UINT64 |
| INT64 | Int64 | INT64 |
| FLOAT | Float32 | FLOAT |
| DOUBLE | Float64 | DOUBLE |
| DATE | Date32 | DATE |
| TIME (ms) | DateTime | UINT32 |
| TIMESTAMP, TIME (us, ns) | DateTime64 | TIMESTAMP |
| STRING, BINARY | String | BINARY |
| STRING, BINARY, FIXED_LENGTH_BYTE_ARRAY | FixedString | FIXED_LENGTH_BYTE_ARRAY |
| DECIMAL | Decimal | DECIMAL |
| LIST | Array | LIST |
| STRUCT | Tuple | STRUCT |
| MAP | Map | MAP |
| UINT32 | IPv4 | UINT32 |
| FIXED_LENGTH_BYTE_ARRAY, BINARY | IPv6 | FIXED_LENGTH_BYTE_ARRAY |
| FIXED_LENGTH_BYTE_ARRAY, BINARY | Int128/UInt128/Int256/UInt256 | FIXED_LENGTH_BYTE_ARRAY |

Arrays can be nested and can take a value of the Nullable type as an argument. Tuple and Map types can also be nested.

Unsupported Parquet data types: FIXED_SIZE_BINARY, JSON, UUID, ENUM.

The data types of ClickHouse table columns can differ from the corresponding fields of the inserted Parquet data. When inserting data, ClickHouse interprets the data types according to the table above and then casts the data to the data type set for the ClickHouse table column.
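
The casting behavior can be illustrated with a small round trip. The following is a minimal sketch, assuming a running ClickHouse server reachable by clickhouse-client with default settings; the table names parquet_src and parquet_dst and the function name parquet_cast_demo are hypothetical and chosen only for this illustration.

```shell
#!/bin/sh
# Sketch: round-trip an Int32 column through Parquet into an Int64 column.
# parquet_src, parquet_dst and parquet_cast_demo are hypothetical names.
parquet_cast_demo() {
    # Source column is Int32, which is written as Parquet INT32 on SELECT.
    clickhouse-client --query="CREATE TABLE IF NOT EXISTS parquet_src (id Int32) ENGINE = MergeTree ORDER BY id" || return 1
    clickhouse-client --query="INSERT INTO parquet_src VALUES (1), (2), (3)" || return 1

    # Destination column is Int64, so it differs from the Parquet field type.
    clickhouse-client --query="CREATE TABLE IF NOT EXISTS parquet_dst (id Int64) ENGINE = MergeTree ORDER BY id" || return 1

    # On INSERT, Parquet INT32 is read as Int32 and then cast to Int64.
    clickhouse-client --query="SELECT id FROM parquet_src FORMAT Parquet" |
        clickhouse-client --query="INSERT INTO parquet_dst FORMAT Parquet" || return 1

    # Show the stored values together with the resulting column type.
    clickhouse-client --query="SELECT id, toTypeName(id) FROM parquet_dst ORDER BY id"
}

# Example call (creates the two hypothetical tables in the current database):
# parquet_cast_demo
```

Running the function more than once appends the same rows again, so it is best tried in a scratch database.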

You can insert Parquet data from a file into a ClickHouse table with the following command:

$ cat {filename} | clickhouse-client --query="INSERT INTO {some_table} FORMAT Parquet"
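
For repeated use, the command above could be wrapped in a small shell function. This is only a sketch: the function name insert_parquet and the example file and table names are hypothetical, and the target table is assumed to exist already.

```shell
#!/bin/sh
# insert_parquet FILE TABLE
# Sketch of a helper that loads a Parquet file into an existing table.
# The function name and its arguments are hypothetical.
insert_parquet() {
    file=$1
    table=$2
    # Fail early instead of sending an empty stream to the server.
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    # Input redirection feeds the file on stdin, like `cat FILE | ...`.
    clickhouse-client --query="INSERT INTO ${table} FORMAT Parquet" < "$file"
}

# Example call (hypothetical file and table names):
# insert_parquet events.parquet events
```

Redirecting the file to standard input avoids the extra cat process; the data reaches clickhouse-client in the same way.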



You can select data from a ClickHouse table and save it to a file in the Parquet format with the following command:

$ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Parquet" > {some_file.pq}
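
If the query fails partway, a plain redirect can leave a truncated file behind. One possible way to guard against that is sketched below; the function name export_parquet and the `.tmp` suffix are assumptions made for this example, not part of ClickHouse.

```shell
#!/bin/sh
# export_parquet QUERY OUTPUT_FILE
# Sketch of a helper that writes a query result to a Parquet file.
# The function name and the .tmp suffix are hypothetical choices.
export_parquet() {
    query=$1
    out=$2
    tmp="${out}.tmp"
    # Write to a temporary file first so a failed query never leaves
    # a partial file under the final name.
    if clickhouse-client --query="${query} FORMAT Parquet" > "$tmp"; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example call (hypothetical table and file names):
# export_parquet "SELECT * FROM some_table" some_file.pq
```

The query is passed without a FORMAT clause because the helper appends `FORMAT Parquet` itself.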



To exchange data with Hadoop, you can use the HDFS table engine.
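
As a rough sketch of what that might look like, the function below creates a table backed by the HDFS engine that stores its data in Parquet format. The function name create_hdfs_parquet_table and the column list (name String, value UInt32) are hypothetical; the HDFS URI is supplied by the caller and depends on your cluster.

```shell
#!/bin/sh
# create_hdfs_parquet_table TABLE HDFS_URI
# Sketch: create a table whose data lives in HDFS as Parquet.
# The function name and the column list are hypothetical.
create_hdfs_parquet_table() {
    table=$1
    uri=$2
    # ENGINE = HDFS(URI, format): the second argument selects the file format.
    clickhouse-client --query="CREATE TABLE ${table} (name String, value UInt32) ENGINE = HDFS('${uri}', 'Parquet')"
}

# Example call (the table name and URI are placeholders for your own values):
# create_hdfs_parquet_table hdfs_parquet_table "$HDFS_URI"
```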