Skip to main content
Skip to main content

Parquet

Description

Apache Parquet is a columnar storage format widespread in the Hadoop ecosystem. ClickHouse supports read and write operations for this format.

Data Types Matching

The table below shows supported data types and how they match ClickHouse data types in INSERT and SELECT queries.

Parquet data type (INSERT)ClickHouse data typeParquet data type (SELECT)
BOOLBoolBOOL
UINT8, BOOLUInt8UINT8
INT8Int8/Enum8INT8
UINT16UInt16UINT16
INT16Int16/Enum16INT16
UINT32UInt32UINT32
INT32Int32INT32
UINT64UInt64UINT64
INT64Int64INT64
FLOATFloat32FLOAT
DOUBLEFloat64DOUBLE
DATEDate32DATE
TIME (ms)DateTimeUINT32
TIMESTAMP, TIME (us, ns)DateTime64TIMESTAMP
STRING, BINARYStringBINARY
STRING, BINARY, FIXED_LENGTH_BYTE_ARRAYFixedStringFIXED_LENGTH_BYTE_ARRAY
DECIMALDecimalDECIMAL
LISTArrayLIST
STRUCTTupleSTRUCT
MAPMapMAP
UINT32IPv4UINT32
FIXED_LENGTH_BYTE_ARRAY, BINARYIPv6FIXED_LENGTH_BYTE_ARRAY
FIXED_LENGTH_BYTE_ARRAY, BINARYInt128/UInt128/Int256/UInt256FIXED_LENGTH_BYTE_ARRAY

Arrays can be nested and can have a value of the Nullable type as an argument. Tuple and Map types also can be nested.

Unsupported Parquet data types: FIXED_SIZE_BINARY, JSON, UUID, ENUM.

Data types of ClickHouse table columns can differ from the corresponding fields of the Parquet data inserted. When inserting data, ClickHouse interprets data types according to the table above and then cast the data to that data type which is set for the ClickHouse table column.

Example Usage

Inserting and Selecting Data

You can insert Parquet data from a file into ClickHouse table by the following command:

$ cat {filename} | clickhouse-client --query="INSERT INTO {some_table} FORMAT Parquet"

You can select data from a ClickHouse table and save them into some file in the Parquet format by the following command:

$ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Parquet" > {some_file.pq}

To exchange data with Hadoop, you can use HDFS table engine.

Format Settings