Formats for Input and Output Data
ClickHouse supports most of the known text and binary data formats. This allows easy integration into almost any working data pipeline to leverage the benefits of ClickHouse.
Input formats
Input formats are used for:
- Parsing data provided to `INSERT` statements
- Performing `SELECT` queries from file-backed tables such as `File`, `URL`, or `HDFS`
- Reading dictionaries
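As an illustration of what an input format looks like on the wire, the JSONEachRow format expects one JSON object per line. A minimal sketch of building such a payload client-side (the column names here are hypothetical, and the transport is out of scope):

```python
import json

# Hypothetical rows destined for an INSERT in JSONEachRow format:
# each row becomes one JSON object on its own line.
rows = [
    {"id": 1, "name": "alpha"},
    {"id": 2, "name": "beta"},
]

payload = "\n".join(json.dumps(row) for row in rows) + "\n"
print(payload)
```

Such a payload could then be sent with `INSERT INTO ... FORMAT JSONEachRow` over any of the interfaces.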
Choosing the right input format is crucial for efficient data ingestion in ClickHouse. With over 70 supported formats, selecting the most performant option can significantly impact insert speed, CPU and memory usage, and overall system efficiency. To help navigate these choices, we benchmarked ingestion performance across formats, revealing key takeaways:
- The Native format is the most efficient input format, offering the best compression, lowest resource usage, and minimal server-side processing overhead.
- Compression is essential - LZ4 reduces data size with minimal CPU cost, while ZSTD offers higher compression at the expense of additional CPU usage.
- Pre-sorting has a moderate impact, as ClickHouse already sorts efficiently.
- Batching significantly improves efficiency - larger batches reduce insert overhead and improve throughput.
For a deep dive into the results and best practices, read the full benchmark analysis. For the full test results, explore the FastFormats online dashboard.
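The batching takeaway above can be sketched as a simple client-side buffer that accumulates rows and flushes them as one large insert once a threshold is reached. This is an illustrative pattern, not a ClickHouse API; the flush callback and batch size are placeholders for a real insert:

```python
class BatchBuffer:
    """Accumulate rows and flush them in large batches.

    Larger batches amortize per-insert overhead; the flush
    callback stands in for an actual INSERT (hypothetical).
    """

    def __init__(self, flush, batch_size=10_000):
        self.flush = flush
        self.batch_size = batch_size
        self.rows = []

    def add(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.batch_size:
            self.flush(self.rows)
            self.rows = []

    def close(self):
        # Flush any remaining partial batch.
        if self.rows:
            self.flush(self.rows)
            self.rows = []


batches = []
buf = BatchBuffer(batches.append, batch_size=3)
for i in range(7):
    buf.add({"id": i})
buf.close()
print([len(b) for b in batches])  # → [3, 3, 1]
```

In practice the same idea applies regardless of format: fewer, larger inserts cost less per row than many small ones.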
Output formats
Output formats are used for:
- Arranging the results of a `SELECT` query
- Performing `INSERT` operations into file-backed tables
Formats overview
The supported formats are:
You can control some format processing parameters with ClickHouse settings. For more information, read the Settings section.
- TabSeparated
- TabSeparatedRaw
- TabSeparatedWithNames
- TabSeparatedWithNamesAndTypes
- TabSeparatedRawWithNames
- TabSeparatedRawWithNamesAndTypes
- Template
- TemplateIgnoreSpaces
- TSKV
- CSV
- CSVWithNames
- CSVWithNamesAndTypes
- CustomSeparated
- CustomSeparatedWithNames
- CustomSeparatedWithNamesAndTypes
- SQLInsert
- JSON
- JSONStrings
- JSONColumns
- JSONColumnsWithMetadata
- JSONAsString
- JSONAsObject
- JSONCompact
- JSONCompactStrings
- JSONCompactColumns
- JSONEachRow
- PrettyJSONEachRow
- JSONStringsEachRow
- JSONCompactEachRow
- JSONCompactStringsEachRow
- JSONEachRowWithProgress
- JSONStringsEachRowWithProgress
- JSONCompactEachRowWithNames
- JSONCompactEachRowWithNamesAndTypes
- JSONCompactEachRowWithProgress (similar to JSONEachRowWithProgress, but outputs row events in a compact form, as in the JSONCompactEachRow format)
- JSONCompactStringsEachRowWithNames
- JSONCompactStringsEachRowWithNamesAndTypes
- JSONObjectEachRow
- BSONEachRow
- Native
- Null
- Pretty
- PrettyNoEscapes
- PrettyMonoBlock
- PrettyNoEscapesMonoBlock
- PrettyCompact
- PrettyCompactNoEscapes
- PrettyCompactMonoBlock
- PrettyCompactNoEscapesMonoBlock
- PrettySpace
- PrettySpaceNoEscapes
- PrettySpaceMonoBlock
- PrettySpaceNoEscapesMonoBlock
- RowBinary
- RowBinaryWithNames
- RowBinaryWithNamesAndTypes
- RowBinaryWithDefaults
- Values
- Vertical
- XML
- CapnProto
- Prometheus
- Protobuf
- ProtobufSingle
- ProtobufList
- Avro
- AvroConfluent
- Parquet
- ParquetMetadata
- Arrow
- ArrowStream
- ORC
- One
- Npy
- LineAsString
- Regexp
- RawBLOB
- Markdown
- MsgPack
- MySQLDump
- DWARF
- Form
Format Schema
The file name containing the format schema is set by the `format_schema` setting.
This setting is required when using the Cap'n Proto or Protobuf formats.
The format schema is a combination of a file name and the name of a message type in this file, delimited by a colon, e.g. `schemafile.proto:MessageType`.
If the file has the standard extension for the format (for example, `.proto` for Protobuf), it can be omitted, in which case the format schema looks like `schemafile:MessageType`.

If you input or output data via the client in interactive mode, the file name specified in the format schema can contain an absolute path or a path relative to the current directory on the client. If you use the client in batch mode, the path to the schema must be relative, for security reasons.

If you input or output data via the HTTP interface, the file specified in the format schema should be located in the directory specified by format_schema_path in the server configuration.
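The `file:MessageType` convention described above can be illustrated with a small parser. This is a sketch of the naming rule only, not ClickHouse's actual implementation:

```python
import os

def parse_format_schema(schema: str, default_ext: str = ".proto"):
    """Split a format schema string into (file name, message type),
    adding the format's standard extension when it was omitted."""
    file_name, _, message_type = schema.partition(":")
    if not os.path.splitext(file_name)[1]:
        # No extension given: assume the format's standard one.
        file_name += default_ext
    return file_name, message_type

print(parse_format_schema("schemafile.proto:MessageType"))  # → ('schemafile.proto', 'MessageType')
print(parse_format_schema("schemafile:MessageType"))        # → ('schemafile.proto', 'MessageType')
```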
Skipping Errors
Some formats, such as CSV, TabSeparated, TSKV, JSONEachRow, Template, CustomSeparated, and Protobuf, can skip a broken row if a parsing error occurs and continue parsing from the beginning of the next row. See the input_format_allow_errors_num and input_format_allow_errors_ratio settings.

Limitations:
- In case of a parsing error, JSONEachRow skips all data until the new line (or EOF), so rows must be delimited by `\n` to count errors correctly.
- Template and CustomSeparated use the delimiter after the last column and the delimiter between rows to find the beginning of the next row, so skipping errors works only if at least one of them is not empty.