# Integrating Kafka with ClickHouse
Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. ClickHouse provides multiple options to read from and write to Kafka and other Kafka API-compatible brokers (e.g., Redpanda, Amazon MSK).
## Available options

Choosing the right option for your use case depends on multiple factors, including your ClickHouse deployment type, data flow direction, and operational requirements.
| Option | Deployment type | Fully managed | Kafka to ClickHouse | ClickHouse to Kafka |
|---|---|---|---|---|
| ClickPipes for Kafka | Cloud, BYOC (coming soon!) | ✅ | ✅ | |
| Kafka Connect Sink | Cloud, BYOC, Self-hosted | | ✅ | |
| Kafka table engine | Cloud, BYOC, Self-hosted | | ✅ | ✅ |
For a more detailed comparison between these options, see Choosing an option.
## ClickPipes for Kafka
ClickPipes is a managed integration platform that makes ingesting data from a diverse set of sources as simple as clicking a few buttons. Because it is fully managed and purpose-built for production workloads, ClickPipes significantly lowers infrastructure and operational costs, removing the need for external data streaming and ETL tools.
This is the recommended option if you're a ClickHouse Cloud user. ClickPipes is fully managed and purpose-built to deliver the best performance in Cloud environments.
### Main features
- Optimized for ClickHouse Cloud, delivering blazing-fast performance
- Horizontal and vertical scalability for high-throughput workloads
- Built-in fault tolerance with configurable replicas and automatic retries
- Deployment and management via ClickHouse Cloud UI, Open API, or Terraform
- Enterprise-grade security with support for cloud-native authorization (IAM) and private connectivity (PrivateLink)
- Supports a wide range of data sources, including Confluent Cloud, Amazon MSK, Redpanda Cloud, and Azure Event Hubs
- Supports most common serialization formats (JSON, Avro, Protobuf coming soon!)
### Getting started

To get started using ClickPipes for Kafka, see the reference documentation or navigate to the Data Sources tab in the ClickHouse Cloud UI.
## Kafka Connect Sink
Kafka Connect is an open-source framework that works as a centralized data hub for simple data integration between Kafka and other data systems. The ClickHouse Kafka Connect Sink connector provides a scalable and highly-configurable option to read data from Apache Kafka and other Kafka API-compatible brokers.
This is the recommended option if you prefer high configurability or are already a Kafka Connect user.
### Main features
- Can be configured to support exactly-once semantics
- Supports most common serialization formats (JSON, Avro, Protobuf)
- Tested continuously against ClickHouse Cloud
### Getting started
To get started using the ClickHouse Kafka Connect Sink, see the reference documentation.
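As a rough sketch of what a deployment involves, the connector is registered with a Kafka Connect cluster as a JSON configuration (typically submitted via the Connect REST API). The property names below reflect common ClickHouse Kafka Connect Sink settings, but the topic, host, and credential values are placeholders, and exact option names can vary by connector version; verify against the reference documentation.

```json
{
  "name": "clickhouse-sink",
  "config": {
    "connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector",
    "tasks.max": "1",
    "topics": "my_topic",
    "hostname": "my-clickhouse-host",
    "port": "8443",
    "database": "default",
    "username": "default",
    "password": "<password>",
    "ssl": "true",
    "exactlyOnce": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false"
  }
}
```

Setting `exactlyOnce` to `true` enables the connector's exactly-once delivery mode, at the cost of additional state tracking.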
## Kafka table engine
The Kafka table engine can be used to read data from and write data to Apache Kafka and other Kafka API-compatible brokers. This option is bundled with open-source ClickHouse and is available across all deployment types.
This is the recommended option if you're self-hosting ClickHouse and need a low entry barrier option, or if you need to write data to Kafka.
### Main features
- Can be used for reading and writing data
- Bundled with open-source ClickHouse
- Supports most common serialization formats (JSON, Avro, Protobuf)
### Getting started
To get started using the Kafka table engine, see the reference documentation.
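The typical pattern is to pair a Kafka engine table (the consumer) with a materialized view that persists incoming rows into a MergeTree table. A minimal sketch, assuming a broker at `localhost:9092` and a JSON-encoded topic named `my_topic`:

```sql
-- Kafka engine table: consumes messages from the topic
CREATE TABLE kafka_queue
(
    id UInt64,
    message String
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'localhost:9092',
    kafka_topic_list = 'my_topic',
    kafka_group_name = 'clickhouse_consumer',
    kafka_format = 'JSONEachRow';

-- Destination table for durable storage
CREATE TABLE kafka_data
(
    id UInt64,
    message String
)
ENGINE = MergeTree
ORDER BY id;

-- Materialized view that streams rows from the consumer into storage
CREATE MATERIALIZED VIEW kafka_mv TO kafka_data AS
SELECT id, message
FROM kafka_queue;
```

Querying the Kafka engine table directly consumes messages, so reads should normally go through the destination table (`kafka_data` here); the materialized view keeps it populated continuously.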
## Choosing an option
| Product | Strengths | Weaknesses |
|---|---|---|
| ClickPipes for Kafka | • Scalable architecture for high throughput and low latency<br/>• Built-in monitoring and schema management<br/>• Private networking connections (via PrivateLink)<br/>• Supports SSL/TLS authentication and IAM authorization<br/>• Supports programmatic configuration (Terraform, API endpoints) | • Does not support pushing data to Kafka<br/>• At-least-once semantics |
| Kafka Connect Sink | • Exactly-once semantics<br/>• Allows granular control over data transformation, batching, and error handling<br/>• Can be deployed in private networks<br/>• Allows real-time replication from databases not yet supported in ClickPipes via Debezium | • Does not support pushing data to Kafka<br/>• Operationally complex to set up and maintain<br/>• Requires Kafka and Kafka Connect expertise |
| Kafka table engine | • Supports pushing data to Kafka<br/>• Operationally simple to set up | • At-least-once semantics<br/>• Limited horizontal scaling for consumers; cannot be scaled independently from the ClickHouse server<br/>• Limited error handling and debugging options<br/>• Requires Kafka expertise |
## Other options

- Confluent Cloud - Confluent Platform provides an option to upload and run the ClickHouse Connector Sink on Confluent Cloud, or to use the HTTP Sink Connector for Confluent Platform, which integrates Apache Kafka with an API via HTTP or HTTPS.
- Vector - Vector is a vendor-agnostic data pipeline. With the ability to read from Kafka and send events to ClickHouse, it represents a robust integration option.
- JDBC Connect Sink - The Kafka Connect JDBC Sink connector allows you to export data from Kafka topics to any relational database with a JDBC driver.
- Custom code - Custom code using Kafka and ClickHouse client libraries may be appropriate in cases where custom processing of events is required.