Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. In most cases involving Kafka and ClickHouse, users will want to insert Kafka-based data into ClickHouse, although the reverse is also supported. Below we outline several options for both use cases, identifying the pros and cons of each approach.
For those who do not have a Kafka instance to hand, we recommend Confluent Cloud, which offers a free tier adequate for testing these examples. For self-managed alternatives, consider Confluent for Kubernetes, or Confluent's installation documentation for non-Kubernetes environments.
This guide assumes the following:

- You are familiar with Kafka fundamentals, such as producers, consumers and topics.
- You have a topic prepared for these examples. We assume all data is stored in Kafka as JSON, although the principles remain the same if using Avro.
- We utilise the excellent kcat (formerly kafkacat) in our examples to publish and consume Kafka data.
- Whilst we reference some Python scripts for loading sample data, feel free to adapt the examples to your dataset.
- You are broadly familiar with ClickHouse materialized views.
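
If you do not yet have JSON data in your topic, the sketch below shows one way to generate a few newline-delimited JSON messages for testing. It is illustrative only: the field names (`id`, `type`, `created_at`) and the event types are hypothetical and do not correspond to the sample dataset or scripts referenced in this guide.

```python
import json
import random
from datetime import datetime, timezone


def make_event(event_id: int) -> dict:
    """Build one hypothetical event; the field names are illustrative only."""
    return {
        "id": event_id,
        "type": random.choice(["push", "pull_request", "issue"]),
        "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S"),
    }


def to_ndjson(events) -> str:
    """Serialise events as newline-delimited JSON, one message per line."""
    return "\n".join(json.dumps(event) for event in events)


if __name__ == "__main__":
    # Print five messages to stdout so they can be piped to a producer.
    print(to_ndjson(make_event(i) for i in range(5)))
```

Assuming the script is saved as `sample_events.py` (a hypothetical name), its output can be piped to kcat in producer mode, for example `python sample_events.py | kcat -b localhost:9092 -t my_topic -P`, and read back with `kcat -b localhost:9092 -t my_topic -C -o beginning -e`. The broker address and topic name here are placeholders; a hosted cluster such as Confluent Cloud will typically also require authentication settings passed with `-X`.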