Real-time Salesforce analytics with ClickHouse and Estuary Flow

Estuary
Sep 25, 2024

estuary_clickhouse.png

The ability to process and analyze data in real time has become a critical requirement for businesses across industries. ClickHouse, a high-performance columnar database management system, combined with Estuary Flow, a data integration platform, forms a real-time analytics platform that can ingest and transform data from hundreds of sources.

This article explains in detail how the two are integrated through Estuary Flow's Dekaf. It also includes a hands-on example that shows how to build a real-time analytics pipeline over Salesforce data.

The Power of ClickHouse

ClickHouse has gained popularity for its exceptional speed in processing analytical queries on large datasets. Its columnar storage format and parallelized query execution make it ideal for real-time analytics workloads. Recently, ClickHouse has enhanced its capabilities with ClickPipes, an integration engine that simplifies data ingestion. ClickPipes is designed principally for loading large datasets from infrastructure software such as Kafka, object storage, and Postgres. Because its focus is high throughput, ClickPipes leaves the integration of more nuanced business sources that have their own APIs to either custom code or data integration platforms such as Estuary.

Estuary Flow: Bridging the Gap

While ClickHouse and ClickPipes provide the analytical horsepower and support for high-throughput ingestion, Estuary Flow addresses a critical challenge in the data pipeline: seamless integration of hundreds of real-time data sources. Estuary Flow supports a wide array of sources, including CDC for databases such as MongoDB, MySQL, and Oracle, as well as SaaS platforms such as Netsuite and Salesforce, which are not a focus for ClickPipes.

Historically, integrating these diverse data sources into ClickHouse required custom code, or more involved configuration and additional services such as Debezium. This often created bottlenecks, added infrastructure to maintain, and increased the potential for errors. Estuary Flow's introduction of Dekaf, a Kafka API compatibility layer, has dramatically simplified this process.

What is Dekaf?

Dekaf is Estuary Flow's innovative solution that allows consumers to read data from Estuary Flow collections as if they were Kafka topics. It also provides a schema registry API for managing schemas, enabling seamless integration with existing Kafka-based tools and workflows.

Here’s how it works:

  1. Connect to Your Data Sources with Estuary Flow: Utilize Estuary’s connectors to capture events in real time from your sources such as Salesforce.
  2. Leverage Dekaf as your Kafka-Compatible Endpoint: Dekaf acts as a built-in Kafka endpoint, allowing ClickHouse to read data directly from Estuary Flow collections.
  3. Ingest Data Directly into ClickHouse: ClickHouse connects to Estuary Flow using ClickPipes, ensuring smooth and real-time data ingestion.

This integration enables you to build robust, real-time analytics applications with a streamlined architecture. By combining Estuary Flow and ClickHouse, you can quickly scale your analytics capabilities to meet the demands of modern, data-driven applications.

Why is this good for ClickHouse users?

The integration of Dekaf with ClickHouse offers several significant advantages:

  1. Simplified Data Ingestion: With Dekaf, ClickHouse users can now directly ingest data from hundreds of real-time sources supported by Estuary Flow without the need for intermediate systems.
  2. Delivery Guarantees: Estuary Flow ensures exactly-once delivery semantics, maintaining the integrity of your real-time data streams.
  3. Scalability: The integration leverages the scalable architecture of both Estuary Flow and ClickHouse, ensuring high throughput and low latency even for demanding workloads.
  4. Reduced Complexity: By eliminating the need for additional services like Kafka and Debezium, the overall system becomes more streamlined and easier to manage.
  5. Broader Source Support: ClickHouse users now have access to Estuary Flow's extensive library of connectors, significantly expanding the range of data sources they can work with.

Implementing Real-time Salesforce Analytics with ClickHouse and Estuary Flow

Let’s take a look at the steps needed to set up this powerful integration.

Configure the Real-time Salesforce Connector

  1. Head over to your Estuary Flow dashboard and search for the Real-time Salesforce capture connector.

estuary_step_1.png

  2. Provide your Salesforce credentials and necessary permissions.

estuary_step_2.png

  3. Choose the Salesforce objects you want to stream (e.g., Accounts, Contacts, Opportunities).

estuary_step_3.png

  4. Use Dekaf to expose your Estuary Flow collections as Kafka-compatible topics.

Good news: there’s nothing to configure in this step. Dekaf automatically exposes the collections that are now being hydrated from Salesforce through a Kafka-compatible API. The connection details are the following:

  • Broker Address: dekaf.estuary.dev
  • Schema Registry Address: https://dekaf.estuary.dev
  • Security Protocol: SASL_SSL
  • SASL Mechanism: PLAIN
  • SASL Username: {}
  • SASL Password: Estuary Refresh Token (Generate a refresh token in the dashboard)
  • Schema Registry Username: {}
  • Schema Registry Password: The same Estuary Refresh Token as above
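ClickPipes takes these values through its setup form, but it can help to see how they map onto an ordinary Kafka client configuration. The following is a minimal sketch, not an official Estuary or ClickHouse example: it arranges the values listed above into librdkafka-style settings. The function names, the consumer group name, and the `peek` helper are hypothetical; `peek` assumes the third-party `confluent-kafka` package is installed. The broker address is used exactly as given above, so check Estuary's documentation in case a port needs to be appended.

```python
# Values copied from the connection details above.
DEKAF_BROKER = "dekaf.estuary.dev"
DEKAF_SCHEMA_REGISTRY = "https://dekaf.estuary.dev"
DEKAF_USERNAME = "{}"  # the literal value listed above


def kafka_client_config(refresh_token: str, group_id: str = "example-group") -> dict:
    """Return librdkafka-style consumer settings for reading from Dekaf."""
    return {
        "bootstrap.servers": DEKAF_BROKER,
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": DEKAF_USERNAME,
        "sasl.password": refresh_token,  # the Estuary refresh token
        "group.id": group_id,            # hypothetical consumer group name
        "auto.offset.reset": "earliest",
    }


def schema_registry_config(refresh_token: str) -> dict:
    """Return settings in the shape expected by a Confluent-style registry client."""
    return {
        "url": DEKAF_SCHEMA_REGISTRY,
        "basic.auth.user.info": f"{DEKAF_USERNAME}:{refresh_token}",
    }


def peek(topic: str, refresh_token: str, max_messages: int = 5, max_polls: int = 30) -> None:
    """Print the size of a few raw messages (requires `pip install confluent-kafka`)."""
    from confluent_kafka import Consumer  # third-party, imported lazily

    consumer = Consumer(kafka_client_config(refresh_token))
    consumer.subscribe([topic])
    seen = 0
    try:
        for _ in range(max_polls):
            if seen >= max_messages:
                break
            msg = consumer.poll(1.0)  # wait up to one second for a message
            if msg is None:
                continue
            if msg.error():
                print("consumer error:", msg.error())
                continue
            seen += 1
            print(msg.topic(), msg.offset(), len(msg.value() or b""), "bytes")
    finally:
        consumer.close()
```

Calling `kafka_client_config("<your refresh token>")` returns a dictionary that can be passed to a librdkafka-based consumer. The topic name passed to `peek` would be whichever Estuary Flow collection you want to read.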

Set up ClickPipes in ClickHouse

To configure ClickHouse to ingest data from these collections using ClickPipes:

  1. Select Apache Kafka as a Data Source

estuary_step_5.png

  2. Configure access to Estuary Flow using the connection details from the previous step.

estuary_step_6.png

  3. Make sure the incoming data is parsed properly.

estaury_step_7.png

  4. Kick off the integration and wait until the ClickPipe is provisioned. This should only take a few seconds.

estuary_step_8.png

  5. Start analyzing your real-time data in ClickHouse! Let’s write a quick query to calculate the value of all closed deals for 2024.

estuary_step_9.png
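The query itself lives in the screenshot, so here is a rough sketch of what such a query might look like, wrapped in Python so it can be sent from a script. Everything specific here is an assumption: the table name `salesforce_opportunity` is hypothetical (ClickPipes lets you choose the destination table), "closed deals" is interpreted as opportunities in the `Closed Won` stage, and the columns are assumed to keep the standard Salesforce Opportunity field names `Amount`, `StageName`, and `CloseDate`. A plain-Python version of the same aggregation is included so the logic can be checked on sample rows without a database.

```python
from datetime import date

TABLE = "salesforce_opportunity"  # hypothetical destination table name


def closed_deals_query(year: int, table: str = TABLE) -> str:
    """Build a ClickHouse query summing Amount for Closed Won deals in `year`."""
    return (
        "SELECT sum(Amount) AS closed_value\n"
        f"FROM {table}\n"
        "WHERE StageName = 'Closed Won'\n"
        f"  AND toYear(toDate(CloseDate)) = {int(year)}"
    )


def closed_deals_value(rows: list, year: int) -> int:
    """The same aggregation in plain Python, for checking against sample data."""
    return sum(
        row["Amount"]
        for row in rows
        if row["StageName"] == "Closed Won" and row["CloseDate"].year == year
    )


# Made-up sample rows, for illustration only.
SAMPLE_ROWS = [
    {"Amount": 1200, "StageName": "Closed Won", "CloseDate": date(2024, 3, 14)},
    {"Amount": 800, "StageName": "Closed Won", "CloseDate": date(2024, 9, 2)},
    {"Amount": 500, "StageName": "Closed Lost", "CloseDate": date(2024, 5, 20)},
    {"Amount": 950, "StageName": "Closed Won", "CloseDate": date(2023, 12, 30)},
]

if __name__ == "__main__":
    print(closed_deals_query(2024))
    print(closed_deals_value(SAMPLE_ROWS, 2024))
```

If `CloseDate` arrives in ClickHouse as something other than a date or an ISO-formatted string, the `toDate` conversion in the query would need to be adjusted to match the actual column type.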

Conclusion

The integration of ClickHouse and Estuary Flow, enabled by Dekaf, marks a significant leap forward in real-time analytics capabilities. By combining ClickHouse's analytical prowess with Estuary Flow's extensive source support and Dekaf's seamless integration, organizations can now implement robust, scalable, and efficient real-time analytics solutions with unprecedented ease.

As data continues to grow in volume and velocity, this integration provides a powerful toolkit for businesses to stay ahead in the data-driven world, turning real-time insights into actionable strategies.
