
Integrating Apache Spark with ClickHouse


Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.

There are two main ways to connect Apache Spark and ClickHouse:

  1. Spark Connector - The Spark connector implements Spark's DataSourceV2 API and provides its own catalog management. It is currently the recommended way to integrate ClickHouse and Spark.
  2. Spark JDBC - Integrate Spark and ClickHouse using Spark's JDBC data source.


Both solutions have been tested and work with several APIs, including Java, Scala, PySpark, and Spark SQL.
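As a rough illustration of the JDBC route, the sketch below assembles the options that Spark's built-in `jdbc` data source expects and passes them to a reader. It assumes the ClickHouse JDBC driver is on the Spark classpath and that the server is reachable over HTTP on port 8123. The host, database, table name, and the helper names `clickhouse_jdbc_options` and `read_table` are hypothetical and chosen only for this example.

```python
from typing import Dict


def clickhouse_jdbc_options(
    host: str,
    port: int,
    database: str,
    table: str,
    user: str = "default",
    password: str = "",
) -> Dict[str, str]:
    """Build the option map for Spark's built-in JDBC data source.

    Hypothetical helper: the connection values are placeholders.
    """
    return {
        "url": f"jdbc:clickhouse://{host}:{port}/{database}",
        # Driver class shipped with the ClickHouse JDBC driver.
        "driver": "com.clickhouse.jdbc.ClickHouseDriver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def read_table(spark, options: Dict[str, str]):
    """Load a ClickHouse table as a DataFrame.

    `spark` is an existing SparkSession; it is passed in so that this
    module can be imported without PySpark installed.
    """
    return spark.read.format("jdbc").options(**options).load()


# Placeholder connection details (assumptions, not values from this page).
opts = clickhouse_jdbc_options("localhost", 8123, "default", "example_table")
print(opts["url"])  # jdbc:clickhouse://localhost:8123/default
```

With a running `SparkSession` named `spark`, calling `read_table(spark, opts)` would return a DataFrame backed by the ClickHouse table. The Spark connector route is configured differently, through Spark catalog settings, and is not shown here.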
