Skip to main content

Connect Apache NiFi to ClickHouse

Apache NiFi is an open-source workflow management software designed to automate data flow between software systems. It allows the creation of ETL data pipelines and is shipped with more than 300 data processors. This step-by-step tutorial shows how to connect Apache NiFi to ClickHouse as both a source and destination, and to load a sample dataset.

1. Gather your connection details

To connect to ClickHouse with HTTP(S) you need this information:

  • The HOST and PORT: typically, the port is 8443 when using TLS or 8123 when not using TLS.

  • The DATABASE NAME: out of the box, there is a database named default, use the name of the database that you want to connect to.

  • The USERNAME and PASSWORD: out of the box, the username is default. Use the username appropriate for your use case.

The details for your ClickHouse Cloud service are available in the ClickHouse Cloud console. Select the service that you will connect to and click Connect:

ClickHouse Cloud service connect button

Choose HTTPS, and the details are available in an example curl command.

ClickHouse Cloud HTTPS connection details

If you are using self-managed ClickHouse, the connection details are set by your ClickHouse administrator.

2. Download and run Apache NiFi

  1. For a new setup, download the binary from https://nifi.apache.org/download.html and start by running ./bin/nifi.sh start

3. Download the ClickHouse JDBC driver

  1. Visit the ClickHouse JDBC driver release page on GitHub and look for the latest JDBC release version
  2. In the release version, click on "Show all xx assets" and look for the JAR file containing the keyword "shaded" or "all", for example, clickhouse-jdbc-0.5.0-all.jar
  3. Place the JAR file in a folder accessible by Apache NiFi and take note of the absolute path

4. Add DBCPConnectionPool Controller Service and configure its properties

  1. To configure a Controller Service in Apache NiFi, visit the NiFi Flow Configuration page by clicking on the "gear" button

    Nifi Flow Configuration
  2. Select the Controller Services tab and add a new Controller Service by clicking on the + button at the top right

    Add Controller Service
  3. Search for DBCPConnectionPool and click on the "Add" button

    Search for DBCPConnectionPool
  4. The newly added DBCPConnectionPool will be in an Invalid state by default. Click on the "gear" button to start configuring

    Nifi Flow Configuration
  5. Under the "Properties" section, input the following values

    PropertyValueRemark
    Database Connection URLjdbc:ch:https://HOSTNAME:8443/default?ssl=trueReplace HOSTNAME in the connection URL accordingly
    Database Driver Class Namecom.clickhouse.jdbc.ClickHouseDriver
    Database Driver Location(s)/etc/nifi/nifi-X.XX.X/lib/clickhouse-jdbc-0.X.X-patchXX-shaded.jarAbsolute path to the ClickHouse JDBC driver JAR file
    Database UserdefaultClickHouse username
    PasswordpasswordClickHouse password
  6. In the Settings section, change the name of the Controller Service to "ClickHouse JDBC" for easy reference

    Nifi Flow Configuration
  7. Activate the DBCPConnectionPool Controller Service by clicking on the "lightning” button and then the "Enable" button

    Nifi Flow Configuration
    Nifi Flow Configuration
  8. Check the Controller Services tab and ensure that the Controller Service is enabled

    Nifi Flow Configuration

5. Read from a table using the ExecuteSQL processor

  1. Add an ​​ExecuteSQL processor, along with the appropriate upstream and downstream processors

    ​​ExecuteSQL processor
  2. Under the "Properties” section of the ​​ExecuteSQL processor, input the following values

    PropertyValueRemark
    Database Connection Pooling ServiceClickHouse JDBCSelect the Controller Service configured for ClickHouse
    SQL select querySELECT * FROM system.metricsInput your query here
  3. Start the ​​ExecuteSQL processor

    ​​ExecuteSQL processor
  4. To confirm that the query has been processed successfully, inspect one of the FlowFile in the output queue

    ​​ExecuteSQL processor
  5. Switch view to "formatted” to view the result of the output FlowFile

    ​​ExecuteSQL processor

6. Write to a table using MergeRecord and PutDatabaseRecord processor

  1. To write multiple rows in a single insert, we first need to merge multiple records into a single record. This can be done using the MergeRecord processor

  2. Under the "Properties” section of the MergeRecord processor, input the following values

    PropertyValueRemark
    Record ReaderJSONTreeReaderSelect the appropriate record reader
    Record WriterJSONReadSetWriterSelect the appropriate record writer
    Minimum Number of Records1000Change this to a higher number so that the minimum number of rows are merged to form a single record. Default to 1 row
    Maximum Number of Records10000Change this to a higher number than “Minimum Number of Records”. Default to 1,000 rows
  3. To confirm that multiple records are merged into one, examine the input and output of the MergeRecord processor. Note that the output is an array of multiple input records

    Input

    ​​ExecuteSQL processor

    Ouput

    ​​ExecuteSQL processor
  4. Under the "Properties" section of the PutDatabaseRecord processor, input the following values

    PropertyValueRemark
    Record ReaderJSONTreeReaderSelect the appropriate record reader
    Database TypeGenericLeave as default
    Statement TypeINSERT
    Database Connection Pooling ServiceClickHouse JDBCSelect the ClickHouse controller service
    Table NametblInput your table name here
    Translate Field NamesfalseSet to "false" so that field names inserted must match the column name
    Maximum Batch Size1000Maximum number of rows per insert. This value should not be lower than the value of “Minimum Number of Records” in MergeRecord processor
  5. To confirm that each insert contains multiple rows, check that the row count in the table is incrementing by at least the value of "Minimum Number of Records” defined in MergeRecord.

    ​​ExecuteSQL processor
  6. Congratulations - you have successfully loaded your data into ClickHouse using Apache Nifi!