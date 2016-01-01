Integrating Amazon MSK with ClickHouse

We assume:

You are familiar with the ClickHouse Connector Sink, Amazon MSK, and MSK Connectors. We recommend the Amazon MSK Getting Started guide and the MSK Connect guide.

The MSK broker is publicly accessible. See the Public Access section of the Developer Guide.

To connect to ClickHouse with HTTP(S) you need this information:

The HOST and PORT: typically, the port is 8443 when using TLS or 8123 when not using TLS.

The DATABASE NAME: out of the box, there is a database named default. Use the name of the database that you want to connect to.

The USERNAME and PASSWORD: out of the box, the username is default. Use the username appropriate for your use case.

The details for your ClickHouse Cloud service are available in the ClickHouse Cloud console. Select the service that you will connect to and click Connect:

Choose HTTPS, and the details are available in an example curl command.

If you are using self-managed ClickHouse, the connection details are set by your ClickHouse administrator.
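As a rough illustration of how these connection details fit together, the following Python sketch builds an HTTP(S) request that runs SELECT 1 against the ClickHouse HTTP interface. The host name and password are placeholders, and the helper names build_clickhouse_request and check_connection are hypothetical, introduced only for this example. It assumes credentials are passed in the X-ClickHouse-User and X-ClickHouse-Key headers.

```python
import urllib.parse
import urllib.request


def build_clickhouse_request(host, port, database="default",
                             username="default", password="",
                             use_tls=True, query="SELECT 1"):
    """Build (but do not send) an HTTP(S) request for a ClickHouse query.

    Hypothetical helper for illustration; all argument values are placeholders.
    """
    scheme = "https" if use_tls else "http"
    # The database and the query are passed as URL parameters.
    params = urllib.parse.urlencode({"database": database, "query": query})
    url = f"{scheme}://{host}:{port}/?{params}"
    # Credentials are sent as headers rather than embedded in the URL.
    headers = {
        "X-ClickHouse-User": username,
        "X-ClickHouse-Key": password,
    }
    return urllib.request.Request(url, headers=headers)


def check_connection(request, timeout=10):
    """Send the request and return the response body as stripped text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode().strip()
```

For a TLS endpoint you would typically call build_clickhouse_request("your-host", 8443, password="your-password") and pass the result to check_connection. For an endpoint without TLS, use port 8123 and use_tls=False.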

One way to increase performance is to adjust the batch size and the number of records fetched from Kafka by setting consumer.max.poll.records and consumer.max.partition.fetch.bytes in the worker configuration.

The specific values you use will vary with the desired number of records and the record size. For example, the Kafka consumer defaults are 500 for max.poll.records and 1048576 (1 MiB) for max.partition.fetch.bytes.
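The sizing arithmetic can be sketched in Python as follows. This is a minimal illustration, not the connector's own tooling. It assumes the relevant worker settings are the Kafka consumer options max.poll.records and max.partition.fetch.bytes, written with the consumer. prefix that Kafka Connect uses for consumer overrides. The function names and the sample numbers are hypothetical.

```python
# Kafka consumer defaults, shown for comparison.
DEFAULT_MAX_POLL_RECORDS = 500
DEFAULT_MAX_PARTITION_FETCH_BYTES = 1_048_576  # 1 MiB


def worker_consumer_overrides(num_records, record_size_bytes):
    """Return worker settings sized for a desired batch.

    The fetch size is estimated as records per batch times record size.
    """
    fetch_bytes = num_records * record_size_bytes
    return {
        "consumer.max.poll.records": num_records,
        "consumer.max.partition.fetch.bytes": fetch_bytes,
    }


def render_properties(settings):
    """Render the settings as key=value lines for a worker configuration."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


# Example: batches of 5000 records of roughly 1 KiB each.
print(render_properties(worker_consumer_overrides(5000, 1024)))
# consumer.max.poll.records=5000
# consumer.max.partition.fetch.bytes=5120000
```

Treat the result as a starting point only. Larger batches mean fewer, bigger inserts into ClickHouse, at the cost of more memory per worker task.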

You can find more details (both implementation and other considerations) in the official Kafka and Amazon MSK documentation.

For MSK Connect to connect to ClickHouse, we recommend placing your MSK cluster in a private subnet with a Private NAT connected for internet access. Instructions on how to set this up are provided below. Public subnets are supported but not recommended, because you would need to keep assigning an Elastic IP address to your ENI. The AWS documentation provides more details.