Java Client… the SEQUEL?!

A fast, robust, and feature-complete database like ClickHouse requires an ecosystem of language clients that can align on the same properties to maximize its potential. As one of our most popular integrations, the ClickHouse Java client plays a crucial role by offering a seamless and efficient way to interact with the ClickHouse database from Java applications. Moreover, the client enhances developer productivity by providing a familiar and intuitive interface, reducing the complexity of database operations and fostering a more efficient development process.

For almost a decade now, the adoption of this package has been a testament to the quality of the contributions from community members and ClickHouse team members, whose collaborative efforts have driven innovation and organically expanded its feature set. However, as with any sizable software package, there comes a time to address key challenges in order to improve the overall developer experience and prepare for future extensions.

In this blog post, we’ll describe the refactoring of the ClickHouse Java client and what motivated it. You can already check out the early alpha release in our repository.

Issues We Saw

We knew there were challenges with the overall developer experience in V1: we heard you loud and clear, and we even experienced them ourselves!

Excessive Complexity for Simple Operations

Using the V1 client for simple operations requires excessive boilerplate and unintuitive code, as illustrated in this example. Numerous abstractions have been added over the years to enhance flexibility, but not every user needs or wants to understand them just to execute a simple query. The new API aims to offer intuitive, self-explanatory interfaces with reasonable defaults, so that users can start their projects without having to dig into the client source code.

Complexity in Data Insertion and Retrieval

Inserting or retrieving data with the Java client demands a thorough understanding of ClickHouse data formats. The Java client supports the JSON and RowBinary data formats for insertion. For applications with stringent latency requirements, a more compact, lower-overhead format such as RowBinary is preferred.
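To make "compact" concrete, here is a minimal sketch that hand-encodes one row with the columns of the sample event used later in this post (Float64, DateTime, String) in RowBinary, and builds the equivalent JSON text for comparison. It relies on our reading of the documented RowBinary layout: numbers are fixed-width little-endian, DateTime is a UInt32 count of seconds since the Unix epoch, and String is a LEB128 varint length followed by the bytes. The class name `RowBinaryRowSketch` is hypothetical, and treating the `LocalDateTime` as UTC is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical helper: hand-encodes one (Float64, DateTime, String) row in RowBinary.
final class RowBinaryRowSketch {

    // Fixed-width little-endian integer, as RowBinary uses for numeric types.
    static void writeLE(ByteArrayOutputStream out, long value, int bytes) {
        for (int i = 0; i < bytes; i++) {
            out.write((int) (value >>> (8 * i)) & 0xFF);
        }
    }

    // Unsigned LEB128 varint, used by RowBinary as the length prefix of a String.
    static void writeVarInt(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Float64 (8 bytes) + DateTime as UInt32 epoch seconds (4 bytes) + String (varint + UTF-8).
    // Assumption: viewTimeUtc is interpreted as UTC.
    static byte[] encode(double postId, LocalDateTime viewTimeUtc, String clientId) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLE(out, Double.doubleToLongBits(postId), 8);
        writeLE(out, viewTimeUtc.toEpochSecond(ZoneOffset.UTC), 4);
        byte[] utf8 = clientId.getBytes(StandardCharsets.UTF_8);
        writeVarInt(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // The same row as a JSON object, for a rough size comparison (no escaping needed here).
    static String asJson(double postId, LocalDateTime viewTimeUtc, String clientId) {
        return "{\"postId\":" + postId + ",\"viewTime\":\"" + viewTimeUtc
                + "\",\"clientId\":\"" + clientId + "\"}";
    }
}
```

For a row such as `(42.0, 2024-05-01T12:00, "abc")`, the RowBinary form is 16 bytes (8 + 4 + 1 + 3). The JSON text is several times longer, because it repeats every column name and spells out every value as text.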

Unfortunately, the current implementation requires users to handle RowBinary serialization and deserialization within their own applications. Although the client provides a set of helpers, writing RowBinary and RowBinaryWithDefaults data is complex and error-prone, particularly for intricate types such as nested arrays and maps. The new version of the client addresses this by providing serializers generated by parsing the data object classes: users simply register these classes, and the client manages the rest.
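To see why nested types are error-prone to write by hand, consider the sketch below. It encodes an `Array(Array(UInt32))` value and a `Map(String, UInt64)` value following our understanding of the documented RowBinary layout. In that layout, every Array carries a varint element count, and a Map is written like an array of key/value tuples. A single misplaced length prefix corrupts everything that follows it. `NestedRowBinarySketch` and its method names are hypothetical, not part of the client.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

// Hypothetical helper: hand-written RowBinary encoders for two nested types.
final class NestedRowBinarySketch {

    // Unsigned LEB128 varint: RowBinary's prefix for String lengths and Array/Map sizes.
    static void writeVarInt(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Fixed-width little-endian integer (4 bytes for UInt32, 8 bytes for UInt64).
    static void writeLE(ByteArrayOutputStream out, long value, int bytes) {
        for (int i = 0; i < bytes; i++) {
            out.write((int) (value >>> (8 * i)) & 0xFF);
        }
    }

    // Array(Array(UInt32)): outer size, then each inner array as its size followed by its elements.
    static byte[] encodeNestedUInt32(List<List<Long>> outer) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarInt(out, outer.size());
        for (List<Long> inner : outer) {
            writeVarInt(out, inner.size());
            for (long v : inner) {
                writeLE(out, v, 4);
            }
        }
        return out.toByteArray();
    }

    // Map(String, UInt64): entry count, then each key (varint length + UTF-8) and its value.
    static byte[] encodeStringToUInt64Map(Map<String, Long> map) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarInt(out, map.size());
        for (Map.Entry<String, Long> e : map.entrySet()) {
            byte[] key = e.getKey().getBytes(StandardCharsets.UTF_8);
            writeVarInt(out, key.length);
            out.write(key, 0, key.length);
            writeLE(out, e.getValue(), 8);
        }
        return out.toByteArray();
    }
}
```

Encoding `[[1, 2], [3]]` produces 15 bytes: one outer count, two inner counts, and three 4-byte values. This is exactly the kind of bookkeeping that generated serializers take off the user’s hands.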

Unsafe Low-Level Optimizations

Certain low-level optimizations, such as response object reuse, can introduce dangerous side effects that your application may not expect. Rather than enabling these “optimizations” by default, we’ve disabled them for now: all query results are immutable and lazily deserialized to reduce the memory footprint.

We’re discussing an opt-in approach for the future that would let users enable these optimizations when memory and CPU efficiency are critical - be sure to speak up if you have thoughts on this!
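Here is why response object reuse is risky. The sketch below uses hypothetical types that only mimic the idea, not client code. It contrasts a reader that recycles one mutable row object with one that hands out immutable rows. Any caller that collects the recycled rows silently ends up with many references to a single object holding only the last row’s values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the object-reuse hazard; not part of the ClickHouse client.
final class ReuseHazardSketch {

    // Mutable row, standing in for a response object that a reader recycles.
    static final class MutableRow {
        long postId;
    }

    // Immutable alternative: each row is an independent value that cannot change later.
    record Row(long postId) {}

    // "Optimized" reader: the same instance is overwritten for every row it hands out.
    static List<MutableRow> readReusingOneObject(long[] ids) {
        MutableRow shared = new MutableRow();
        List<MutableRow> collected = new ArrayList<>();
        for (long id : ids) {
            shared.postId = id;    // overwrites what earlier callers already received
            collected.add(shared); // caller unknowingly keeps the same reference again
        }
        return collected;
    }

    // Safe reader: allocates a new immutable row per result row.
    static List<Row> readImmutableRows(long[] ids) {
        List<Row> collected = new ArrayList<>();
        for (long id : ids) {
            collected.add(new Row(id));
        }
        return List.copyOf(collected);
    }
}
```

Reading ids `1, 2, 3` with the reusing reader yields three references to one object whose `postId` is 3. The immutable reader keeps 1, 2, and 3. The extra allocations are the price of that safety, which is why reuse is only being considered as an explicit opt-in.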

New Client API Goals

We wanted to focus on solving these problems (and let people “just start using the client”) with the following goals.

Intuitive API

This was a huge motivation because there was a LOT of complexity in the existing client: there were so many layers and abstractions that it was hard to know which methods to use where. We wanted to streamline the whole process to keep things flowing smoothly.

Improve Documentation

Documentation is the bane of many developers - it’s almost never quite what you need, and it often becomes outdated as the underlying code moves on. We wanted the API to be self-documenting, so that required parameters are obvious (without having to hunt for a doc listing them) and optional parameters are more visible too.

This also means more implementation examples: with a smaller API footprint, the new sample code should help folks onboard that much faster!

Cleaning-Up Code Base

As I’m sure many of you can attest, the longer a codebase lives, the more historical baggage it collects. We’ve deprecated some client code over the last 8+ years, but it’s time to finally prune things (and introduce new things in their place). We believe this will make contributing more comfortable in the long run.

Let’s Try It Already!

Setup

Creating a client object is done in builder style: all settings are applied through builder methods and then validated internally.

Client.Builder clientBuilder = new Client.Builder()
                .addEndpoint(endpoint)
                .setUsername(user)
                .setPassword(password)
                .compressServerResponse(true)
                .setDefaultDatabase(database);

this.client = clientBuilder.build();
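This style keeps required and optional settings visible at the call site and lets validation happen in one place. Here is a minimal, hypothetical sketch of the pattern. `SketchClientConfig`, its default values, and its validation rules are assumptions made for illustration. The method names merely mirror the ones used above; this is not the real `Client.Builder` implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified configuration object built in builder style.
final class SketchClientConfig {
    final List<String> endpoints;
    final String username;
    final String password;
    final String database;
    final boolean compressServerResponse;

    private SketchClientConfig(Builder b) {
        this.endpoints = List.copyOf(b.endpoints);
        this.username = b.username;
        this.password = b.password;
        this.database = b.database;
        this.compressServerResponse = b.compressServerResponse;
    }

    static final class Builder {
        private final List<String> endpoints = new ArrayList<>();
        private String username;                         // required
        private String password = "";                    // sketch default
        private String database = "default";             // sketch default
        private boolean compressServerResponse = false;  // sketch default

        Builder addEndpoint(String endpoint) {
            endpoints.add(Objects.requireNonNull(endpoint, "endpoint"));
            return this;
        }
        Builder setUsername(String username) { this.username = username; return this; }
        Builder setPassword(String password) { this.password = password; return this; }
        Builder setDefaultDatabase(String database) { this.database = database; return this; }
        Builder compressServerResponse(boolean enabled) { this.compressServerResponse = enabled; return this; }

        // All checks run once, after every setting has been applied.
        SketchClientConfig build() {
            if (endpoints.isEmpty()) {
                throw new IllegalArgumentException("at least one endpoint is required");
            }
            if (username == null || username.isBlank()) {
                throw new IllegalArgumentException("username is required");
            }
            for (String e : endpoints) {
                if (!e.startsWith("http://") && !e.startsWith("https://")) {
                    throw new IllegalArgumentException("unsupported endpoint: " + e);
                }
            }
            return new SketchClientConfig(this);
        }
    }
}
```

A caller sets only what differs from the defaults, for example `new SketchClientConfig.Builder().addEndpoint("http://localhost:8123").setUsername("default").build()`. A missing endpoint or username then fails fast with a clear message instead of surfacing later as a connection error.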

Insert POJOs

Here’s how you might insert data using a sample POJO and the new serialization API (note that the class needs getters for its fields):

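import java.time.LocalDateTime;

public class ArticleViewEvent {
    private Double postId;
    private LocalDateTime viewTime;
    private String clientId;
    public Double getPostId() { return postId; }
    public LocalDateTime getViewTime() { return viewTime; }
    public String getClientId() { return clientId; }
}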

The class needs to be registered before it can be used with the insert methods:

client.register(ArticleViewEvent.class, client.getTableSchema(TABLE_NAME));

The second argument of register() is a table schema: either the schema of the actual table (as returned by getTableSchema()) or a custom one.
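Conceptually, registering a class against a schema means working out once which getter supplies each column, so later inserts don't have to rediscover it. The sketch below is our own hypothetical illustration of that idea and not the client’s actual serializer generator. `GetterPlan` and `SampleEvent` are invented names. The rule that `getClientId()` matches a column named `client_id` or `clientId` is an assumption.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Sample POJO mirroring part of ArticleViewEvent, with the getters the sketch relies on.
final class SampleEvent {
    private final Double postId;
    private final String clientId;

    SampleEvent(Double postId, String clientId) {
        this.postId = postId;
        this.clientId = clientId;
    }

    public Double getPostId() { return postId; }
    public String getClientId() { return clientId; }
}

// Hypothetical sketch: pair each schema column with a getter once, at registration time.
final class GetterPlan<T> {
    private final List<Method> getters = new ArrayList<>();

    GetterPlan(Class<T> type, List<String> columnNames) {
        Map<String, Method> byProperty = new HashMap<>();
        for (Method m : type.getMethods()) {
            String name = m.getName();
            boolean isGetter = name.startsWith("get") && name.length() > 3
                    && m.getParameterCount() == 0 && m.getDeclaringClass() != Object.class;
            if (isGetter) {
                byProperty.put(normalize(name.substring(3)), m);
            }
        }
        for (String column : columnNames) {
            Method getter = byProperty.get(normalize(column));
            if (getter == null) {
                throw new IllegalArgumentException("no getter for column " + column);
            }
            getters.add(getter);
        }
    }

    // "client_id", "clientId" and "ClientId" all normalize to "clientid".
    private static String normalize(String name) {
        return name.replace("_", "").toLowerCase(Locale.ROOT);
    }

    // Reads one object's values in schema column order, ready for a format writer.
    List<Object> values(T row) {
        List<Object> out = new ArrayList<>(getters.size());
        for (Method getter : getters) {
            try {
                out.add(getter.invoke(row));
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException("cannot call " + getter.getName(), e);
            }
        }
        return out;
    }
}
```

Resolving getters up front means a mismatch between the class and the schema, such as a column with no matching getter, is reported at registration rather than halfway through an insert.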

Finally, pass a collection of ArticleViewEvent objects to the insert method:

ArrayList<ArticleViewEvent> events = … ; // filled collection
client.insert(TABLE_NAME, events, new InsertSettings());

The client then selects the most efficient format and transmits the data to the server.

See complete example

Insert Data From A Stream

Sometimes data is already in a format supported by ClickHouse, such as JSONEachRow. In this case, nothing needs to be registered; the data is simply passed as an InputStream together with its format:

public void insertData_JSONEachRowFormat(InputStream inputStream) {
    try {
        InsertResponse response = client.insert(TABLE_NAME,
                inputStream,
                ClickHouseFormat.JSONEachRow)
            .get(3, TimeUnit.SECONDS);

        log.info("Insert finished: {} rows written",
                response.getWrittenRows());
    } catch (Exception e) {
        log.error("Failed to write JSONEachRow data", e);
        throw new RuntimeException(e);
    }
}

See complete example
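If your data isn't already in a file or network stream, you first need a JSONEachRow payload: one complete JSON object per line. The sketch below builds one in memory and wraps it in an `InputStream` of the kind the insert example above accepts. `JsonEachRowSketch` and its `View` record are hypothetical. The hand-rolled escaping handles only quotes and backslashes, so a real application should use a proper JSON library.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper that produces a JSONEachRow payload from in-memory rows.
final class JsonEachRowSketch {

    // Hypothetical row type for the sketch.
    record View(long postId, String clientId) {}

    // Escapes only backslashes and double quotes; a JSON library would also handle control characters.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // JSONEachRow: one JSON object per line, separated by newlines.
    static byte[] toJsonEachRowBytes(List<View> rows) {
        StringBuilder sb = new StringBuilder();
        for (View v : rows) {
            sb.append("{\"postId\":").append(v.postId())
              .append(",\"clientId\":").append(quote(v.clientId()))
              .append("}\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Wraps the payload in an InputStream, ready to pass to an insert call with ClickHouseFormat.JSONEachRow.
    static InputStream toJsonEachRow(List<View> rows) {
        return new ByteArrayInputStream(toJsonEachRowBytes(rows));
    }
}
```

For example, `insertData_JSONEachRowFormat(JsonEachRowSketch.toJsonEachRow(rows))` would send the rows without any class registration.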

Read Data

Data from ClickHouse can be read as a collection of records, or iterated through record by record. We’ve implemented RowBinary and Native format readers for your convenience, but reading raw data is also possible. Readers for other well-known formats, such as CSV and TSV, are on the way.

Here is an example of getting the first record from a result set:

GenericRecord hostnameRecord = client
    .queryAll("SELECT hostname()")
    .stream()
    .findFirst()
    .get();

Or reading the whole result as a collection:

List<GenericRecord> records = client.queryAll(
    "SELECT col1, col2, col3 FROM " + DATASET_TABLE
);

for (GenericRecord record : records) {
    String col3 = record.getString(3); // string column col3 (column indices start at 1)
}

If you need more information from the response, com.clickhouse.client.api.Client#queryRecords is handy. It returns com.clickhouse.client.api.query.Records, which is iterable and also exposes details of the underlying server response, such as the number of result rows:

Records records = client.queryRecords(
    "SELECT col1, col2, col3 FROM " + DATASET_TABLE
).get(3, TimeUnit.SECONDS);
System.out.println("Result rows: " + records.getResultRows());

for (GenericRecord record : records) {
    String col3 = record.getString(3); // string column col3
}
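Iterating records one at a time, rather than materializing everything up front, is what makes lazy deserialization pay off: only the row currently being read has to be decoded. The following sketch is a hypothetical stand-in for a format reader, not the client’s implementation. It reads rows of `(UInt32 id, String name)` in RowBinary and decodes each row only when the iterator reaches it. For simplicity it works over an in-memory byte array; a real reader would consume the HTTP response stream.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical lazy reader: decodes RowBinary rows of (UInt32 id, String name) on demand.
final class LazyRowReader implements Iterable<LazyRowReader.Row> {

    // Immutable decoded row.
    record Row(long id, String name) {}

    private final ByteBuffer buffer;

    LazyRowReader(byte[] rowBinary) {
        this.buffer = ByteBuffer.wrap(rowBinary);
    }

    // Unsigned LEB128 varint, RowBinary's length prefix for String.
    private static long readVarInt(ByteBuffer in) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = in.get();
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    @Override
    public Iterator<Row> iterator() {
        // duplicate() resets byte order to big-endian, so set little-endian explicitly.
        ByteBuffer in = buffer.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        return new Iterator<>() {
            @Override
            public boolean hasNext() {
                return in.hasRemaining();
            }

            @Override
            public Row next() {
                if (!in.hasRemaining()) {
                    throw new NoSuchElementException();
                }
                long id = Integer.toUnsignedLong(in.getInt()); // UInt32, little-endian
                byte[] utf8 = new byte[(int) readVarInt(in)];
                in.get(utf8);
                return new Row(id, new String(utf8, StandardCharsets.UTF_8));
            }
        };
    }
}
```

Each call to `iterator()` starts from the beginning because it works on its own duplicate of the buffer. Nothing is decoded until `next()` is called, so stopping early (as in the `findFirst()` example) leaves the remaining rows untouched.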

Next Steps

We’ll be refactoring some of the underlying client code over the coming weeks and months and expanding the v2 code, so we’d love for you to try things out! Grab version 0.6.1 or later of the Java client (which includes the early alpha release) and start poking around client-v2 - there’s no better test than people actually using it in the wild.

Where can I ask questions/report issues?

Glad you asked! The best place is to create an issue in our GitHub repository using the ‘v2-feedback’ label, but you can also find us on the community Slack. And please do let us know if you have any questions, comments, or concerns - we really want to hear from you!

What about the legacy client?

We’ll continue to support the older (v1) client with security fixes, bug fixes, and minor enhancements until the end of 2025, but our focus going forward will, unsurprisingly, be the new client.
