Migrating from Rockset

Rockset is a real-time analytics database that was acquired by OpenAI in June 2024. Users have until September 30th, 2024, 5 PM PDT to off-board from the service.

We think ClickHouse Cloud will provide an excellent home for Rockset users, and in this guide, we'll go through some things to consider when you migrate from Rockset to ClickHouse.

Let's get started!

Immediate assistance

If you need immediate assistance, please contact us by filling out this form and a human will get in touch with you!

ClickHouse vs Rockset - High-Level Comparison

We'll begin with a brief overview of ClickHouse's strengths and where you might see some benefits compared to Rockset.

ClickHouse focuses on real-time performance and cost efficiency through a schema-first approach. While semi-structured data is supported, our philosophy is that users should decide how to structure their data to maximize performance and resource efficiency. As a result of this schema-first approach, ClickHouse exceeds Rockset in our benchmarks for scalability, ingestion throughput, query performance, and cost efficiency.

ClickHouse also integrates with a broader range of data systems than Rockset does.

Finally, unlike Rockset, ClickHouse is available both as open-source software and as a cloud service. This migration guide focuses on migrating from Rockset to ClickHouse Cloud, but users can refer to the rest of our documentation on open-source capabilities.

Rockset Key Concepts

Let's start by going through the key concepts of Rockset and explaining their equivalents (where they exist) in ClickHouse Cloud.

Data Sources

Rockset and ClickHouse both support loading data from a variety of sources.

In Rockset, you create a data source and then create a collection based on that data source. There are fully managed integrations for event streaming platforms, OLTP databases, and cloud bucket storage.

In ClickHouse Cloud, the equivalent of fully managed integrations is ClickPipes. ClickPipes supports continuously loading data from event streaming platforms and cloud bucket storage. ClickPipes loads data into tables.

Ingest Transformations

Rockset's ingest transformations let you transform the raw data coming into Rockset before it's stored in a collection. ClickHouse Cloud does the same via ClickPipes, which uses ClickHouse's materialized views feature to transform the data.

Collections

In Rockset, you query collections. In ClickHouse Cloud, you query tables. In both services, querying is done using SQL. ClickHouse adds extra functions on top of the ones in the SQL standard to give you more power to manipulate and transform your data.

Query Lambdas

Rockset supports query lambdas, named parameterized queries stored in Rockset that can be executed from a dedicated REST endpoint. ClickHouse Cloud's Query API Endpoints offer similar functionality.

Views

In Rockset, you can create views, virtual collections defined by SQL queries. ClickHouse Cloud supports several types of views:

  • Normal views do not store any data. They just perform a read from another table at query time.
  • Parameterized views are similar to normal views but can be created with parameters resolved at query time.
  • Materialized views store data transformed by the corresponding SELECT query. They act like a trigger that runs whenever new data is inserted into the source table they reference.
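To make the trigger analogy concrete, here is a toy in-memory model in Python. It is not how ClickHouse implements materialized views; the `Table` class, the `attach_view` method, and the `errors_only` transform are hypothetical names chosen for illustration. What it shows is that the view's transformation runs on each newly inserted block rather than re-reading the whole source table.

```python
from typing import Callable

Row = dict
Transform = Callable[[list], list]


class Table:
    """A toy in-memory table that notifies attached views on every insert."""

    def __init__(self) -> None:
        self.rows: list = []
        self._views: list = []  # (transform, target table) pairs

    def attach_view(self, transform: Transform, target: "Table") -> None:
        # Hypothetical stand-in for a materialized view: a transform plus a destination.
        self._views.append((transform, target))

    def insert(self, block: list) -> None:
        self.rows.extend(block)
        # Like an insert trigger: each view sees only the newly inserted block.
        for transform, target in self._views:
            target.insert(transform(block))


def errors_only(block: list) -> list:
    # The "SELECT" of the view: keep ERROR rows and project two fields.
    return [{"ts": r["ts"], "msg": r["msg"]} for r in block if r["level"] == "ERROR"]


events = Table()
errors = Table()
events.attach_view(errors_only, errors)
events.insert([
    {"ts": 1, "level": "INFO", "msg": "ok"},
    {"ts": 2, "level": "ERROR", "msg": "boom"},
])
print(errors.rows)  # [{'ts': 2, 'msg': 'boom'}]
```

Because the transformation only ever sees new rows, the cost of maintaining the view grows with the insert rate rather than with the size of the source table.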

Aliases

Rockset aliases are used to associate multiple names with a collection. ClickHouse Cloud does not support an equivalent feature.

Workspaces

Rockset workspaces are containers that hold resources (i.e., collections, query lambdas, views, and aliases) and other workspaces.

In ClickHouse Cloud, you can use different services for full isolation. You can also create databases to simplify role-based access control (RBAC) for different tables and views.

Design Considerations

In this section, we will review some of the key features of Rockset and learn how to address them when using ClickHouse Cloud.

JSON support

Rockset supports an extended version of the JSON format that allows for Rockset-specific types.

There are multiple ways to work with JSON in ClickHouse:

  • JSON inference
  • JSON extract at query time
  • JSON extract at insert time

To understand the best approach for your use case, see our JSON documentation.
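The trade-off between extracting JSON at query time and at insert time can be sketched in plain Python. This is a conceptual model, not ClickHouse code: in ClickHouse you would use JSON functions such as `JSONExtractString` at query time, or compute columns from the JSON while inserting. The sample rows and all function names below are hypothetical.

```python
import json

raw_rows = [
    '{"user": {"id": 1, "country": "DE"}, "event": "click"}',
    '{"user": {"id": 2, "country": "US"}, "event": "view"}',
    '{"user": {"id": 3, "country": "DE"}, "event": "view"}',
]


def count_by_country_query_time(rows: list, country: str) -> int:
    # Query-time extraction: keep raw JSON strings and parse them in every query.
    return sum(1 for r in rows if json.loads(r)["user"]["country"] == country)


def insert_with_extraction(rows: list) -> dict:
    # Insert-time extraction: parse once while inserting and store typed columns.
    columns = {"user_id": [], "country": [], "event": []}
    for r in rows:
        doc = json.loads(r)
        columns["user_id"].append(int(doc["user"]["id"]))
        columns["country"].append(doc["user"]["country"])
        columns["event"].append(doc["event"])
    return columns


def count_by_country_insert_time(columns: dict, country: str) -> int:
    # The query reads one pre-extracted column; no JSON parsing happens here.
    return columns["country"].count(country)


columns = insert_with_extraction(raw_rows)
print(count_by_country_query_time(raw_rows, "DE"), count_by_country_insert_time(columns, "DE"))
```

Query-time extraction keeps the raw document flexible but pays the parsing cost on every query; insert-time extraction pays it once and lets queries read compact, typed columns.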

In addition, ClickHouse will soon have a Semistructured column data type. This new type should give users the flexibility Rockset's JSON type offers.

Full-text search

Rockset supports full-text search with its SEARCH function. While ClickHouse isn't a search engine, it does have various functions for searching in strings. ClickHouse also supports bloom filters, which can speed up such searches by skipping data that cannot contain the search term.
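A Bloom filter helps because it can cheaply answer "this block definitely does not contain the token", so those blocks never need to be read. The sketch below is a minimal Python illustration of that idea under simple assumptions (whitespace tokenization, a fixed 1024-bit filter, three hash functions). It is not ClickHouse's implementation of its bloom-filter skip indexes (such as `tokenbf_v1`), and all names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the bit array, stored in a Python int

    def _positions(self, item: str):
        # Derive num_hashes bit positions from salted BLAKE2b digests of the item.
        for seed in range(self.num_hashes):
            salt = seed.to_bytes(16, "little")
            digest = hashlib.blake2b(item.encode(), salt=salt).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def tokens(text: str) -> list:
    return text.lower().split()


def build_filters(blocks: list) -> list:
    # One filter per block of rows, holding every token that appears in the block.
    filters = []
    for block in blocks:
        bf = BloomFilter()
        for text in block:
            for tok in tokens(text):
                bf.add(tok)
        filters.append(bf)
    return filters


def search(blocks: list, filters: list, token: str) -> list:
    token = token.lower()
    hits = []
    for block, bf in zip(blocks, filters):
        if not bf.might_contain(token):
            continue  # token is definitely absent, so this block is skipped
        # A positive may be false, so confirm against the actual rows.
        hits.extend(text for text in block if token in tokens(text))
    return hits


blocks = [
    ["disk full on node a", "query finished"],
    ["connection reset", "disk latency high"],
    ["all good"],
]
filters = build_filters(blocks)
print(search(blocks, filters, "disk"))  # ['disk full on node a', 'disk latency high']
```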

Vector search

Rockset has a similarity index, which can be used to index the embeddings used in vector search applications.

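ClickHouse can also be used for vector search by performing a linear scan: computing the distance between the query vector and every stored embedding, then returning the closest matches.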
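A linear scan is a brute-force nearest-neighbour search. The Python sketch below shows the idea with cosine distance; in ClickHouse the same pattern is usually expressed as an `ORDER BY` on a distance function such as `cosineDistance`, combined with a `LIMIT`. The `embeddings` dictionary and the function names are hypothetical.

```python
import heapq
import math


def cosine_distance(a: list, b: list) -> float:
    # 1 - cosine similarity: 0 for vectors pointing in the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def linear_scan(embeddings: dict, query: list, k: int) -> list:
    # Brute force: score every stored vector, then keep the k closest (distance, id) pairs.
    scored = ((cosine_distance(vec, query), doc_id) for doc_id, vec in embeddings.items())
    return heapq.nsmallest(k, scored)


embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
for distance, doc_id in linear_scan(embeddings, [1.0, 0.1], k=2):
    print(doc_id, round(distance, 3))
```

The cost of each query grows linearly with the number of stored vectors, which is the trade-off an approximate similarity index is meant to avoid.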

ClickHouse also has a vector similarity index, but this approach is currently experimental and is not yet compatible with the new query analyzer.

Ingesting data from OLTP databases

Rockset's managed integrations support ingesting data from OLTP databases like MongoDB and DynamoDB.

If you're ingesting data from DynamoDB, we suggest you turn on the option to export data into a Kinesis stream.

In the AWS Console for the DynamoDB table, turn on Amazon Kinesis Data Streams.

Let's look at a message that contains all of the supported DynamoDB attribute types. Apart from id, each attribute is named after its type:

  • id (primary key)
  • number set
  • number
  • binary set
  • string
  • null
  • map
  • boolean
  • string set
  • binary
  • list

If eventName is INSERT, as in the example below, there will be only a NewImage key. If eventName is MODIFY, there will be both a NewImage and an OldImage key. If eventName is REMOVE, there will be only an OldImage key.
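The rule above translates into a small dispatch on `eventName`. This Python sketch is a hypothetical illustration that assumes the stream carries item images as described; `to_change` and its `op` labels are names made up for this example, not part of DynamoDB or ClickPipes.

```python
import json


def to_change(record: dict) -> dict:
    """Map a DynamoDB stream record to the item state it carries."""
    event = record["eventName"]
    images = record["dynamodb"]
    if event in ("INSERT", "MODIFY"):
        # The item's current state is in NewImage (MODIFY also carries OldImage).
        return {"op": "upsert", "image": images["NewImage"]}
    if event == "REMOVE":
        # A deleted item only carries its last state, in OldImage.
        return {"op": "delete", "image": images["OldImage"]}
    raise ValueError(f"unexpected eventName: {event!r}")


# A trimmed-down record as JSON text, following the shape of the example message.
raw = '{"eventName": "REMOVE", "dynamodb": {"OldImage": {"id": {"S": "some-partition-key"}}}}'
print(to_change(json.loads(raw)))
```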

An example message is shown below:

```json
{
  "awsRegion": "us-east-1",
  "eventID": "5a88419c-468a-4ac4-8bad-7f832caf7345",
  "eventName": "INSERT",
  "userIdentity": null,
  "recordFormat": "application/json",
  "tableName": "kelsey-rockset-testing",
  "dynamodb": {
    "ApproximateCreationDateTime": 1719347890250647,
    "Keys": {
      "id": {
        "S": "some-partition-key"
      }
    },
    "NewImage": {
      "number set": {
        "NS": [
          "0",
          "1"
        ]
      },
      "number": {
        "N": "10"
      },
      "binary set": {
        "BS": [
          "MTMyMQ==",
          "MTQzMQ=="
        ]
      },
      "string": {
        "S": "some-string"
      },
      "null": {
        "NULL": true
      },
      "map": {
        "M": {
          "a": {
            "S": "some-string"
          },
          "b": {
            "N": "0"
          }
        }
      },
      "boolean": {
        "BOOL": false
      },
      "id": {
        "S": "some-partition-key"
      },
      "string set": {
        "SS": [
          "some-string-1",
          "some-string-2"
        ]
      },
      "binary": {
        "B": "MTMyMQ=="
      },
      "list": {
        "L": [
          {
            "S": "some-string-1"
          },
          {
            "N": "13"
          }
        ]
      }
    },
    "SizeBytes": 198,
    "ApproximateCreationDateTimePrecision": "MICROSECOND"
  },
  "eventSource": "aws:dynamodb"
}
```
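Every value in NewImage and OldImage is wrapped in a DynamoDB type tag (`S`, `N`, `B`, `BOOL`, `NULL`, `SS`, `NS`, `BS`, `M`, `L`). The Python sketch below shows how those tags map to plain values, which can help when deciding which column types to extract into. It is an illustration only, not ClickPipes configuration; `decode_attr` and `decode_image` are hypothetical helpers.

```python
import base64
from decimal import Decimal


def decode_attr(attr: dict):
    """Convert one DynamoDB-typed value (e.g. {"N": "10"}) to a plain Python value."""
    (type_tag, value), = attr.items()  # each attribute has exactly one type tag
    if type_tag in ("S", "BOOL"):
        return value
    if type_tag == "N":
        return Decimal(value)  # numbers arrive as strings; Decimal keeps precision
    if type_tag == "B":
        return base64.b64decode(value)  # binary values are base64-encoded
    if type_tag == "NULL":
        return None
    if type_tag == "SS":
        return set(value)
    if type_tag == "NS":
        return {Decimal(v) for v in value}
    if type_tag == "BS":
        return {base64.b64decode(v) for v in value}
    if type_tag == "M":
        return {k: decode_attr(v) for k, v in value.items()}
    if type_tag == "L":
        return [decode_attr(v) for v in value]
    raise ValueError(f"unknown DynamoDB type tag: {type_tag!r}")


def decode_image(image: dict) -> dict:
    # Decode every attribute of a NewImage or OldImage.
    return {name: decode_attr(attr) for name, attr in image.items()}


image = {
    "number": {"N": "10"},
    "binary": {"B": "MTMyMQ=="},
    "map": {"M": {"a": {"S": "some-string"}, "b": {"N": "0"}}},
    "list": {"L": [{"S": "some-string-1"}, {"N": "13"}]},
}
print(decode_image(image))
```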

Once Kinesis Data Streams is enabled for the DynamoDB table, you can set up a Kinesis ClickPipe that reads from that stream.

Record data will be ingested into the dynamodb column as a JSON string.

You must configure ClickPipes to extract the data into the desired format by specifying expressions in the default value column.

Existing users of ClickHouse Cloud have also successfully created CDC pipelines to ingest data from other OLTP databases.

You can read more in a two-part blog series:

Compute-compute separation

Compute-compute separation is an architectural design pattern in real-time analytics systems that makes it possible to handle sudden bursts of incoming data or queries. If a single component handles both ingestion and querying, a flood of queries increases ingestion latency, and a flood of incoming data increases query latency.

Compute-compute separation avoids this problem by separating the data ingestion and query processing code paths. Rockset implemented this feature in March 2023.
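A toy queueing model shows why this separation helps. The sketch assumes a single worker serving jobs first-in, first-out, with made-up durations: 100 ingest batches of 0.5 s arrive at t = 0 and a 0.2 s query arrives at t = 0.1. It illustrates the principle only and is not a model of Rockset or ClickHouse Cloud; `completion_times` is a hypothetical helper.

```python
def completion_times(jobs: list) -> dict:
    """Serve (name, arrival, duration) jobs FIFO on one worker; return finish times by name."""
    clock = 0.0
    finished = {}
    # sorted() is stable, so jobs with equal arrival times keep their order.
    for name, arrival, duration in sorted(jobs, key=lambda job: job[1]):
        clock = max(clock, arrival) + duration  # wait for the worker, then run
        finished[name] = clock
    return finished


ingest = [(f"ingest-{i}", 0.0, 0.5) for i in range(100)]
query = [("query", 0.1, 0.2)]

# Shared compute: the query queues behind the whole ingest burst.
shared_latency = completion_times(ingest + query)["query"] - 0.1
# Separated compute: the query worker only ever sees queries.
separated_latency = completion_times(query)["query"] - 0.1

print(f"shared: {shared_latency:.1f} s, separated: {separated_latency:.1f} s")
```

In the shared case the query's latency is dominated by the ingest burst ahead of it, while with separate compute it depends only on its own work.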

This feature is currently being implemented in ClickHouse Cloud and will be released in private preview in mid-July 2024.

Free migration services

We appreciate that this is a stressful time for Rockset users - no one wants to move a production database in such a short period!

If ClickHouse could be a good fit for you, we will provide free migration services to help smooth the transition.