ClickHouse CTO, Alexey Milovidov, is coming to a city near you! ->->

Blog / Engineering

Manage your ClickHouse Schema-as-Code using Atlas

author avatar
Rotem Tamir
Mar 12, 2024

Today, we welcome Rotem Tamir from ariga, who maintains the open-core tool atlas for managing database schemas as code. Rotem dives into the details and shows the value of the recent support for ClickHouse.

The Rise and Fall of Schema-less Technologies

In the early 2010s, dynamically typed languages like Python and Javascript, and NoSQL databases like MongoDB and Elasticsearch marked the trend of departing from rigid, upfront schema planning. All of these technologies, with less verbose syntax and greater flexibility, promised quicker development cycles and easier prototyping, making them the languages of choice for startups and new projects aiming for rapid market entry.

However, as projects and organizations grew larger in size and complexity, the initial advantages of these technologies began to reveal significant trade-offs. The lack of schema enforcement led to inconsistencies in data, making it harder to ensure data quality and integrity as applications evolved. The absence of strict type systems and the flexible nature of schema-less databases made debugging and maintaining large codebases increasingly difficult.

The increasing interest in relational storage technology such as ClickHouse signifies a nuanced shift in our industry’s approach to data management. While modern data systems need to process diverse datasets that sometimes require an unstructured approach, there is a growing appreciation for the efficiency and organization that comes with strongly typed, structured data processing methods.

Managing Schemas is still a Pain

One of the drivers of the NoSQL movement was that many developers wanted to avoid the pains of managing database schemas at all costs. Schema changes, also called migrations, require technical expertise to be done safely and efficiently. Modern databases provide a limited, imperative, way of changing schemas called DDL statements.

What if instead of giving up on schemas and their benefits entirely, we could make managing them a seamless process that does not require any special effort? Administering database schemas undoubtedly requires technical expertise, but it's not an infinitely complex problem. Thus, automated tools can be created to simplify the task for non-experts.

Database Schema-as-Code

atlas_declarative_vs_imperative.png

In recent years, in the wake of huge advancements made in cloud resource management with an approach called Infrastructure-as-Code, a new approach for managing database schemas is emerging. This approach, coined "Database Schema-as-Code" and embodied by projects such as Atlas and Skeema, aims to greatly simplify schema management by arming engineers with a tool that offers a _declarative _API for interacting with databases.

A declarative API works by providing the tool with the desired state of the system and letting the tool figure out the operations needed to reconcile it with the current state. In the context of database schema management, users provide the tool with the schema they would like their database to have, let the tool inspect the current database information schema, and generate a migration plan for them automatically.

Automatic migration planning isn’t a new concept, of course. Application development frameworks and ORMs such as Django, have offered similar capabilities for many years. The issue with automating migration planning at the ORM level is that they are only usable in their specific context and cannot be applied generically for every stack. In addition, ORMs tend to focus on a small subset of database features such as tables, columns, foreign keys, and indexes and do not provide a way to manage more advanced objects such as views, materialized views, stored procedures, etc.

How it works

Let’s see how Database Schema-as-Code and declarative migrations work in practice. To demonstrate it we will use Atlas, an open-core database schema-as-code tool (F.D: I am one of the creators). For getting started and installation instructions, visit the Atlas docs.

As a prerequisite to this demo, let’s run a local, empty ClickHouse instance using Docker:

docker run --rm -d --name atlas-demo -e CLICKHOUSE_DB=demo -p 9000:9000 clickhouse/clickhouse-server:latest

Contrary to more traditional approaches, with declarative migrations, we always start with the desired state of the database. Let’s create a file named schema.sql with the following contents:

CREATE TABLE `users` (
   `id` UInt64,
   `name` String)
ENGINE = MergeTree
PRIMARY KEY (`id`)
ORDER BY `id`;

Now, we can apply this schema to the target database using the atlas schema apply command:

atlas schema apply \
-u "clickhouse://localhost:9000/demo" \
--to file://schema.sql \
--dev-url "docker://clickhouse/23.11/demo"

This will tell Atlas that we want to apply the schema defined in the file named schema.sql to the ClickHouse database at localhost:9000/demo (which we just used Docker to spin up locally).

Atlas connects to the target database, inspects its information schema, calculates the diff between the current and desired state, and prompts us to approve the migration plan:

-- Planned Changes:
-- Create "users" table
CREATE TABLE `users` (
  `id` UInt64,
  `name` String
) ENGINE = MergeTree
 PRIMARY KEY (`id`) SETTINGS index_granularity = 8192;
? Are you sure?:
  ▸ Apply
    Lint and edit
    Abort

The plan looks good, and so we hit "Apply" to apply the changes to the database.

After successfully running the migration Atlas planned for us, we can re-run the schema apply command to ensure that the database is at its desired state:

atlas schema apply \
-u "clickhouse://localhost:9000/demo" \
--to file://schema.sql \
--dev-url "docker://clickhouse/23.11/demo"

Once again, Atlas connects to our ClickHouse instance, inspects it, and calculates the diff, this time telling us:

Schema is synced, no changes to be made

Next, let’s make a small change to our desired state to show that declarative migrations work on a live database with an existing schema. Let’s add a new column to our users table, email_address. Edit schema.sql adding the following column:

CREATE TABLE `users` (
    `id` UInt64,
    `name` String,
+    `email_address` String
)
 ENGINE = MergeTree
 PRIMARY KEY (`id`)
ORDER BY `id`;

Next, let’s re-run the schema applycommand and see how Atlas plans the migration for us:

-- Planned Changes:
ALTER TABLE `users` ADD COLUMN `email_address` String;
? Are you sure?:
  ▸ Apply
    Lint and edit
    Abort

Great! Atlas detected the existing table and planned migration to add the missing column to the target database. Let’s hit Apply to bring our database to the desired state.

Wrapping Up

The 2010s trended towards schema-less databases as developers sought to avoid the complexities of schema management, but this shift led to challenges in data consistency and maintenance as systems scaled. Learning from the challenges of the past, the tech industry is now showing a renewed interest in relational technologies.

However, developers still face the same challenges with schema management as they did in the past. Database Schema-as-Code tools such as Atlas emerge as a solution, offering to simplify and streamline schema operations, bridging the gap between the desire for development efficiency and the need for structured data integrity.

In this article, we have shown a glimpse of the capabilities of Atlas which in what is by no means a comprehensive guide. A more complete "Getting Started with Atlas" for ClickHouse users is available on the Atlas documentation website.

Share this post

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!
Loading form...
Follow us
Twitter imageSlack imageGitHub image
Telegram imageMeetup imageRss image