Skip to main content
Skip to main content
Edit this page

Migrate to Managed Postgres using logical replication

This guide provides step-by-step instructions on how to migrate your PostgreSQL database to ClickHouse Managed Postgres using Postgres native logical replication.

Private preview in ClickHouse Cloud

Prerequisites

  • Access to your source PostgreSQL database.
  • psql,pg_dump and pg_restore installed on your local machine. This is for creating empty tables in your target database. These are typically included with PostgreSQL installations. If not, you can download them from the PostgreSQL official website.
  • Your source database must be reachable from ClickHouse Managed Postgres. Ensure that any necessary firewall rules or security group settings allow for this connectivity. You can get the egress IP of your Managed Postgres instance by doing:
dig +short <your-managed-postgres-hostname>

The setup

For logical replication to work, we need to ensure that the source database is set up correctly. Here are the key requirements:

  • The source database must have wal_level set to logical.
  • The source database must have max_replication_slots set to at least 1.
  • For RDS (which this guide uses as an example), you need to ensure that your parameter group has rds.logical_replication set to 1.
  • The source database user must have the REPLICATION privilege. In the case of RDS, you would run:
    GRANT rds_replication TO <your-username>;
    

Make sure your source database is set up like this:

Schema-only dump of the source database

Before setting up logical replication, we need to create the schema in the target ClickHouse Managed Postgres database. We can do this by creating a schema-only dump of the source database using pg_dump:

pg_dump \
    -d 'postgresql://<user>:<password>@<host>:<port>/<database>' \
    -s \
    --format directory \
    -f rds-dump

Here:

  • Replace <user>, <password>, <host>, <port>, and <database> with your source database credentials.
  • -s specifies that we want a schema-only dump.
  • --format directory specifies that we want the dump in a directory format, which is suitable for pg_restore.
  • -f rds-dump specifies the output directory for the dump files. Note that this directory will be created automatically and should not exist beforehand.

In our case, we have two tables - events and users. events has a million rows, and users has a thousand rows.

Create a Managed Postgres instance

First, ensure you have a Managed Postgres instance set up, preferably in the same region as the source. You can follow the quick guide here. Here's what we are going to spin up for this guide:

Restore the schema to ClickHouse Managed Postgres

Now that we have the schema dump, we can restore it to our ClickHouse Managed Postgres instance using pg_restore:

pg_restore \
    -d 'postgresql://<user>:<password>@<host>:<port>/<database>' \
    --verbose \
    rds-dump

Here:

  • Replace <user>, <password>, <host>, <port>, and <database> with your target ClickHouse Managed Postgres database credentials.
  • --verbose provides detailed output during the restore process. This command will create all the tables, indexes, views, and other schema objects in the target database without any data.

In our case, after running this command, we have our two tables and they're empty:

Set up logical replication

With the schema in place, we can now set up logical replication from the source database to the target ClickHouse Managed Postgres database. This involves creating a publication on the source database and a subscription on the target database.

Create a publication on the source database

Connect to your source PostgreSQL database and create a publication that includes the tables you want to replicate.

CREATE PUBLICATION <pub_name> FOR TABLE table1, table2...;
References

Creating a publication FOR ALL TABLES can incur network overhead if there are many tables. It's recommended to specify only the tables you want to replicate.

Create a subscription on the target ClickHouse Managed Postgres database

Next, connect to your target ClickHouse Managed Postgres database and create a subscription that connects to the publication on the source database.

CREATE SUBSCRIPTION demo_rds_subscription
CONNECTION 'postgresql://<user>:<password>@<host>:<port>/<database>'
PUBLICATION <pub_name_you_entered_above>;

This will automatically create a replication slot on the source database and start replicating data from the specified tables to the target database. Depending on the size of your data, this process may take some time.

In our case, after setting up the subscription, the data flowed in:

New rows inserted into the source database will now be replicated to the target ClickHouse Managed Postgres database in near real-time.

Caveats and considerations

  • Logical replication only replicates data changes (INSERT, UPDATE, DELETE). Schema changes (like ALTER TABLE) need to be handled separately.
  • Ensure that the network connection between the source and target databases is stable to avoid replication interruptions.
  • Monitor the replication lag to ensure that the target database is keeping up with the source database. Setting a suitable value for max_slot_wal_keep_size on the source database can help manage a growing replication slot and prevent it from consuming too much disk space.
  • Depending on your use case, you might want to set up monitoring and alerting for the replication process.

Next steps

Congratulations! You have successfully migrated your PostgreSQL database to ClickHouse Managed Postgres using pg_dump and pg_restore. You are now all set to explore Managed Postgres features and its integration with ClickHouse. Here's a 10 minute quickstart to get you going: