Migrate to Managed Postgres using logical replication
This guide provides step-by-step instructions on how to migrate your PostgreSQL database to ClickHouse Managed Postgres using Postgres native logical replication.
Prerequisites
- Access to your source PostgreSQL database.
psql,pg_dumpandpg_restoreinstalled on your local machine. This is for creating empty tables in your target database. These are typically included with PostgreSQL installations. If not, you can download them from the PostgreSQL official website.- Your source database must be reachable from ClickHouse Managed Postgres. Ensure that any necessary firewall rules or security group settings allow for this connectivity. You can get the egress IP of your Managed Postgres instance by doing:
The setup
For logical replication to work, we need to ensure that the source database is set up correctly. Here are the key requirements:
- The source database must have
wal_levelset tological. - The source database must have
max_replication_slotsset to at least1. - For RDS (which this guide uses as an example), you need to ensure that your parameter group has
rds.logical_replicationset to1. - The source database user must have the
REPLICATIONprivilege. In the case of RDS, you would run:
Make sure your source database is set up like this:
Schema-only dump of the source database
Before setting up logical replication, we need to create the schema in the target ClickHouse Managed Postgres database. We can do this by creating a schema-only dump of the source database using pg_dump:
Here:
- Replace
<user>,<password>,<host>,<port>, and<database>with your source database credentials. -sspecifies that we want a schema-only dump.--format directoryspecifies that we want the dump in a directory format, which is suitable forpg_restore.-f rds-dumpspecifies the output directory for the dump files. Note that this directory will be created automatically and should not exist beforehand.
In our case, we have two tables - events and users. events has a million rows, and users has a thousand rows.
Create a Managed Postgres instance
First, ensure you have a Managed Postgres instance set up, preferably in the same region as the source. You can follow the quick guide here. Here's what we are going to spin up for this guide:
Restore the schema to ClickHouse Managed Postgres
Now that we have the schema dump, we can restore it to our ClickHouse Managed Postgres instance using pg_restore:
Here:
- Replace
<user>,<password>,<host>,<port>, and<database>with your target ClickHouse Managed Postgres database credentials. --verboseprovides detailed output during the restore process. This command will create all the tables, indexes, views, and other schema objects in the target database without any data.
In our case, after running this command, we have our two tables and they're empty:
Set up logical replication
With the schema in place, we can now set up logical replication from the source database to the target ClickHouse Managed Postgres database. This involves creating a publication on the source database and a subscription on the target database.
Create a publication on the source database
Connect to your source PostgreSQL database and create a publication that includes the tables you want to replicate.
Creating a publication FOR ALL TABLES can incur network overhead if there are many tables. It's recommended to specify only the tables you want to replicate.
Create a subscription on the target ClickHouse Managed Postgres database
Next, connect to your target ClickHouse Managed Postgres database and create a subscription that connects to the publication on the source database.
This will automatically create a replication slot on the source database and start replicating data from the specified tables to the target database. Depending on the size of your data, this process may take some time.
In our case, after setting up the subscription, the data flowed in:
New rows inserted into the source database will now be replicated to the target ClickHouse Managed Postgres database in near real-time.
Caveats and considerations
- Logical replication only replicates data changes (INSERT, UPDATE, DELETE). Schema changes (like ALTER TABLE) need to be handled separately.
- Ensure that the network connection between the source and target databases is stable to avoid replication interruptions.
- Monitor the replication lag to ensure that the target database is keeping up with the source database. Setting a suitable value for
max_slot_wal_keep_sizeon the source database can help manage a growing replication slot and prevent it from consuming too much disk space. - Depending on your use case, you might want to set up monitoring and alerting for the replication process.
Next steps
Congratulations! You have successfully migrated your PostgreSQL database to ClickHouse Managed Postgres using pg_dump and pg_restore. You are now all set to explore Managed Postgres features and its integration with ClickHouse. Here's a 10 minute quickstart to get you going: