Replicate data from PostgreSQL to Redshift


In this guide, we are going to create a real-time data pipeline to sync data from PostgreSQL to Redshift using Meroxa. This pipeline will keep a table within Redshift up-to-date with every insert, update, and delete operation from PostgreSQL.

Postgres to Redshift

How it works

For this example, we have a PostgreSQL table called public.Users. This table includes columns such as name and email.
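
For reference, here is a minimal sketch of what such a table might look like. The columns and types are illustrative, not taken from the guide:

-- Illustrative schema only; your actual table may differ.
-- Quoting "Users" preserves the capital U, since unquoted
-- identifiers fold to lowercase in PostgreSQL.
CREATE TABLE public."Users" (
    id    SERIAL PRIMARY KEY,
    name  TEXT NOT NULL,
    email TEXT NOT NULL
);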

Postgres to Redshift

The goal of the pipeline is to replicate and keep the data in sync with a table in Redshift:

Postgres to Redshift

As you perform INSERT, UPDATE, and DELETE operations, Redshift will be kept in sync in real time.

Let's build.

Prerequisites

Before you begin, you need:

  • A Meroxa account
  • A PostgreSQL instance containing the data you want to replicate
  • An Amazon Redshift cluster

Adding a Postgres Resource

To begin, we can add a Postgres resource to the Meroxa Resource Catalog as the source of our data.

  1. Before you can add the resource, the PostgreSQL instance needs to be accessible by Meroxa. You may allow access from Meroxa IPs only.

  2. Navigate to the Add Resource page.

Add Postgres Resource to Meroxa Catalog

  3. Add a valid Postgres Access URL:
postgres://$PG_USER:$PG_PASS@$PG_URL:$PG_PORT/$PG_DB

In the Access URL, replace the following variables with valid credentials from your Postgres environment:

  • $PG_USER - Postgres Username
  • $PG_PASS - Postgres Password
  • $PG_URL - Postgres URL
  • $PG_DB - Postgres Database Name
  • $PG_PORT - Postgres Port (e.g., 5432)
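
For example, a filled-in Access URL might look like this (all values below are made up):

postgres://replicator:s3cr3t@db.example.com:5432/app_db
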
  4. Enable Logical Replication on your Postgres instance (see the sketch after these steps).

  5. Select "Add".
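
For step 4, the source database must be configured to emit logical write-ahead logs. On a self-managed PostgreSQL instance, a minimal sketch looks like this; managed services such as Amazon RDS expose the equivalent switch through their parameter settings instead:

-- Run as a superuser, then restart PostgreSQL for the change to apply.
ALTER SYSTEM SET wal_level = logical;

-- After the restart, verify the setting:
SHOW wal_level;  -- should return 'logical'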

Now, your new Postgres resource is available in your Resource Catalog.

List Meroxa Resources

Adding a Redshift Resource

Next, we can add a Redshift resource to the Resource Catalog.

  1. Before you can add the resource, the Redshift instance needs to be accessible by Meroxa. You may allow access from Meroxa IPs only.

  2. Navigate to the Add Resource page.

Add Redshift Resource to Meroxa Catalog

  3. Add a valid Redshift Access URL:
redshift://$REDSHIFT_USER:$REDSHIFT_PASS@$REDSHIFT_URL:$REDSHIFT_PORT/$REDSHIFT_DB

In the Access URL, replace the following variables with valid credentials from your environment:

  • $REDSHIFT_USER - Redshift Username
  • $REDSHIFT_PASS - Redshift Password
  • $REDSHIFT_URL - Redshift URL
  • $REDSHIFT_DB - Redshift Database Name
  • $REDSHIFT_PORT - Redshift Port (e.g., 5439)
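
As before, a filled-in Access URL might look like this (made-up values):

redshift://awsuser:s3cr3t@examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev
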
  4. Select "Add".

Now, Redshift is available in your Resource Catalog.

Connecting Resources

Now that we have a source (Postgres) and a destination (Redshift) in our catalog, we can connect them.

Creating Postgres Source Connector

  1. Navigate to the Create Pipeline page.

Create pipeline

  2. Add a new "Source" to the pipeline:

Add Postgres Source to Pipeline

  3. Configure your new Postgres source connector:

Configure Postgres Connector

Here are the values to configure:

  • Connector Name: Give your connector a name.
  • Resource: Select your previously added resource.
  • Input: The "Input" for this connector is the name of the Postgres table you'd like to sync. For example, public.Users.

Once you hit "Create," the new source will be added to your pipeline:

Add Postgres to Data Pipeline

If configured successfully, your connector will show as "Running". If not, you can view more information in the connector logs:

Postgres Connector Logs

Creating Redshift Destination Connector

  1. Add a new "Destination" to the pipeline:
  2. Configure your new Redshift destination connector:

Configure Redshift Connector

Here are the values to configure:

  • Connector Name: Give your connector a name.
  • Resource: Select your previously added resource.
  • Configuration: The destination connector will create a new table in Redshift. We can control the table name by setting the table.name.format config to users.

Connectors support different configuration options you can use to control connector behavior.
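
For example, the relevant configuration entry here would pair the key named in the list above with the desired table name; a minimal sketch, assuming key-value style configuration:

table.name.format=users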

Once you hit "Create", the new destination will be added to your pipeline:

Full Pipeline

Once your connectors are "Running", Meroxa will load all of the current table data into Redshift, then keep it up to date.

To view the data, we can go to Redshift in the AWS Console and query our new users table:

SELECT * FROM users;

Postgres to Redshift
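
To confirm the initial load completed, one option is to compare row counts by running a count against each system:

-- In PostgreSQL:
SELECT COUNT(*) FROM public."Users";

-- In Redshift:
SELECT COUNT(*) FROM users;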

What's Next?

You now have a running data pipeline! As you perform INSERT, UPDATE, and DELETE operations, Redshift will be kept in sync in real time.
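
To see this in action, you can run a quick smoke test against the source table in Postgres (the row values below are made up):

-- Each statement should be reflected in the Redshift users table
-- shortly after it commits.
INSERT INTO public."Users" (name, email)
VALUES ('Ada Lovelace', 'ada@example.com');

UPDATE public."Users" SET name = 'Ada L.' WHERE email = 'ada@example.com';

DELETE FROM public."Users" WHERE email = 'ada@example.com';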

A great next step is to browse other sources and destinations to extend your existing pipeline or create new ones.
