
How to Create a PostgreSQL Delta Table in Google BigQuery

4 min read

In this guide, we will use Meroxa to build a real-time data pipeline that creates a delta table in BigQuery from the changes in a PostgreSQL database. The pipeline will create, store, and update a table within BigQuery that contains a row for every insert, update, and delete operation performed in PostgreSQL.

Postgres to BigQuery

How it works#

For this example, we have a PostgreSQL table called public.Users. This table includes columns such as name and email.

Postgres Example Table
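If you want a similar table to experiment with, here is a minimal sketch. The columns beyond name and email, and the sample row, are assumptions, and the connection variables are the same ones used later in this guide:

psql "postgres://$PG_USER:$PG_PASS@$PG_URL:$PG_PORT/$PG_DB" <<'SQL'
-- A small Users table with a handful of typical columns.
CREATE TABLE public."Users" (
  id         serial PRIMARY KEY,
  name       text NOT NULL,
  email      text NOT NULL,
  created_at timestamptz DEFAULT now()
);

-- One starting row so the initial load has something to copy.
INSERT INTO public."Users" (name, email)
VALUES ('Ada Lovelace', 'ada@example.com');
SQL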

The goal of the pipeline is to replicate and produce a table of the changes in BigQuery:

BigQuery Example Table

The schema of the created table will be defined as follows:

BigQuery Example Table Schema

As you perform INSERT, UPDATE, and DELETE operations to PostgreSQL, the table within BigQuery will be kept in sync in real-time.

Let's build.

Prerequisites#

Before you begin, you need:

    • A Meroxa account and the Meroxa CLI installed.
    • A PostgreSQL database that Meroxa can reach, with logical replication enabled.
    • A Google Cloud project with access to BigQuery.
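If you already have the Meroxa CLI installed, a quick sanity check before continuing is to log in and confirm the CLI can reach your account:

# Authenticate the CLI with your Meroxa account.
meroxa login

# List existing resources; an empty list is fine, an auth error is not.
meroxa resource ls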

Adding a Postgres Resource#

To begin, we can add a Postgres resource to the Meroxa Resource Catalog as the source of our data.

  1. Before you can add the resource, the PostgreSQL instance needs to be accessible by Meroxa.
  2. Use the meroxa resource create command to create a new resource:
meroxa resource create my-postgres \
  --type postgres \
  -u postgres://$PG_USER:$PG_PASS@$PG_URL:$PG_PORT/$PG_DB \
  --metadata '{"logical_replication":"true"}'

In the command above, replace the $PG_USER, $PG_PASS, $PG_URL, $PG_PORT, and $PG_DB variables with the values from a valid Postgres access URL.
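For example, you might export the variables first; the values below are placeholders, not real credentials:

# Hypothetical connection details for the PostgreSQL instance.
export PG_USER=postgres
export PG_PASS=my-secret-password
export PG_URL=db.example.com
export PG_PORT=5432
export PG_DB=app_production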

Now, your new Postgres resource is available in your Resource Catalog. You can view your resources with meroxa resource ls or in the Dashboard.
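The logical_replication metadata flag above assumes logical replication is enabled on the instance. If the connector later fails to stream changes, a quick check (a sketch, assuming you can connect with psql) is:

# wal_level should report "logical" for change data capture to work.
psql "postgres://$PG_USER:$PG_PASS@$PG_URL:$PG_PORT/$PG_DB" -c "SHOW wal_level;"

# On a self-managed instance you can enable it (requires a restart):
# psql ... -c "ALTER SYSTEM SET wal_level = logical;"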

Adding a BigQuery Resource#

Next, we can add a BigQuery resource to the Resource Catalog.

  1. Before you can add the resource, the BigQuery instance needs to be accessible by Meroxa. You'll need:

    • GCP Project ID - A Google Cloud Project ID.
    • BigQuery Dataset - A BigQuery dataset (see BigQuery Setup).
    • Service Account JSON - Valid Service Account credentials with access to BigQuery (see BigQuery Setup).
  2. Use the meroxa resource create command to create a new resource:

meroxa resource create my-bigquery \
  --type bigquery \
  --url bigquery://$GCP_PROJECT_ID/$GCP_DATASET \
  --client-key "{ $GCP_SERVICE_ACCOUNT_JSON }"

In the command above, replace the $GCP_PROJECT_ID, $GCP_DATASET, and $GCP_SERVICE_ACCOUNT_JSON variables with valid values from your BigQuery environment.
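For example, you might set the variables like this. The project, dataset, and service-account names are placeholders, and the gcloud/bq commands are only one way to create the pieces if you don't have them yet:

# Hypothetical project and dataset names.
export GCP_PROJECT_ID=my-gcp-project
export GCP_DATASET=meroxa_changes

# Create the dataset and a service account key if you still need them
# (assumes the gcloud and bq CLIs are installed and authenticated).
bq mk --dataset "$GCP_PROJECT_ID:$GCP_DATASET"
gcloud iam service-accounts create meroxa-bigquery --project "$GCP_PROJECT_ID"
gcloud projects add-iam-policy-binding "$GCP_PROJECT_ID" \
  --member "serviceAccount:meroxa-bigquery@$GCP_PROJECT_ID.iam.gserviceaccount.com" \
  --role roles/bigquery.admin
gcloud iam service-accounts keys create key.json \
  --iam-account "meroxa-bigquery@$GCP_PROJECT_ID.iam.gserviceaccount.com"

# Load the key file into the variable used by the command above.
export GCP_SERVICE_ACCOUNT_JSON="$(cat key.json)"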

Now, BigQuery is available in your Resource Catalog.

Connecting Resources#

Now that we have a source (Postgres) and a destination (BigQuery) in our catalog, we can connect them.

  1. Use the Dashboard or CLI to create a new pipeline:

meroxa pipelines create postgres-to-bigquery

  2. Add a source to the pipeline with meroxa connector create, using your PostgreSQL table (for example, public.Users) as the input:

meroxa connector create from-postgres-connector \
  --from my-postgres \
  --input $POSTGRES_TABLE_NAME \
  --pipeline postgres-to-bigquery
  3. Capture the output stream name:

Streams are created when you create connectors. Source connectors produce output streams, and destination connectors consume input streams.

To see the available streams for your connectors, you can use meroxa connectors list:

$ meroxa connectors list
 ID    NAME                  TYPE                   STREAMS                           STATE     PIPELINE
===== ===================== ====================== ================================= ========= ==========
 925   postgres-571-432074   debezium-pg-source     output:                           running   default
                                                    resource-571-811818.public.User
 926   webhook-239-324695    http-destination       input:                            running   default
                                                    resource-571-811818.public.User
 933   to-bg                 bigquery-destination   input:                            running   default
                                                    resource-571-811818.public.User
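In this example, the stream you need is the source connector's output stream. You might capture it in the variable used by the next step:

# The output stream of the Postgres source connector, taken from the listing above.
export STREAM_NAME=resource-571-811818.public.User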
  4. Add a destination to the pipeline:

Now, we can use the meroxa connector create command to add a new destination connector:

meroxa connector create to-bigquery-connector \
  --from my-bigquery \
  --input $STREAM_NAME \
  --pipeline postgres-to-bigquery

Once your connectors are "Running", Meroxa will load all of the current table data into BigQuery, then keep it up to date.

To view the data, we can go to BigQuery in the GCP Console and query our new delta table:

Postgres to BigQuery
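If you prefer the command line to the GCP Console, here is a sketch of inspecting and querying the new table with the bq CLI. The table name public_Users is an assumption; use whatever name shows up in your dataset:

# List the tables Meroxa created in the dataset.
bq ls "$GCP_PROJECT_ID:$GCP_DATASET"

# Inspect the delta table's schema.
bq show --schema --format=prettyjson "$GCP_PROJECT_ID:$GCP_DATASET.public_Users"

# Pull a few of the captured change rows.
bq query --use_legacy_sql=false \
  "SELECT * FROM \`$GCP_PROJECT_ID.$GCP_DATASET.public_Users\` LIMIT 10"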

What's Next?#

You now have a running data pipeline! As you perform INSERT, UPDATE, and DELETE operations, BigQuery will be kept in sync in real-time.
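To watch the sync happen, you can run a few changes against the example table and then re-query BigQuery; the row values here are, again, just placeholders:

psql "postgres://$PG_USER:$PG_PASS@$PG_URL:$PG_PORT/$PG_DB" <<'SQL'
-- Each of these statements should show up as a new row in the delta table.
INSERT INTO public."Users" (name, email) VALUES ('Grace Hopper', 'grace@example.com');
UPDATE public."Users" SET email = 'grace.hopper@example.com' WHERE name = 'Grace Hopper';
DELETE FROM public."Users" WHERE name = 'Grace Hopper';
SQL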

A great next step is to browse other sources and destinations to extend your existing pipeline or create new ones.

As always, feel free to reach out if you have any questions or feedback.