
How to Create a PostgreSQL Delta Table in Google BigQuery

4 min read
warning

This is a legacy platform changelog. The information reflected here may not represent current functionality and some links may be broken.

In this guide, we will use Meroxa to build a real-time data pipeline that maintains a delta table in BigQuery of changes from a PostgreSQL database. The pipeline creates and continuously updates a table in BigQuery that contains a row for every insert, update, and delete operation performed in PostgreSQL.

PostgreSQL to BigQuery

How it works

For this example, we have a PostgreSQL table called public.Users. This table includes columns such as name and email.

PostgreSQL Example Table
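For orientation, the source table can be sketched roughly as follows; the id column and the exact types are illustrative assumptions rather than the precise schema in the screenshot, and the connection variables are the same ones used for the resource command later in this guide:

# Hypothetical shape of the source table; adjust column names and types to your data.
psql "postgres://$PG_USER:$PG_PASS@$PG_URL:$PG_PORT/$PG_DB" -c '
  CREATE TABLE public."Users" (
    id    SERIAL PRIMARY KEY,
    name  TEXT NOT NULL,
    email TEXT NOT NULL
  );'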

The goal of the pipeline is to capture changes to this table and produce a table of those changes in BigQuery:

BigQuery Example Table

The schema of the created table is defined as follows:

BigQuery Example Table Schema

As you perform INSERT, UPDATE, and DELETE operations in PostgreSQL, the table within BigQuery is kept in sync in real time.

Let's build.

Prerequisites

Before you begin, you need:

    • A Meroxa account.
    • The Meroxa CLI installed and logged in.

Adding a PostgreSQL Resource

To begin, we can add a PostgreSQL resource to the Meroxa Resource Catalog as the source of our data.

  1. Before you can add it, the PostgreSQL instance needs to be accessible by Meroxa.
  2. Use the meroxa resource create command to create a new resource:
$ meroxa resource create my-postgres \
    --type postgres \
    --url postgres://$PG_USER:$PG_PASS@$PG_URL:$PG_PORT/$PG_DB \
    --metadata '{"logical_replication":"true"}'

In the command above, replace $PG_USER, $PG_PASS, $PG_URL, $PG_PORT, and $PG_DB with the values from a valid PostgreSQL access URL.
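For example, with placeholder values (substitute your own), the variables can be set like this; the final command is an optional check that the server allows logical replication, which the logical_replication metadata flag relies on:

# Placeholder credentials for illustration only.
export PG_USER=meroxa_user
export PG_PASS='s3cr3t'
export PG_URL=db.example.com
export PG_PORT=5432
export PG_DB=app_production

# Logical replication requires wal_level = logical on the PostgreSQL server.
psql "postgres://$PG_USER:$PG_PASS@$PG_URL:$PG_PORT/$PG_DB" -c "SHOW wal_level;"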

Now, your new PostgreSQL resource is available in your Resource Catalog. You can view your resources with meroxa resource ls or in the Dashboard.

Adding a BigQuery Resource

Next, we can create a BigQuery resource.

  1. Before you can add it, BigQuery needs to be accessible by Meroxa. You'll need:

    • GCP Project ID - A Google Cloud Project ID.
    • BigQuery Dataset - A BigQuery dataset (see BigQuery Setup).
    • Service Account JSON - Valid Service Account credentials with access to BigQuery (see BigQuery Setup).
  2. Use the meroxa resource create command to create a new resource:

$ meroxa resource create my-bigquery \
    --type bigquery \
    --url bigquery://$GCP_PROJECT_ID/$GCP_DATASET \
    --client-key "{ $GCP_SERVICE_ACCOUNT_JSON }"

In the command above, replace $GCP_PROJECT_ID, $GCP_DATASET, and $GCP_SERVICE_ACCOUNT_JSON with valid values from your GCP environment.
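For example, with placeholder values (substitute your own project and dataset, and assuming the downloaded key is saved as service-account.json):

# Placeholder values for illustration only.
export GCP_PROJECT_ID=my-gcp-project
export GCP_DATASET=meroxa_deltas
export GCP_SERVICE_ACCOUNT_JSON="$(cat service-account.json)"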

Now, BigQuery is available in your Resource Catalog.

Connecting Resources

Now that we have a source (PostgreSQL) and a destination (BigQuery) in our catalog, we can connect them.

  1. Use the Dashboard or CLI to create a new pipeline:

meroxa pipelines create postgres-to-bigquery

  2. Add a source to the pipeline with meroxa connector create, passing the source table (for example, public.Users) as the input:

meroxa connector create from-postgres-connector \
    --from my-postgres \
    --input $POSTGRES_TABLE_NAME \
    --pipeline postgres-to-bigquery
  3. Capture the output stream name:

Streams are created when you create connectors. Source connectors produce output streams, and destination connectors read from input streams.

To see the available streams for your connectors, you can use meroxa connectors list:

$ meroxa connectors list
  ID          NAME                  TYPE                    STREAMS                            STATE     PIPELINE
 ===== ===================== ====================== ================================== ========= ==========
  925   postgres-571-432074   debezium-pg-source     output:                            running   default
                                                     resource-571-811818.public.User

  926   webhook-239-324695    http-destination       input:                             running   default
                                                     resource-571-811818.public.User

  933   to-bg                 bigquery-destination   input:                             running   default
                                                     resource-571-811818.public.User
  4. Add a destination to the pipeline:

Now, we can use the meroxa connector create command to add a new destination connector:

meroxa connector create to-bigquery-connector \
    --from my-bigquery \
    --input $STREAM_NAME \
    --pipeline postgres-to-bigquery

Once your connectors are "Running", Meroxa will load all of the current table data into BigQuery, then keep it up to date.

To view the data, we can go to BigQuery in the GCP Console and query our new delta table:

PostgreSQL to BigQuery
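If you have the bq CLI configured, a spot check from the terminal looks roughly like this; the table name below is a guess at how the connector maps the public.Users stream, so list the dataset first (for example with bq ls $GCP_DATASET) to confirm the actual name:

# Hypothetical table name; confirm it in the dataset before running.
bq query --use_legacy_sql=false \
    "SELECT * FROM \`${GCP_PROJECT_ID}.${GCP_DATASET}.public_Users\` LIMIT 10"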

What's Next?

You now have a running data pipeline! As you perform INSERT, UPDATE, and DELETE operations, BigQuery will be kept in sync in real time.
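To see it in action, run a few changes against the source table. These statements are illustrative and use the hypothetical name and email columns from the sketch earlier in this guide:

PG_DSN="postgres://$PG_USER:$PG_PASS@$PG_URL:$PG_PORT/$PG_DB"

# Each operation should appear as a new row in the BigQuery delta table.
psql "$PG_DSN" -c "INSERT INTO public.\"Users\" (name, email) VALUES ('Ada', 'ada@example.com');"
psql "$PG_DSN" -c "UPDATE public.\"Users\" SET email = 'ada@lovelace.dev' WHERE name = 'Ada';"
psql "$PG_DSN" -c "DELETE FROM public.\"Users\" WHERE name = 'Ada';"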

A great next step is to browse other resources to extend your existing data app or create new ones.

As always: