Amazon Redshift

The Conduit Platform supports Amazon Redshift as both a source and a destination by default.

The Amazon Redshift source can connect to and emit records from a table.

Required Configurations

Name | Description | Required | Default
dsn | Data source name (DSN) to connect to Redshift. Example: redshift://username:password@redshift-cluster-endpoint:5439/database | Yes |
table | The table the source connector should read from. | Yes |
orderingColumn | The name of a column that the connector will use for ordering rows. The values must be unique and suitable for sorting, otherwise, the snapshot won't work correctly. | Yes |
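As an illustration of the required settings, the following Python sketch checks that all three are present and splits the DSN into its parts. It is not the connector's own validation: the function name `validate_source_config` and the returned dictionary shape are assumptions made for this example, and the DSN is parsed with the standard `urllib.parse` module.

```python
from urllib.parse import urlsplit

# The three settings listed as required in the table above.
REQUIRED_KEYS = ("dsn", "table", "orderingColumn")


def validate_source_config(config: dict) -> dict:
    """Check the required settings and split the DSN (hypothetical helper)."""
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"missing required settings: {', '.join(missing)}")

    parts = urlsplit(config["dsn"])
    if parts.scheme != "redshift":
        raise ValueError(f"unexpected DSN scheme: {parts.scheme!r}")

    # The password is deliberately left out of the returned summary.
    return {
        "host": parts.hostname,
        "port": parts.port,
        "user": parts.username,
        "database": parts.path.lstrip("/"),
    }
```

Calling it with the example DSN from the table returns the host, port, user, and database; a configuration that lacks any of the three settings raises `ValueError`.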

Looking for something else? See advanced configurations.

Initial Snapshot

Snapshot mode is enabled by default. When the source connector starts, it captures the state of the Redshift table and its data at that point in time. It retrieves the maximum value of orderingColumn and saves that value in the position.

The snapshot iterator then reads all rows whose orderingColumn value is less than or equal to that maximum. Rows are ordered by orderingColumn and fetched in batches that are paged by the orderingColumn value.

Note: The default snapshot mode can be disabled by setting snapshot to false in the configuration.
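The snapshot behavior described above can be sketched in Python as follows. This is a simplified model, not the connector's code: an in-memory list of dictionaries stands in for the Redshift table, and the name `snapshot_batches` is hypothetical. A real implementation would issue SQL queries instead of filtering a list.

```python
def snapshot_batches(rows, ordering_column, batch_size):
    """Yield batches of rows up to the maximum ordering value seen at start."""
    if not rows:
        return

    # Captured once when the snapshot starts; this models the saved position.
    max_value = max(row[ordering_column] for row in rows)
    last_seen = None

    while True:
        # Rows within the snapshot bound that come after the previous batch.
        candidates = sorted(
            (
                row
                for row in rows
                if row[ordering_column] <= max_value
                and (last_seen is None or row[ordering_column] > last_seen)
            ),
            key=lambda row: row[ordering_column],
        )
        batch = candidates[:batch_size]
        if not batch:
            return
        yield batch
        last_seen = batch[-1][ordering_column]
```

Because the maximum is fixed at the start, a row appended while the snapshot is running with a larger ordering value is not included in any snapshot batch.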

Updates

The source connector uses Change Data Capture (CDC) to detect changes in a Redshift table. The CDC iterator relies on keyset pagination: it orders rows by orderingColumn and limits each query to batchSize rows. Only rows added after the source connector starts are moved in batches. Each INSERT, UPDATE, or DELETE operation executed on the table is captured by the CDC iterator, which emits a record for each change.
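To make keyset pagination concrete, here is a minimal Python sketch that builds one page query. The exact SQL the connector issues is not shown in this page, so the query shape, the function name `build_keyset_query`, and the `%s` placeholder style (as used by common PostgreSQL-compatible Python drivers) are all assumptions for illustration.

```python
def build_keyset_query(table, ordering_column, batch_size, last_value=None):
    """Build a keyset-paginated SELECT and its parameters (illustrative only)."""

    def quote(identifier):
        # Double-quote identifiers and escape embedded quotes.
        return '"' + identifier.replace('"', '""') + '"'

    sql = f"SELECT * FROM {quote(table)}"
    params = []
    if last_value is not None:
        # Resume strictly after the last ordering value already processed.
        sql += f" WHERE {quote(ordering_column)} > %s"
        params.append(last_value)
    sql += f" ORDER BY {quote(ordering_column)} ASC LIMIT {int(batch_size)}"
    return sql, params
```

Each page passes the last orderingColumn value of the previous batch as `last_value`, so the query never rescans rows it has already returned. This is the reason the ordering values need to be unique and sortable.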

Key Handling

The connector constructs sdk.Record.Key as sdk.StructuredData, using the columns listed in the keyColumns configuration field. If keyColumns is unspecified, the connector defaults to the primary keys of the specified table; if no primary keys exist, it falls back to the value of the orderingColumn field. The values for sdk.Record.Key are taken from the entries of sdk.Payload.After whose keys match these columns.
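The fallback order for key columns can be sketched in Python as below. The connector itself works with sdk.StructuredData; here plain dictionaries stand in for it, and the names `resolve_key_columns` and `build_key` are hypothetical.

```python
def resolve_key_columns(key_columns, primary_keys, ordering_column):
    """Pick key columns: keyColumns first, then primary keys, then orderingColumn."""
    if key_columns:
        # keyColumns is a comma-separated string in the configuration.
        return [name.strip() for name in key_columns.split(",") if name.strip()]
    if primary_keys:
        return list(primary_keys)
    return [ordering_column]


def build_key(after, columns):
    """Copy the key columns' values out of the record's 'after' payload."""
    return {name: after[name] for name in columns if name in after}
```

For example, with no keyColumns and no primary keys, a table ordered by `id` yields a key containing only the `id` value of each row.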

Table Name

For each record, the source connector appends a redshift.table property to the metadata, which holds the table name.
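A consumer can use this metadata property to route records by table. The Python sketch below assumes each record is represented as a dictionary with a `metadata` entry; that shape and the function name `group_by_table` are illustrative assumptions, and only the `redshift.table` key comes from the connector.

```python
from collections import defaultdict

# Metadata key set by the source connector on every record.
TABLE_METADATA_KEY = "redshift.table"


def group_by_table(records):
    """Group records by the table name stored in their metadata."""
    grouped = defaultdict(list)
    for record in records:
        table = record.get("metadata", {}).get(TABLE_METADATA_KEY, "")
        grouped[table].append(record)
    return dict(grouped)
```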

Advanced Configurations

Name | Description | Required | Default
snapshot | Enable or disable a snapshot of the entire table before starting CDC mode. Options: true or false. | No | true
keyColumns | Comma-separated list of column names used to build the sdk.Record.Key. Learn more: Key handling. | No |
batchSize | Number of rows in a batch. Minimum is 1 and maximum is 100000. | No | 1000
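The defaults and limits in this table can be expressed as a small Python sketch. It assumes configuration values arrive as strings, and the function name `parse_advanced_config` is hypothetical; the connector's actual parsing may differ.

```python
def parse_advanced_config(config):
    """Apply defaults and range checks for the optional settings (illustrative)."""
    snapshot_raw = config.get("snapshot", "true").strip().lower()
    if snapshot_raw not in ("true", "false"):
        raise ValueError("snapshot must be true or false")

    batch_size = int(config.get("batchSize", "1000"))
    if not 1 <= batch_size <= 100000:
        raise ValueError("batchSize must be between 1 and 100000")

    # keyColumns is optional; an empty list means "use the fallback rules".
    key_columns = [
        name.strip()
        for name in config.get("keyColumns", "").split(",")
        if name.strip()
    ]
    return {
        "snapshot": snapshot_raw == "true",
        "batchSize": batch_size,
        "keyColumns": key_columns,
    }
```

With an empty configuration, the result has snapshot enabled, a batch size of 1000, and no explicit key columns.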