Amazon S3

By default, the Conduit Platform supports Amazon S3 as both a source and a destination.

The Amazon S3 source can connect to and emit objects from a bucket.

Required Configurations

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| aws.accessKeyId | The AWS access key id. | Yes | |
| aws.secretAccessKey | The AWS secret access key. | Yes | |
| aws.bucket | The AWS S3 bucket name. | Yes | |
| aws.region | The AWS S3 bucket region. | Yes | |
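
As an illustration of how the required keys listed above might be checked before a pipeline starts, here is a minimal sketch in Go. The `missingKeys` helper and the sample values are hypothetical and are not part of the connector; only the four key names come from this page.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// requiredKeys lists the configuration keys this page marks as required.
var requiredKeys = []string{
	"aws.accessKeyId",
	"aws.secretAccessKey",
	"aws.bucket",
	"aws.region",
}

// missingKeys is a hypothetical helper: it returns the required keys that
// are absent or blank in cfg, sorted alphabetically.
func missingKeys(cfg map[string]string) []string {
	var missing []string
	for _, k := range requiredKeys {
		if strings.TrimSpace(cfg[k]) == "" {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Placeholder values for illustration only.
	cfg := map[string]string{
		"aws.accessKeyId":     "EXAMPLE_KEY_ID",
		"aws.secretAccessKey": "EXAMPLE_SECRET",
		"aws.bucket":          "my-bucket",
		// aws.region is intentionally omitted.
	}
	if missing := missingKeys(cfg); len(missing) > 0 {
		fmt.Println("missing required configuration:", missing)
	}
}
```

Checking all keys at once, rather than failing on the first missing one, lets a user fix the whole configuration in a single pass.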

Looking for something else? See advanced configurations.

Initial Snapshot

Snapshot mode is enabled by default. When the source connector starts, it calls Configure to parse the configuration, then calls Open to start the connection from the given position. In snapshot mode, the connector iterates over all objects in the bucket and returns them.

The position used for this mode will resemble thisIsAKey_s12345, which consists of the object key, an underscore, an "s" for snapshot, and the maxLastModifiedDate so far.

Note: If the provided bucket does not exist or if the source connector fails to access it, an error will occur.

The source connector uses this position to determine its mode and the last read object. The maxLastModifiedDate is crucial when switching to CDC mode, as the CDC iterator captures changes occurring after that point.
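
The position layout described above can be sketched in Go as follows. This is an illustration under stated assumptions, not the connector's actual code: the `Position` type and `ParsePosition` function are hypothetical names, the timestamp is assumed to be an integer (the exact encoding is not stated on this page), and the string is split on the last underscore so that object keys containing underscores still parse.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Position is a hypothetical representation of the connector position.
type Position struct {
	Key       string // S3 object key
	Mode      byte   // 's' for snapshot, 'c' for CDC
	Timestamp int64  // maxLastModifiedDate so far (integer encoding is an assumption)
}

// String renders the position as <key>_<mode><timestamp>.
func (p Position) String() string {
	return fmt.Sprintf("%s_%c%d", p.Key, p.Mode, p.Timestamp)
}

// ParsePosition splits on the LAST underscore, then reads the mode letter
// and the timestamp from the suffix.
func ParsePosition(raw string) (Position, error) {
	i := strings.LastIndex(raw, "_")
	if i < 0 {
		return Position{}, fmt.Errorf("invalid position %q: no underscore", raw)
	}
	suffix := raw[i+1:]
	if len(suffix) < 2 {
		return Position{}, fmt.Errorf("invalid position %q: suffix too short", raw)
	}
	mode := suffix[0]
	if mode != 's' && mode != 'c' {
		return Position{}, fmt.Errorf("invalid position %q: unknown mode %q", raw, mode)
	}
	ts, err := strconv.ParseInt(suffix[1:], 10, 64)
	if err != nil {
		return Position{}, fmt.Errorf("invalid position %q: %w", raw, err)
	}
	return Position{Key: raw[:i], Mode: mode, Timestamp: ts}, nil
}

func main() {
	p, err := ParsePosition("thisIsAKey_s12345")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("key=%s mode=%c timestamp=%d\n", p.Key, p.Mode, p.Timestamp)
}
```

The mode letter is what lets a restarted connector tell whether it should resume the snapshot or continue in CDC mode from the recorded timestamp.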

Known Limitation: If a pipeline restarts during the snapshot, the source connector will start scanning objects from the beginning of the bucket, potentially leading to duplications.

Updates

The source connector uses Change Data Capture (CDC) to identify changes in Amazon S3 by scanning the bucket once every pollingPeriod. Any UPDATE, DELETE, and CREATE changes that have occurred after a specific timestamp are placed into a buffer, which is checked on each Read request.

Capturing UPDATE and DELETE changes in Amazon S3 requires bucket versioning to be enabled. CREATE changes are captured without bucket versioning.

The position used for this mode will resemble thisIsAKey_c54321, which consists of the object key, an underscore, a "c" for CDC, and the maxLastModifiedDate so far.

The source connector uses this position to return only changes whose lastModifiedDate is later than that of the last record returned, which avoids duplicate records.

Key Handling

The source connector automatically looks up the S3 object key, which uniquely identifies each object in the bucket, and uses it as the record key.

Advanced Configurations

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| pollingPeriod | The polling period for CDC mode. Formatted as time.Duration string. | No | 1s |
| prefix | The key prefix for the Amazon S3 source. | No | |