Amazon S3
By default, the Conduit Platform supports Amazon S3 as both a source and a destination.
The Amazon S3 source can connect to and emit objects from a bucket.
Required Configurations
Name | Description | Required | Default |
---|---|---|---|
aws.accessKeyId | The AWS access key id. | Yes | |
aws.secretAccessKey | The AWS secret access key. | Yes | |
aws.bucket | The AWS S3 bucket name. | Yes | |
aws.region | The AWS S3 bucket region. | Yes | |
Looking for something else? See advanced configurations.
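As an illustration of the required settings, the following Go sketch checks a configuration map for the four required keys before a connector would be started. The helper name missingKeys and the placeholder values are hypothetical and are not part of Conduit; only the key names come from the table above.

```go
package main

import (
	"fmt"
	"sort"
)

// requiredKeys lists the settings marked as required in the table above.
var requiredKeys = []string{
	"aws.accessKeyId",
	"aws.secretAccessKey",
	"aws.bucket",
	"aws.region",
}

// missingKeys is a hypothetical helper: it returns the required keys that
// are absent or empty in cfg, sorted so the result is stable.
func missingKeys(cfg map[string]string) []string {
	var missing []string
	for _, k := range requiredKeys {
		if cfg[k] == "" {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	cfg := map[string]string{
		"aws.accessKeyId":     "EXAMPLE_KEY_ID",     // placeholder value
		"aws.secretAccessKey": "EXAMPLE_SECRET_KEY", // placeholder value
		"aws.bucket":          "my-example-bucket",  // placeholder value
		// "aws.region" is deliberately left out.
	}
	fmt.Println(missingKeys(cfg)) // prints [aws.region]
}
```

Treating an empty value the same as a missing one is a design choice of this sketch; the connector's own validation may differ.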
Initial Snapshot
Snapshot mode is enabled by default. When the source connector starts, it calls Configure to parse the configuration. Next, Open is called to start the connection from the given position. The connector then loops through all objects in the bucket and returns them.
The position used for this mode will resemble thisIsAKey_s12345, which is composed of the object key, an underscore, an "s" for snapshot, and the maxLastModifiedDate seen so far.
Note: If the provided bucket does not exist or if the source connector fails to access it, an error will occur.
The source connector uses this position to determine its mode and the last read object. The maxLastModifiedDate is crucial when switching to CDC mode, as the CDC iterator captures changes occurring after that point.
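The position layout described above can be decoded roughly as follows. This is a minimal sketch, not the connector's actual code: the Position type and parsePosition function are hypothetical, and it assumes the maxLastModifiedDate is encoded as a plain integer (its unit is not stated here). It also accepts the "c" marker used by CDC mode, and splits on the last underscore because object keys may themselves contain underscores.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Position is a hypothetical decoded form of a position string.
type Position struct {
	Key       string // S3 object key
	Mode      byte   // 's' for snapshot, 'c' for CDC
	Timestamp int64  // maxLastModifiedDate; integer encoding is an assumption
}

// parsePosition decodes strings shaped like "<key>_<mode><timestamp>".
func parsePosition(s string) (Position, error) {
	// Split on the LAST underscore, since keys may contain underscores.
	i := strings.LastIndex(s, "_")
	if i < 0 || len(s)-i-1 < 2 {
		return Position{}, fmt.Errorf("invalid position %q", s)
	}
	mode := s[i+1]
	if mode != 's' && mode != 'c' {
		return Position{}, fmt.Errorf("unknown mode %q in position %q", mode, s)
	}
	ts, err := strconv.ParseInt(s[i+2:], 10, 64)
	if err != nil {
		return Position{}, fmt.Errorf("invalid timestamp in position %q: %w", s, err)
	}
	return Position{Key: s[:i], Mode: mode, Timestamp: ts}, nil
}

func main() {
	for _, raw := range []string{"thisIsAKey_s12345", "thisIsAKey_c54321"} {
		p, err := parsePosition(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("key=%s mode=%c ts=%d\n", p.Key, p.Mode, p.Timestamp)
	}
}
```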
Known Limitation: If a pipeline restarts during the snapshot, the source connector will start scanning objects from the beginning of the bucket, potentially leading to duplications.
Updates
The source connector uses Change Data Capture (CDC) to identify changes in Amazon S3 by scanning the bucket every pollingPeriod. Any UPDATE, DELETE, and CREATE changes that have occurred after a specific timestamp are placed into a buffer, which is checked on each Read request.
Capturing UPDATE and DELETE changes in Amazon S3 requires bucket versioning to be enabled. CREATE changes do not require bucket versioning.
The position used for this mode will resemble thisIsAKey_c54321, which is composed of the object key, an underscore, a "c" for CDC, and the maxLastModifiedDate seen so far.
The source connector uses this position to return only changes with a lastModifiedDate later than that of the last record returned, which ensures that no duplicates are emitted.
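The filtering rule above can be sketched as a small function that keeps only changes strictly newer than the timestamp stored in the position. The Change type, the changesAfter function, and the unix helper are hypothetical names for illustration; the real connector lists objects and versions through the AWS API, which is omitted here.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Change is a hypothetical, simplified view of one detected object change.
type Change struct {
	Key          string
	LastModified time.Time
}

// unix is a small helper that builds a time from whole seconds.
func unix(sec int64) time.Time { return time.Unix(sec, 0) }

// changesAfter returns the changes whose LastModified is strictly after
// since, ordered oldest first so they can be emitted in sequence.
func changesAfter(changes []Change, since time.Time) []Change {
	var out []Change
	for _, c := range changes {
		if c.LastModified.After(since) {
			out = append(out, c)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		return out[i].LastModified.Before(out[j].LastModified)
	})
	return out
}

func main() {
	scanned := []Change{
		{Key: "a", LastModified: unix(900)},
		{Key: "b", LastModified: unix(1000)},
		{Key: "c", LastModified: unix(1100)},
		{Key: "d", LastModified: unix(1050)},
	}
	// Only "d" and "c" are newer than the position timestamp of 1000.
	for _, c := range changesAfter(scanned, unix(1000)) {
		fmt.Println(c.Key)
	}
}
```

The strict comparison mirrors the "later than the last record returned" rule: a change carrying exactly the stored timestamp is treated as already delivered.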
Key Handling
The source connector automatically looks up the S3 object key, which uniquely identifies each object in the bucket, and uses it as the record key.
Advanced Configurations
Name | Description | Required | Default |
---|---|---|---|
pollingPeriod | The polling period for CDC mode. Formatted as a Go time.Duration string (for example, 1s or 500ms). | No | 1s |
prefix | The key prefix for the Amazon S3 source. | No | |
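To illustrate how these two settings could be interpreted, the sketch below parses pollingPeriod with Go's time.ParseDuration, falling back to the documented default of 1s, and applies prefix as a simple key filter. The function names are hypothetical, rejecting non-positive durations is an assumption of this sketch, and in practice the prefix would more likely be passed to the S3 listing request than applied client-side.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// defaultPollingPeriod matches the default listed in the table above.
const defaultPollingPeriod = time.Second

// parsePollingPeriod is a hypothetical helper that turns the raw setting
// into a duration, using the default when the setting is empty.
func parsePollingPeriod(raw string) (time.Duration, error) {
	if raw == "" {
		return defaultPollingPeriod, nil
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid pollingPeriod %q: %w", raw, err)
	}
	if d <= 0 {
		// Assumption: a zero or negative period is not meaningful.
		return 0, fmt.Errorf("pollingPeriod must be positive, got %s", d)
	}
	return d, nil
}

// matchesPrefix reports whether an object key falls under the prefix.
// An empty prefix matches every key.
func matchesPrefix(key, prefix string) bool {
	return strings.HasPrefix(key, prefix)
}

func main() {
	d, err := parsePollingPeriod("500ms")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(d) // prints 500ms

	fmt.Println(matchesPrefix("logs/2024/app.log", "logs/")) // prints true
	fmt.Println(matchesPrefix("images/cat.png", "logs/"))    // prints false
}
```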