Amazon S3
Amazon S3 is a flexible object storage service offered by Amazon Web Services. It can be used as an upstream or downstream resource in your Turbine streaming apps by using the write
function to write records to a selected S3 bucket.
Setup
Resource Configuration
Use the meroxa resource create
command to configure your Amazon S3 resource.
The following example shows how to use this command to create an Amazon S3 resource named datalake
with the minimum required configuration.
$ meroxa resource create datalake \
--type s3 \
--url "s3://$AWS_ACCESS_KEY:$AWS_ACCESS_SECRET@$AWS_REGION/$AWS_S3_BUCKET"
In the command above, replace the following variables with valid credentials from your Amazon S3 environment:
$AWS_ACCESS_KEY - AWS Access Key
$AWS_ACCESS_SECRET - AWS Access Secret
$AWS_REGION - AWS Region (e.g., us-east-2)
$AWS_S3_BUCKET - AWS S3 Bucket Name
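As a quick illustration, these four values are simply interpolated into the s3:// URL scheme. The sketch below uses placeholder credentials (not real keys) and a hypothetical helper name:

```typescript
// Sketch: assemble the S3 connection URL from its four parts.
// The credential values used below are placeholders, not real keys.
function s3ConnectionUrl(
  accessKey: string,
  accessSecret: string,
  region: string,
  bucket: string
): string {
  return `s3://${accessKey}:${accessSecret}@${region}/${bucket}`;
}

console.log(s3ConnectionUrl("AKIAEXAMPLE", "secretEXAMPLE", "us-east-2", "datalake-bucket"));
// s3://AKIAEXAMPLE:secretEXAMPLE@us-east-2/datalake-bucket
```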
Configuration Options
To learn more about what you can do with this resource, check out its connector configuration options.
Using it with Turbine is as simple as the following (using TypeScript as an example):
// `anonymized` is a collection of records produced earlier in the data app;
// "datalake" is the resource name created above
let destination = await turbine.resources("datalake");
await destination.write(anonymized, `my_directory_in_s3`, {
"file.name.template": "{{topic}}-{{partition}}-{{start_offset}}-{{timestamp:unit=yyyy}}{{timestamp:unit=MM}}{{timestamp:unit=dd}}{{timestamp:unit=HH}}.gz"
});
In the code snippet above, we only set file.name.template;
however, you can specify these other options as well.
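For intuition, here is a rough sketch of how a template like the one above expands into an object key. This is purely illustrative (the helper name and logic are assumptions, not the connector's actual implementation):

```typescript
// Illustrative only: expand a file.name.template-style string for one batch.
// The placeholders ({{topic}}, {{partition}}, {{start_offset}},
// {{timestamp:unit=...}}) mirror the template syntax shown above.
function expandTemplate(
  template: string,
  topic: string,
  partition: number,
  startOffset: string,
  when: Date
): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  return template
    .replace("{{topic}}", topic)
    .replace("{{partition}}", String(partition))
    .replace("{{start_offset}}", startOffset)
    .replace("{{timestamp:unit=yyyy}}", String(when.getUTCFullYear()))
    .replace("{{timestamp:unit=MM}}", pad(when.getUTCMonth() + 1))
    .replace("{{timestamp:unit=dd}}", pad(when.getUTCDate()))
    .replace("{{timestamp:unit=HH}}", pad(when.getUTCHours()));
}

const template =
  "{{topic}}-{{partition}}-{{start_offset}}-{{timestamp:unit=yyyy}}{{timestamp:unit=MM}}{{timestamp:unit=dd}}{{timestamp:unit=HH}}.gz";
console.log(
  expandTemplate(template, "orders", 0, "0000000000", new Date(Date.UTC(2023, 0, 15, 9)))
);
// orders-0-0000000000-2023011509.gz
```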
Permissions
The following AWS access policy must be attached to the IAM user of the AWS_ACCESS_KEY
provided in the Connection URL:
{
"Statement": [
{
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts",
"s3:ListBucketMultipartUploads",
"s3:ListBucket"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::<bucket-name>/*",
"arn:aws:s3:::<bucket-name>"
]
}
],
"Version": "2012-10-17"
}
Data Record
Data records are written to a folder within the root of the S3 bucket as gzipped JSON, with one record per file, using the following naming format:
<stream-name>-<partition-number>-<starting-offset>
In the following example, the record is from the resource-5-499379.public.orders stream with starting offset 0000000000 and partition 0.
$ aws s3 ls s3://data-lake-bucket/resource-7-133274/resource-5-499379.public.orders-0-0000000000.gz
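As a sketch, an object key in this format can be split back into its parts like so (an illustrative helper under the naming format above, not part of any Meroxa API):

```typescript
// Illustrative helper: split an object key of the form
// <stream-name>-<partition-number>-<starting-offset>.gz into its parts.
function parseRecordKey(key: string): {
  stream: string;
  partition: number;
  startOffset: string;
} {
  const match = key.match(/^(.+)-(\d+)-(\d+)\.gz$/);
  if (!match) {
    throw new Error(`unexpected key format: ${key}`);
  }
  return { stream: match[1], partition: Number(match[2]), startOffset: match[3] };
}

console.log(parseRecordKey("resource-5-499379.public.orders-0-0000000000.gz"));
// { stream: 'resource-5-499379.public.orders', partition: 0, startOffset: '0000000000' }
```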