Choose your own file template for an S3 connector with Turbine

@_raulb_

Developers can now specify their own file templates for AWS S3 destinations within their Turbine apps. This change provides the flexibility to change the file names, directories and the compression type of the data being stored in S3.

When using an S3 connector as your destination, you might want a different file name format than the one the connector provides by default:

{{topic}}-{{partition}}-{{start_offset}}.gz

This means that for every topic, a new file is created in your S3 bucket, named for example resource-13886-387981-mytable-f95b0de8-0b08-4831-a30f-03118268f974-1-00000000000000000000.gz.

If you're looking for something more custom, you could add a timestamp to the file name:

{{topic}}-{{partition}}-{{start_offset}}-{{timestamp:unit=yyyy}}{{timestamp:unit=MM}}{{timestamp:unit=dd}}{{timestamp:unit=HH}}.gz

Or even create subdirectories by including / in the template:

{{topic}}/{{timestamp:unit=yyyy}}/{{timestamp:unit=MM}}/{{timestamp:unit=dd}}/{{timestamp:unit=HH}}.gz
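To get a feel for how these placeholders expand, here is a minimal sketch in TypeScript. It is not the connector's implementation: renderTemplate and RecordInfo are hypothetical names made up for this illustration, the 20-digit zero padding of start_offset is inferred from the example file name above, and the use of UTC for the timestamp is an assumption.

```typescript
// Illustrative only: a hypothetical renderer, NOT the S3 connector's actual code.
interface RecordInfo {
  topic: string;
  partition: number;
  startOffset: number;
  timestamp: Date;
}

// Left-pad a number with zeros to a fixed width.
const pad = (n: number, width: number): string => String(n).padStart(width, "0");

function renderTemplate(template: string, info: RecordInfo): string {
  const t = info.timestamp;
  // Assumption: timestamps are rendered in UTC.
  const timeUnits: Record<string, string> = {
    yyyy: pad(t.getUTCFullYear(), 4),
    MM: pad(t.getUTCMonth() + 1, 2),
    dd: pad(t.getUTCDate(), 2),
    HH: pad(t.getUTCHours(), 2),
  };
  return template.replace(/\{\{([^}]+)\}\}/g, (match, name: string) => {
    if (name === "topic") return info.topic;
    if (name === "partition") return String(info.partition);
    // Assumption: offsets are zero-padded to 20 digits, as in the example file name.
    if (name === "start_offset") return pad(info.startOffset, 20);
    if (name.startsWith("timestamp:unit=")) {
      return timeUnits[name.slice("timestamp:unit=".length)] ?? match;
    }
    return match; // leave unknown placeholders untouched
  });
}

const example: RecordInfo = {
  topic: "mytable",
  partition: 1,
  startOffset: 0,
  timestamp: new Date(Date.UTC(2022, 5, 15, 9)), // 15 June 2022, 09:00 UTC
};

const key = renderTemplate(
  "{{topic}}/{{timestamp:unit=yyyy}}/{{timestamp:unit=MM}}/{{timestamp:unit=dd}}/{{timestamp:unit=HH}}.gz",
  example
);
console.log(key); // mytable/2022/06/15/09.gz
```

With the default template, the same record would render as mytable-1-00000000000000000000.gz, so each / in a template simply becomes part of the object key and shows up as a folder in the S3 console.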

When modifying file.name.template or file.compression.type, you'll need to include the file extension in the template. Otherwise, files will be created with the .gz extension, since the default compression algorithm is gzip. Here's how you'd do it using Turbine in TypeScript (we also support these other languages):

const destination = await turbine.resources("s3");

await destination.write(anonymized, `my_directory_in_s3`, {
  "file.name.template": "{{topic}}-{{partition}}-{{start_offset}}-{{timestamp:unit=yyyy}}{{timestamp:unit=MM}}{{timestamp:unit=dd}}{{timestamp:unit=HH}}.gz"
});
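Since the extension is easy to forget when building a template from parts, one option is a small guard around it. The sketch below is only an illustration: withExtension and s3Config are hypothetical names, not part of Turbine or the connector, and the only option key used is the file.name.template key shown above.

```typescript
// Hypothetical helper: append the extension only if the template does not already end with it.
function withExtension(template: string, extension: string): string {
  return template.endsWith(extension) ? template : template + extension;
}

// Configuration object using the option name from the post.
const s3Config: Record<string, string> = {
  "file.name.template": withExtension(
    "{{topic}}/{{timestamp:unit=yyyy}}/{{timestamp:unit=MM}}/{{timestamp:unit=dd}}/{{timestamp:unit=HH}}",
    ".gz" // gzip is the default compression, so .gz is the matching extension
  ),
};

console.log(s3Config["file.name.template"]);
```

An object like s3Config could then be passed as the third argument to destination.write in place of the inline literal.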

For more information about the Meroxa S3 connector, including how to change the compression type, check out its documentation.

For further help, you can reach us directly at [email protected].

You can also find us in our Discord.