Data lakes have become a popular way to store data and perform analytics on it. Amazon S3 offers a flexible, scalable way to store data of all types and sizes, and that data can then be accessed and analyzed by a variety of tools.
Real-time data lake ingestion is the process of getting data into a data lake in near real time. Today this is typically accomplished with streaming data platforms, message queues, and event-driven architectures, all of which can be complex to build and operate.
Turbine offers a code-first approach to building real-time data lake ingestion systems. This allows you to build, review, and test data products with a software engineering mindset. In this guide, you will learn how to use Turbine to ingest data into Amazon S3.
Here is what a Turbine Application looks like:
exports.App = class App {
  async run(turbine) {
    // Read records from the "customer_order" collection of the "pg" resource.
    let source = await turbine.resources("pg");
    let records = await source.records("customer_order");
    // Apply the anonymize transform to the records.
    let anonymized = await turbine.process(records, this.anonymize);
    // Write the transformed records to the "s3" resource.
    let destination = await turbine.resources("s3");
    await destination.write(anonymized, "customer_order");
  }
};
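The application passes `this.anonymize` to `turbine.process` but does not define it. Here is a minimal sketch of what such a transform might look like; the record shape, the `email` field, and the masking logic are assumptions for illustration, not part of Turbine's API:

```javascript
// Hypothetical per-record transform: masks an assumed `email` field.
// Assumes each record exposes its payload as a plain `value` object.
function anonymize(records) {
  records.forEach((record) => {
    if (record.value && record.value.email) {
      // Replace everything before the @ with a fixed placeholder.
      record.value.email = "redacted@" + record.value.email.split("@")[1];
    }
  });
  return records;
}

// Example usage with a plain object standing in for a Turbine record:
const batch = [{ value: { id: 1, email: "jane@example.com" } }];
console.log(anonymize(batch)[0].value.email); // → "redacted@example.com"
```

In the application above, this would be defined as an `anonymize` method on the `App` class so that `this.anonymize` resolves to it.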
This application uses JavaScript, but Turbine also offers Go and Python SDKs.