Real-time Data Lake Ingestion with Turbine

· 4 min read
@anaptfox
Developer Advocate

Data lakes have become a popular method of storing data and performing analytics. Amazon S3 offers a flexible, scalable way to store data of all types and sizes, and can be accessed and analyzed by a variety of tools.

Real-time data lake ingestion is the process of getting data into a data lake in near-real-time. Today, this can be accomplished by using streaming data platforms, message queues, and event-driven architectures, but these are very complex systems.

Turbine offers a code-first approach to building real-time data lake ingestion systems. This allows you to build, review, and test data products with a software engineering mindset. In this guide, you will learn how to use Turbine to ingest data into Amazon S3.

Here is what a Turbine Application looks like:

exports.App = class App {
  async run(turbine) {
    let source = await turbine.resources("pg");

    let records = await source.records("customer_order");

    let anonymized = await turbine.process(records, this.anonymize);

    let destination = await turbine.resources("s3");

    await destination.write(anonymized, "customer_order");
  }
};

This application uses JavaScript, but Turbine also has Go and Python SDKs.

Real-time eCommerce Order Data Warehousing and Alerting with Turbine

· 4 min read
@anaptfox
Developer Advocate

Data warehouses like Snowflake allow you to collect and store data from multiple sources so that it can be accessed and analyzed. Real-time data warehousing is essential for e-commerce because it allows for up-to-the-minute analysis of customer behavior. In addition, the same data could be used to generate alerts about successful orders or potential fraud.

An approach often used to solve this problem is to use two entirely different tools: one to ingest data into the warehouse and another that uses reverse ETL to drive alerting from the data inside the warehouse itself. However, this setup is difficult to maintain and can be costly.

Instead, you can use just Turbine to perform both real-time warehousing and alerting to Slack.

Here is what a Turbine Application looks like:

exports.App = class App {
  async run(turbine) {
    let source = await turbine.resources('pg')

    let records = await source.records('customerOrders')

    let data = await turbine.process(records, this.sendAlert)

    let destination = await turbine.resources('snowflake')

    await destination.write(data, 'customerOrders')
  }
}

This application uses JavaScript, but Turbine also has Go and Python SDKs.
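The `sendAlert` function passed to `turbine.process` above is not shown. A minimal sketch, assuming each record exposes its row as `record.value` and that alerts go to a Slack incoming webhook (the webhook POST is indicated in a comment rather than executed here):

```javascript
// Hypothetical sendAlert function for turbine.process.
// Field names (id, total, customerId) are assumptions for illustration.
function buildAlertText(order) {
  return `:tada: New order ${order.id} for $${order.total} from customer ${order.customerId}`;
}

function sendAlert(records) {
  records.forEach((record) => {
    const text = buildAlertText(record.value);
    console.log(text);
    // In a real app you would POST the message to a Slack incoming
    // webhook instead, e.g.:
    //   fetch(process.env.SLACK_WEBHOOK_URL, {
    //     method: "POST",
    //     headers: { "Content-Type": "application/json" },
    //     body: JSON.stringify({ text }),
    //   });
  });
  // Return the records unchanged so they continue on to Snowflake.
  return records;
}

module.exports = { sendAlert, buildAlertText };
```

Returning the records untouched is what lets a single function both raise alerts and feed the warehouse write that follows it.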

Real-time Search Indexing with Turbine and Algolia

· 5 min read
@anaptfox
Developer Advocate

Developers often consider using operational databases (e.g. PostgreSQL, MySQL) to perform search. However, search engines like Algolia are better suited to the problem because they provide low-latency querying and filtering along with search-specific features such as ranking and typo tolerance.

Once you have decided on a search engine, you inevitably face the next question: how do you send and continuously sync data to Algolia?

This is where Turbine comes in. With Turbine, you can properly build, test, and review data integrations in a code-first way. Then, you can easily deploy your data application to Meroxa. No more fragile deployments, no more manual testing, no more surprise maintenance, just code.

Here is what a Turbine Application looks like:

const { updateIndex } = require('./algolia.js');

exports.App = class App {
  sendToAlgolia(records) {
    records.forEach(record => {
      updateIndex(record);
    });
    return records;
  }

  async run(turbine) {
    let source = await turbine.resources('postgresql');

    let records = await source.records("User");

    await turbine.process(records, this.sendToAlgolia, {
      ALGOLIA_APP_ID: process.env.ALGOLIA_APP_ID,
      ALGOLIA_API_KEY: process.env.ALGOLIA_API_KEY,
      ALGOLIA_INDEX: process.env.ALGOLIA_INDEX,
    });
  }
};

In this article, we are going to create a data application to ingest and sync data from PostgreSQL to Algolia.
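The `updateIndex` helper lives in `./algolia.js`, which is not shown above. A minimal sketch of what it might contain, assuming each record exposes its row as `record.value` with an `id` primary key (the actual `algoliasearch` client call is indicated in a comment rather than executed here):

```javascript
// Hypothetical sketch of ./algolia.js.
// Algolia requires every object to carry an objectID; here we reuse
// the row's primary key (an assumption about the "User" table).
function toAlgoliaObject(record) {
  return { objectID: String(record.value.id), ...record.value };
}

function updateIndex(record) {
  const object = toAlgoliaObject(record);
  // With the algoliasearch client this would be something like:
  //   const client = algoliasearch(process.env.ALGOLIA_APP_ID,
  //                                process.env.ALGOLIA_API_KEY);
  //   client.initIndex(process.env.ALGOLIA_INDEX).saveObject(object);
  return object;
}

module.exports = { updateIndex, toAlgoliaObject };
```

Reusing the database primary key as the `objectID` makes the sync idempotent: re-sending the same row overwrites the existing Algolia object instead of creating a duplicate.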

How to Obtain a Meroxa Access Token

· 1 min read
@anaptfox
Developer Advocate

The Meroxa access token is needed to authenticate to the Meroxa API programmatically. For example, the token allows you to build pipelines with Terraform.

To obtain a token, you must install the Meroxa CLI. Then, follow these steps:

  1. Log in to the CLI.

$ meroxa login

  2. Get your token.

The meroxa config command allows you to access details about your Meroxa environment.

Meroxa Config Command

For security, the output is obfuscated unless you use the --json flag:

$ meroxa config --json

Other Methods

If you're familiar with jq, you can parse the JSON output and print only the Meroxa token in a single command:

$ meroxa config --json | jq -r .config.access_token

You could also add this to your .zshrc or .profile to always have it available in your environment.

export MEROXA_REFRESH_TOKEN=$(meroxa config --json | jq -r .config.access_token)

How to Expose PostgreSQL Remotely Using ngrok

· 2 min read

In this guide, we will walk through exposing a local PostgreSQL instance with ngrok. This method allows you to quickly test and analyze the behavior of PostgreSQL with data platforms like Meroxa.

Add Local PG

For this example, we are going to use ngrok. ngrok exposes local servers behind NATs and firewalls to the public internet over secure tunnels.