Reverse ETL – Publishing Data from your Data Warehouse to Slack

A few of our most popular internal Slack channels contain almost no human-sent messages. Instead, they contain data. Actionable data. Messages we can check off and respond to. The data stems from our central data warehouse and it is compiled once a day to provide people that work inside Slack with key data.

This process is called “reverse ETL” or “publishing datasets”. At Meltano, reverse ETL is powered by Meltano, but you can achieve the same results with bare Python or most other data integration tools that work with APIs like the Slack API.

Let us quickly walk through why we love reverse ETL  and the general architecture overview on how to achieve this. Then we’re going step-by-step on how we set this up.

The business value of publishing datasets or reverse ETL.

Published datasets, or “reverse ETL ” means I as a practitioner don’t have to check a fancy dashboard somewhere. Instead I get the data right into the system I use to work every day.

For sales people, that might be a CRM system, but for me as Marketing/ DevRel that is inside Slack. 

For instance, once a day we get a message summarising the activity inside the Singer ecosystem on Github. That’s especially helpful for repositories we don’t own, and there are a lot of them as Singer is an open standard and has a vibrant ecosystem.

Using that message, we can coordinate inside slack, solve problems, and advertise new releases of taps and targets inside the wider ecosystem.

How we’re doing reverse ETL, a picture of the overall architecture.

Reverse ETL means taking data from your central data store and pushing it to various other targets. We’re taking data from our central snowflake database and push it into Slack and other services like e-mail.

To extract the data from snowflake, we’re using a Singer tap, to publish our data to slack and mail, we’re using the library apprise. Apprise is an amazing open-source notification library with over 7k stars on Github. It is the “one notification library to rule them all”. Take a look at the picture below to see the data flow.

Reverse ETL data flow

We could run & orchestrate this chain with most technologies like pure Python or Apache Airflow, but it’s much easier and much more fun with Meltano itself!

Meltano powered Reverse ETL data flow

Let’s go through the setup step-by-step.

Step 1: Model your data into a “publish” table.

The simplest and easiest way of getting data out of your warehouse into a target system is to have a “publish table” you create on schedule containing the notifications or messages with the data you want to send out.

Reverse ETL with notification table, overwritten daily

Don’t source already existing tables, create a new one dedicated to only this purpose.

You can take a look into our Slack alert table here, modeled with dbt, but you can use any tool you like. The following code is a part of our meltano/squared repository, inside the dbt model file slack_alerts.sql.

[ …]

SELECT
    'Pull Requests' AS title,
    '*Opened* :heavy_plus_sign::' || ARRAY_TO_STRING(
        prs_opened, ''
    ) || '\n\n\n*Merged* :pr-merged:' || ARRAY_TO_STRING(
        prs_merged, ''
    ) || '\n\n\n*Closed* :wastebasket:' || ARRAY_TO_STRING(
        prs_closed, ''
    ) AS body
FROM base

UNION ALL

SELECT
    'Issues' AS title,
    '*Opened* :heavy_plus_sign::' || ARRAY_TO_STRING(
        issues_opened, ''
    ) || '\n\n\n*Closed* :wastebasket:' || ARRAY_TO_STRING(
        issues_closed, ''
    ) AS body
FROM base

We’re keeping it simple and run this once a day. This table only contains the messages for this day with a title and a body. Every day, we overwrite the complete table.

Step 2: Have your taps & targets ready.

We’re using two components to move the data. 

  • tap-snowflake, to extract the data from our snowflake data warehouse.
  • target-apprise, a wrapper around apprise that then sends out the data to Slack and e.g. e-mail.

If you want to run this with Meltano, you’ll need a few configurations like the Slack auth key and the URI. We’re using a web hook here, but you can use multiple different ways of configuring Apprise for Slack. This code snippet is from our meltano/squared repository from our environment file .env.template.

# Slack API Key
TAP_SLACK_API_KEY="****"

[ …]

# Slack Web hook
TARGET_APPRISE_URIS=["https://hooks.slack.com/services/{}/{}/{}"]
```
Then you need to tap your published table.
```prod.meltano.yml
  config:
    plugins:
      extractors:
      - name: tap-snowflake-singer-activity
        config:
          tables: STAGING_PROD.SLACK_NOTIFICATIONS.SLACK_ALERTS

Because of some peculiarities between apprise and snowflake, we do need to lower case all column names. But thanks to Meltano that’s simple using a so-called “stream map”. This code snippet is located inside the Meltano project file prod.meltano.yml.

[ …]
    - name: target-apprise
        config:
          stream_maps:
            PROD-SLACK_NOTIFICATIONS-SLACK_ALERTS:
              TITLE: null
              title: TITLE
              BODY: null
              body: BODY

And that’s already everything you need to do! 

Put it on schedule and have people enjoy it.

Finally, we need to put our process on a daily schedule. 

Using Meltano we do this by adding a so-called “job” and a “schedule”, you can take a look at our configuration in orchestrators.meltano.yml, snippet below.

[ …]
schedules:
- name: slack_notifications
  interval: 0 12 * * *
  job: slack_notifications

…

jobs:

- name: slack_notifications
  tasks:
  - dbt-snowflake:run_slack_notifications
  - tap-snowflake-singer-activity target-apprise

And that is it.

Now feel free to try out Meltano for your reverse ETL processes.

Intrigued?

You haven’t seen nothing yet!