This page received its last significant update on 2/20/2020

Since then, there have been some significant changes to our strategy, direction, and focus, so statements and recommendations may be outdated and not all examples may work.

The most up to date information can be found on the homepage, as well as any pages that don't show this warning.

If you encounter any inaccuracies, we welcome you to help us improve this page or submit an issue.

Transforms

Transforms in Meltano are implemented using dbt. Every Meltano-generated project has a transform/ directory containing the configuration, models, packages, and other files needed to run the transformations.

When meltano elt runs with the --transform run option, the default dbt transformations for the extractor are run. Meltano never modifies the original source data.

As an example, assume that the following command runs:

meltano elt tap-gitlab target-postgres --transform run

Once the Extract and Load steps have completed successfully, the data has been extracted from the GitLab API and loaded into a Postgres database. The dbt transform then runs.
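The ordering described above can be sketched in Python: the transform step only starts once Extract and Load have both succeeded. This is a hypothetical model for illustration only. `run_elt` and its step callables are not part of Meltano, and each step simply reports success as a boolean.

```python
from typing import Callable, List, Tuple

Step = Callable[[], bool]  # a step returns True on success


def run_elt(extract: Step, load: Step, transform: Step, run_transform: bool) -> List[str]:
    """Hypothetical model of the step ordering of `meltano elt ... --transform run`."""
    steps: List[Tuple[str, Step]] = [("extract", extract), ("load", load)]
    if run_transform:
        # The dbt transform is only attempted after Extract and Load succeed.
        steps.append(("transform", transform))
    completed: List[str] = []
    for name, step in steps:
        if not step():
            # Stop at the first failure; later steps never run.
            raise RuntimeError(f"{name} step failed")
        completed.append(name)
    return completed


print(run_elt(lambda: True, lambda: True, lambda: True, run_transform=True))
# ['extract', 'load', 'transform']
```

Raising on the first failure keeps a failed load from ever triggering the transform. This matches the behavior described for the real command.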

Meltano uses the convention that the transform has the same name as the extractor it is for. Transforms are automatically added the first time an elt operation that requires them runs, but they can also be discovered and added to a Meltano project manually:

(venv) $ meltano discover transforms

transforms
tap-gitlab

(venv) $ meltano add transform tap-gitlab
Transform tap-gitlab added to your meltano.yml config
Transform tap-gitlab added to your dbt packages
Transform tap-gitlab added to your dbt_project.yml
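The naming convention and the add-on-first-use behavior can be modeled in a few lines. `ensure_transform` is a hypothetical helper. The `installed` and `discoverable` lists stand in for the transforms already in meltano.yml and the output of `meltano discover transforms`. Meltano's actual implementation is not shown here.

```python
from typing import List, Optional


def ensure_transform(extractor: str, installed: List[str],
                     discoverable: List[str]) -> Optional[str]:
    """Hypothetical helper: make sure the extractor's transform is in the project.

    Returns the transform name, or None if no transform exists for the extractor.
    """
    name = extractor  # convention: a transform shares its extractor's name
    if name in installed:
        return name
    if name in discoverable:
        # First elt run that needs it: add it, as `meltano add transform` would.
        installed.append(name)
        return name
    return None


project_transforms: List[str] = []
print(ensure_transform("tap-gitlab", project_transforms, ["tap-gitlab"]))  # tap-gitlab
print(project_transforms)  # ['tap-gitlab']
```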

Transforms are dbt packages that live in their own repositories. For more detail on how such a package is defined, see the dbt documentation on Package Management and dbt-tap-gitlab, the project that defines the default transforms for tap-gitlab.

When a transform is added to a project, it is added as a dbt package in transform/packages.yml, enabled in transform/dbt_project.yml, and loaded for usage the next time dbt runs.
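The packages.yml step can be sketched with dbt's `git:` package entry format. `add_dbt_package` is a hypothetical helper, and the repository URL is illustrative. The exact entry Meltano writes may differ, for example by pinning a revision.

```python
import json
from typing import Any, Dict


def add_dbt_package(packages_config: Dict[str, Any], git_url: str) -> Dict[str, Any]:
    """Add a git-hosted package to a packages.yml-style mapping, at most once."""
    packages = packages_config.setdefault("packages", [])
    # Skip the append if the same repository is already listed.
    if not any(entry.get("git") == git_url for entry in packages):
        packages.append({"git": git_url})
    return packages_config


# Illustrative URL for the tap-gitlab transform package.
config = add_dbt_package({}, "https://gitlab.com/meltano/dbt-tap-gitlab.git")
add_dbt_package(config, "https://gitlab.com/meltano/dbt-tap-gitlab.git")  # no duplicate
# JSON is a subset of YAML, so this output is also valid packages.yml content.
print(json.dumps(config, indent=2))
```

Making the helper idempotent mirrors the fact that adding the same transform twice should not produce two package entries.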

The meltano.yml entries for transforms can take additional parameters. For example, the tap-carbon-intensity dbt package requires three variables. These tell it which tables received the raw Carbon Intensity data during the Extract-Load phase:

transforms:
- name: tap-carbon-intensity
  pip_url: https://gitlab.com/meltano/dbt-tap-carbon-intensity.git
  vars:
    entry_table: "{{ env_var('PG_SCHEMA') }}.entry"
    generationmix_table: "{{ env_var('PG_SCHEMA') }}.generationmix"
    region_table: "{{ env_var('PG_SCHEMA') }}.region"

These entries can use dbt's syntax for reading values from environment variables. In this case, $PG_SCHEMA must be set so that the transformations know which Postgres schema contains the Carbon Intensity tables. Meltano uses $PG_SCHEMA by default because the Postgres Loader uses the same default schema.

You can keep these parameters as they are and provide the schema through the environment variable. Alternatively, set the schema directly in meltano.yml:

transforms:
- name: tap-carbon-intensity
  pip_url: https://gitlab.com/meltano/dbt-tap-carbon-intensity.git
  vars:
    entry_table: "my_raw_schema.entry"
    generationmix_table: "my_raw_schema.generationmix"
    region_table: "my_raw_schema.region"
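The sketch below shows how the two styles differ at run time: a value such as `"{{ env_var('PG_SCHEMA') }}.entry"` resolves from the environment, while a hard-coded schema passes through unchanged. It is a simplified, hypothetical imitation of dbt's `env_var`. dbt itself renders these values with Jinja, whereas `render_env_vars` uses a regular expression and only understands single-quoted literal arguments with an optional default.

```python
import os
import re
from typing import Mapping, Optional

# Matches {{ env_var('NAME') }} or {{ env_var('NAME', 'default') }}.
_ENV_VAR = re.compile(
    r"\{\{\s*env_var\(\s*'([^']+)'\s*(?:,\s*'([^']*)'\s*)?\)\s*\}\}"
)


def render_env_vars(value: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Replace env_var() expressions in a var value with environment values."""
    env = os.environ if environ is None else environ

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        # Like dbt, fail loudly instead of silently using an empty schema.
        raise KeyError(f"environment variable {name} is not set")

    return _ENV_VAR.sub(substitute, value)


print(render_env_vars("{{ env_var('PG_SCHEMA') }}.entry", {"PG_SCHEMA": "raw"}))  # raw.entry
print(render_env_vars("my_raw_schema.entry", {}))  # my_raw_schema.entry
```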

When Meltano runs a new transformation, transform/dbt_project.yml is always kept up to date with whatever is provided in meltano.yml.
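The "meltano.yml wins" rule can be pictured as a dictionary merge. `sync_vars` is hypothetical, and the example assumes that keys meltano.yml does not mention are left alone. Meltano's real handling of other dbt_project.yml content may differ.

```python
from typing import Any, Dict


def sync_vars(dbt_project_vars: Dict[str, Any], meltano_vars: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the dbt vars in which every meltano.yml value wins."""
    synced = dict(dbt_project_vars)  # keep keys meltano.yml does not mention (assumption)
    synced.update(meltano_vars)      # meltano.yml is the source of truth
    return synced


hand_edited = {"entry_table": "old_schema.entry", "other_var": 1}
from_meltano = {"entry_table": "my_raw_schema.entry"}
print(sync_vars(hand_edited, from_meltano))
# {'entry_table': 'my_raw_schema.entry', 'other_var': 1}
```

In practice, this means a value edited by hand in dbt_project.yml is overwritten on the next run if meltano.yml also sets it. Change it in meltano.yml instead.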

Finally, dbt can be configured by updating transform/profile/profiles.yml. By default, Meltano sets up dbt to use the same database and user as the Postgres Loader and store the results of the transformations in the analytics schema.
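A dbt Postgres profile has the shape shown in the sketch below, which points dbt at the loader's database while writing results to the analytics schema. Several details are assumptions for illustration, not Meltano's generated file:

* the profile name `meltano`
* the target name `default`
* the environment variable names (`PG_ADDRESS`, `PG_USERNAME`, `PG_PASSWORD`, `PG_DATABASE`)
* the fixed port

```python
import json
from typing import Any, Dict


def env(name: str) -> str:
    """Build a dbt env_var() expression for a profiles.yml value."""
    return "{{ env_var('" + name + "') }}"


def postgres_profile(schema: str = "analytics") -> Dict[str, Any]:
    """Hypothetical profiles.yml content reusing the Postgres Loader's settings."""
    return {
        "meltano": {  # profile name is an assumption
            "target": "default",
            "outputs": {
                "default": {
                    "type": "postgres",
                    "host": env("PG_ADDRESS"),
                    "port": 5432,  # assumed default Postgres port
                    "user": env("PG_USERNAME"),
                    "password": env("PG_PASSWORD"),
                    "dbname": env("PG_DATABASE"),
                    # Transformation results go to this schema, not the raw one.
                    "schema": schema,
                    "threads": 1,
                }
            },
        }
    }


print(json.dumps(postgres_profile(), indent=2))
```

The profile name must match the `profile:` setting in transform/dbt_project.yml for dbt to pick it up.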