# Meltano: open source ELT

Meltano is an open source platform for building, running & orchestrating ELT pipelines made up of Singer taps and targets and dbt models that you can run locally or easily deploy in production.

Our goal is to make the power of data integration available to all by building a true open source alternative to existing proprietary hosted EL(T) solutions, one that matches them in ease of use, reliability, and the quantity and quality of supported data sources.

Scroll down for details on Meltano projects, integration (EL), transformation (T), orchestration, containerization, and Meltano UI.


Experience it for yourself in just a few minutes!

# For these examples to work, ensure that:
# - you are running Linux or macOS
# - Python 3.6 or 3.7 (NOT 3.8) has been installed
python3 --version

# Create directory for Meltano projects
mkdir meltano-projects
cd meltano-projects

# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install Meltano
pip3 install meltano

Meltano is now ready for its first project!

# Your Meltano project: a single source of truth

At the core of the Meltano experience is your Meltano project, which represents the single source of truth regarding your ELT pipelines: how data should be integrated and transformed, how the pipelines should be orchestrated, and how the various components should be configured.

Since a Meltano project is just a directory on your filesystem containing text-based files, you can treat it like any other software development project and benefit from DevOps best practices such as version control, code review, and continuous integration and deployment (CI/CD).
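
Version control, for instance, requires nothing special; a minimal sketch, assuming you are inside an initialized project directory (see `meltano init` below):

# Track your Meltano project like any other codebase
git init
git add meltano.yml
git commit -m "Initial Meltano project"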

You can initialize a new Meltano project using meltano init.

Learn more about Meltano projects

Follow the installation instructions above and then...

# Initialize a new Meltano project
meltano init demo-project

Your Meltano project has now been initialized in the demo-project directory!
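
To see this single source of truth for yourself, peek inside the new directory. The exact scaffolding varies by Meltano version, but the text-based meltano.yml project file is at the heart of it:

# Inspect the generated scaffolding and project file
# (directory layout varies by Meltano version)
ls demo-project
cat demo-project/meltano.yml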

# Before using a `meltano` command, ensure that:
# - you have navigated to your Meltano project
cd demo-project
# - you have activated the virtual environment
source ../.venv/bin/activate

Your Meltano project is now ready for integration, transformation, and orchestration!

# Integration just a few keystrokes away

You can use existing Singer taps and targets or easily write your own to extract data from any SaaS tool or database and load it into any data warehouse or file format.

Meltano manages your tap and target configuration for you, makes it easy to select which entities and attributes to extract, and keeps track of extraction state, so that subsequent pipeline runs with the same job ID always pick up right where the previous run left off.
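
Entity and attribute selection, for example, is handled with meltano select; a brief sketch, assuming the tap-gitlab extractor added in the walkthrough below and its tags entity:

# Select only the `tags` entity (and all of its attributes)
meltano select tap-gitlab tags "*"

# Review the resulting selection rules
meltano select tap-gitlab --list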

Scroll down to learn more about transformation and orchestration.

Learn more about data integration using Singer

Follow the project initialization instructions above and then...

# Add GitLab extractor to your project
meltano add extractor tap-gitlab

# Configure tap-gitlab to extract data from...
# - the https://gitlab.com/meltano/meltano project
meltano config tap-gitlab set projects meltano/meltano
# - going back to May 1st, 2020
meltano config tap-gitlab set start_date 2020-05-01T00:00:00Z
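
# Optionally, list tap-gitlab's settings and current values
meltano config tap-gitlab list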

# Add JSONL loader
meltano add loader target-jsonl

# Ensure target-jsonl output directory exists
mkdir -p output

# Run data integration pipeline
meltano elt tap-gitlab target-jsonl --job_id=gitlab-to-jsonl

# Read latest tag
head -n 1 output/tags.jsonl
{"name": "v1.54.0", "message": "", "target": "f07326ab905495d6916ca3f796a6b767833cb6dc", "commit_id": "f07326ab905495d6916ca3f796a6b767833cb6dc", "project_id": 7603319}

Your data has now been extracted and loaded!
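
Because extraction state is tracked per job ID, running the exact same command again resumes from where the previous run left off, so (for taps that support incremental replication) only new and changed records are extracted:

# Run the pipeline again; the saved state for this job ID
# means extraction resumes instead of starting over
meltano elt tap-gitlab target-jsonl --job_id=gitlab-to-jsonl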

# Transformation as a first-class citizen

Once your raw data has arrived in your data warehouse, its schema will likely need to be transformed to be more appropriate for analysis.

Meltano helps you out here as well, with built-in (but optional!) support for running dbt models as part of your pipeline.

When you add the dbt transformer to your project, a full-fledged dbt project will automatically be initialized in the transform directory. Any transform plugins added to your Meltano project will automatically be added to the dbt project as well, but you can easily install existing dbt models from packages or write your own.
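
Writing your own model is just a matter of dropping a SQL file into the dbt project. A minimal sketch: the tags_per_project model name is made up, and the gitlab_tags ref assumes the model of that name provided by the tap-gitlab transform (queried as analytics.gitlab_tags below):

# Hypothetical custom model: tag counts per project,
# building on the gitlab_tags model from the tap-gitlab transform
mkdir -p transform/models/my_meltano_project
cat > transform/models/my_meltano_project/tags_per_project.sql <<'SQL'
SELECT project_id, COUNT(*) AS tag_count
FROM {{ ref('gitlab_tags') }}
GROUP BY project_id
SQL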

Learn more about transformation using dbt

Follow the integration instructions above and then...

# For these examples to work, ensure that:
# - you have PostgreSQL running somewhere
# - you have created a new database
# - you change the configuration below as appropriate

# Add PostgreSQL loader
meltano add loader target-postgres

# Configure target-postgres through the environment
export PG_ADDRESS=localhost
export PG_PORT=5432
export PG_USERNAME=meltano
export PG_PASSWORD=meltano
export PG_DATABASE=demo-warehouse

# Add dbt transformer and initialize dbt project
meltano add transformer dbt

# Add PostgreSQL-compatible dbt models for tap-gitlab
meltano add transform tap-gitlab

# Run data integration and transformation pipeline
meltano elt tap-gitlab target-postgres --transform=run --job_id=gitlab-to-postgres

# Start `psql` shell connected to warehouse database
PGPASSWORD=$PG_PASSWORD psql -U $PG_USERNAME -h $PG_ADDRESS -p $PG_PORT -d $PG_DATABASE
-- Read latest tag
SELECT * FROM analytics.gitlab_tags LIMIT 1;
 project_id |                commit_id                 | tag_name |                  target                  | message
------------+------------------------------------------+----------+------------------------------------------+---------
    7603319 | f07326ab905495d6916ca3f796a6b767833cb6dc | v1.54.0  | f07326ab905495d6916ca3f796a6b767833cb6dc |
(1 row)

Your data has now been extracted, loaded, and transformed!

# Orchestration right out of the box

Once you've successfully run your ELT pipeline, you'll probably want to run it again, and again, and again.

Meltano lets you set up pipeline schedules that can then automatically be fed to and run by a supported orchestrator like Apache Airflow.

When you add the airflow orchestrator to your project, a Meltano DAG generator will automatically be added to the orchestrate/dags directory, where Airflow will look for DAGs by default. If the default behavior of simply running meltano elt on a schedule is not going to cut it, you can easily modify the DAG generator or add your own.
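
Once the orchestrator has been added (see below), you can confirm the generator is in place; a sketch, since the exact filename depends on your Meltano version:

# The DAG generator lives where Airflow looks for DAGs by default
# (the filename may differ between Meltano versions)
ls orchestrate/dags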

Learn more about orchestration using Airflow

Follow the transformation instructions above and then...

# Schedule pipelines
meltano schedule gitlab-to-jsonl tap-gitlab target-jsonl @hourly
meltano schedule gitlab-to-postgres tap-gitlab target-postgres @daily --transform=run

# List scheduled pipelines
meltano schedule list

# Add Airflow orchestrator and default DAG generator
meltano add orchestrator airflow

# Start the Airflow scheduler (add `-D` to background)
meltano invoke airflow scheduler

Your pipelines will now run on a schedule!

# Start the Airflow web interface (add `-D` to background)
meltano invoke airflow webserver

Airflow is now available at http://localhost:8080!

[Screenshot: Airflow webserver]

# Instantly containerizable and production-ready

Now that you've got your pipelines running locally, it's time to repeat the trick in production!

Since your Meltano project is your single source of truth, deploying your pipelines in production is pretty straightforward. You can greatly simplify this process (and prevent issues caused by inconsistencies between environments!) by wrapping them all up into a project-specific Docker container image: "a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings."

This image can then be used on any environment running Docker (or a compatible tool like Kubernetes) to directly run meltano commands in the context of your project, without needing to separately manage the installation of Meltano, your project's plugins, or any of their dependencies.

Learn more about containerization using Docker

Follow the project initialization instructions above and then...

# For these examples to work, ensure that
# Docker has been installed
docker --version

# Add Docker files to your project
meltano add files docker

# Build Docker image containing
# Meltano, your project, and all of its plugins
docker build --tag meltano-demo-project:dev .

Your meltano-demo-project:dev Docker image is now ready for its first container!

# View Meltano version
docker run meltano-demo-project:dev --version

# Run gitlab-to-jsonl pipeline with
# mounted volume to exfiltrate target-jsonl output
docker run \
  --volume $(pwd)/output:/project/output \
  meltano-demo-project:dev \
  elt tap-gitlab target-jsonl --job_id=gitlab-to-jsonl

Your data has now been extracted and loaded!
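
Since the commands above suggest the image's entrypoint is the meltano CLI itself, any other meltano command should work the same way; for example:

# List the project's scheduled pipelines from inside the container
docker run meltano-demo-project:dev schedule list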

# A UI for management and monitoring

In line with our current focus on data engineers comfortable with CLIs and version control, Meltano is optimized for usage through the meltano CLI and direct changes to the meltano.yml project file.

However, a web-based UI is also available for when you want to quickly check the status and most recent logs of your project's scheduled pipelines, or if you want to give less technical team members or clients the option to configure their extractors, loaders, and pipelines themselves.

Learn more about Meltano UI

Follow the project initialization instructions above and then...

# Start Meltano UI
meltano ui

Meltano UI is now available at http://localhost:5000!

[Screenshot: Meltano UI]
