# Meltano: open source ELT
Meltano is an open source platform for building, running, and orchestrating ELT pipelines made up of Singer taps and targets and dbt models, which you can run locally or easily deploy in production.
Our goal is to make the power of data integration available to all by building a true open source alternative to existing proprietary hosted EL(T) solutions: one that matches them in ease of use and reliability, and in the quantity and quality of supported data sources.
Scroll down for details on Meltano projects, integration (EL), transformation (T), orchestration, containerization, and Meltano UI.
Experience it for yourself in just a few minutes!
# For these examples to work, ensure that:
# - you are running Linux or macOS
# - Python 3.6, 3.7 or 3.8 has been installed
python3 --version
# Create directory for Meltano projects
mkdir meltano-projects
cd meltano-projects
# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install Meltano
pip3 install meltano
Meltano is now ready for its first project!
# Your Meltano project: a single source of truth
At the core of the Meltano experience is your Meltano project, which represents the single source of truth regarding your ELT pipelines: how data should be integrated and transformed, how the pipelines should be orchestrated, and how the various plugins that make up your pipelines should be configured.
Since a Meltano project is just a directory on your filesystem containing text-based files, you can treat it like any other software development project and benefit from DevOps best practices such as version control, code review, and continuous integration and deployment (CI/CD).
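As an illustration, here is a simplified, hypothetical sketch of the kind of `meltano.yml` you might end up with after following the steps below; the exact layout depends on your Meltano version and the plugins you add:

```yaml
# Hypothetical, abbreviated meltano.yml for the demo project below;
# your actual file will differ in detail.
version: 1
plugins:
  extractors:
  - name: tap-gitlab
    pip_url: tap-gitlab
    config:
      projects: meltano/meltano
      start_date: '2020-05-01T00:00:00Z'
  loaders:
  - name: target-jsonl
    pip_url: target-jsonl
schedules:
- name: gitlab-to-jsonl
  extractor: tap-gitlab
  loader: target-jsonl
  transform: skip
  interval: '@hourly'
```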
You can initialize a new Meltano project using `meltano init`.
Follow the installation instructions above and then...
# Initialize a new Meltano project
meltano init demo-project
Your Meltano project has now been initialized in the `demo-project` directory!
# Before using a `meltano` command, ensure that:
# - you have navigated to your Meltano project
cd demo-project
# - you have activated the virtual environment
source ../.venv/bin/activate
Your Meltano project is now ready for integration, transformation, and orchestration!
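If you first want to see which plugins Meltano knows about, the `meltano discover` command lists them (a quick sketch; the output depends on your Meltano version):

```bash
# List the extractors and loaders Meltano can discover
meltano discover extractors
meltano discover loaders
```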
# Integration just a few keystrokes away
You can use existing Singer taps and targets or easily write your own to extract data from any SaaS tool or database and load it into any data warehouse or file format.
Meltano manages your tap and target configuration for you, makes it easy to select which entities and attributes to extract, and keeps track of the incremental replication state, so that subsequent pipeline runs with the same job ID will always pick up right where the previous run left off.
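Entity and attribute selection, for example, goes through the `meltano select` command; this sketch reuses the tap-gitlab extractor and `tags` stream from the example that follows:

```bash
# Select the "tags" entity and all of its attributes for extraction
meltano select tap-gitlab tags "*"
# Review the current selection rules and their effect
meltano select tap-gitlab --list --all
```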
Scroll down to learn more about transformation and orchestration.
Follow the project initialization instructions above and then...
# Add GitLab extractor to your project
meltano add extractor tap-gitlab
# Configure tap-gitlab to extract data from...
# - the https://gitlab.com/meltano/meltano project
meltano config tap-gitlab set projects meltano/meltano
# - going back to May 1st, 2020
meltano config tap-gitlab set start_date 2020-05-01T00:00:00Z
# Add JSONL loader
meltano add loader target-jsonl
# Ensure target-jsonl output directory exists
mkdir -p output
# Run data integration pipeline
meltano elt tap-gitlab target-jsonl --job_id=gitlab-to-jsonl
# Read latest tag
head -n 1 output/tags.jsonl
{"name": "v1.66.0", "message": "", "target": "7c96cd237a53b84712e1f6f9e7da8a2f7b146bbe", "commit_id": "7c96cd237a53b84712e1f6f9e7da8a2f7b146bbe", "project_id": 7603319}
Your data has now been extracted and loaded!
# Transformation as a first-class citizen
Once your raw data has arrived in your data warehouse, its schema will likely need to be transformed to be more appropriate for analysis.
Meltano helps you out here as well, with built-in (but optional!) support for running dbt models as part of your pipeline.
When you add the `dbt` transformer to your project, a full-fledged dbt project will automatically be initialized in the `transform` directory.
Any transform plugins added to your Meltano project will automatically be added to the dbt project as well, but you can easily install existing dbt models from packages or write your own.
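As a sketch of what writing your own could look like, here is a hypothetical model that might live under `transform/models`; it assumes the tap-gitlab transform package exposes a `gitlab_tags` model (consistent with the `analytics.gitlab_tags` table queried below):

```sql
-- transform/models/demo/project_tag_counts.sql (hypothetical custom dbt model)
-- Count the tags recorded for each GitLab project.
SELECT
    project_id,
    COUNT(*) AS tag_count
FROM {{ ref('gitlab_tags') }}
GROUP BY project_id
```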
Follow the integration instructions above and then...
# For these examples to work, ensure that:
# - you have PostgreSQL running somewhere
# - you have created a new database
# - you change the configuration below as appropriate
# Add PostgreSQL loader
meltano add loader target-postgres --variant meltano
# Configure target-postgres through the environment
export TARGET_POSTGRES_HOST=localhost
export TARGET_POSTGRES_PORT=5432
export TARGET_POSTGRES_USER=meltano
export TARGET_POSTGRES_PASSWORD=meltano
export TARGET_POSTGRES_DBNAME=demo-warehouse
# Add dbt transformer and initialize dbt project
meltano add transformer dbt
# Add PostgreSQL-compatible dbt models for tap-gitlab
meltano add transform tap-gitlab
# Run data integration and transformation pipeline
meltano elt tap-gitlab target-postgres --transform=run --job_id=gitlab-to-postgres
# Start `psql` shell connected to warehouse database
PGPASSWORD=$TARGET_POSTGRES_PASSWORD psql -U $TARGET_POSTGRES_USER -h $TARGET_POSTGRES_HOST -p $TARGET_POSTGRES_PORT -d $TARGET_POSTGRES_DBNAME
-- Read latest tag
SELECT * FROM analytics.gitlab_tags LIMIT 1;
project_id | commit_id | tag_name | target | message
------------+------------------------------------------+----------+------------------------------------------+---------
7603319 | 7c96cd237a53b84712e1f6f9e7da8a2f7b146bbe | v1.66.0 | 7c96cd237a53b84712e1f6f9e7da8a2f7b146bbe |
(1 row)
Your data has now been extracted, loaded, and transformed!
# Orchestration right out of the box
Once you've managed to successfully run your ELT pipeline once, you'll probably want to run it again, and again, and again.
Meltano lets you set up pipeline schedules that can then automatically be fed to and run by a supported orchestrator like Apache Airflow.
When you add the `airflow` orchestrator to your project, a Meltano DAG generator will automatically be added to the `orchestrate/dags` directory, where Airflow will look for DAGs by default.
If the default behavior of simply running `meltano elt` on a schedule is not going to cut it, you can easily modify the DAG generator or add your own.
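To make that concrete, the DAGs the generator produces boil down to something like this sketch (hypothetical and simplified; the real generator reads your schedules dynamically instead of hard-coding one pipeline, and the project path is a placeholder):

```python
# Simplified, hypothetical equivalent of a generated Meltano pipeline DAG.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.10.x import path

dag = DAG(
    "meltano_gitlab-to-jsonl",
    schedule_interval="@hourly",      # mirrors `meltano schedule ... @hourly`
    start_date=datetime(2020, 5, 1),
    catchup=False,
)

BashOperator(
    task_id="extract_load",
    # Run the pipeline in the context of the project (placeholder path)
    bash_command="cd /path/to/demo-project && meltano elt tap-gitlab target-jsonl --job_id=gitlab-to-jsonl",
    dag=dag,
)
```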
Follow the transformation instructions above and then...
# Schedule pipelines
meltano schedule gitlab-to-jsonl tap-gitlab target-jsonl @hourly
meltano schedule gitlab-to-postgres tap-gitlab target-postgres @daily --transform=run
# List scheduled pipelines
meltano schedule list
# Add Airflow orchestrator and default DAG generator
meltano add orchestrator airflow
# Start the Airflow scheduler (add `-D` to background)
meltano invoke airflow scheduler
Your pipelines will now run on a schedule!
# Start the Airflow web interface (add `-D` to background)
meltano invoke airflow webserver
Airflow is now available at http://localhost:8080!
# Instantly containerizable and production-ready
Now that you've got your pipelines running locally, it'll be time to repeat this trick in production!
Since your Meltano project is your single source of truth, deploying your pipelines in production is pretty straightforward, but you can greatly simplify this process (and prevent issues caused by inconsistencies between environments!) by wrapping them all up into a project-specific Docker container image: "a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings."
This image can then be used on any environment running Docker (or a compatible tool like Kubernetes) to directly run `meltano` commands in the context of your project, without needing to separately manage the installation of Meltano, your project's plugins, or any of their dependencies.
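For reference, the Dockerfile provided by the `docker` file bundle added below is along these lines (a sketch; the actual file shipped with your Meltano version may differ):

```dockerfile
# Sketch of the project Dockerfile added by `meltano add files docker`.
FROM meltano/meltano:latest

# Copy the Meltano project into the image
WORKDIR /project
COPY . .

# Install all plugins declared in meltano.yml
RUN meltano install

# Make `docker run <image> <args>` behave like `meltano <args>` in the project
ENTRYPOINT ["meltano"]
```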
Follow the project initialization instructions above and then...
# For these examples to work, ensure that:
# - Docker has been installed
docker --version
# Add Docker files to your project
meltano add files docker
# Build Docker image containing
# Meltano, your project, and all of its plugins
docker build --tag meltano-demo-project:dev .
Your `meltano-demo-project:dev` Docker image is now ready for its first container!
# View Meltano version
docker run meltano-demo-project:dev --version
# Run gitlab-to-jsonl pipeline with
# mounted volume to exfiltrate target-jsonl output
docker run \
--volume $(pwd)/output:/project/output \
meltano-demo-project:dev \
elt tap-gitlab target-jsonl --job_id=gitlab-to-jsonl
Your data has now been extracted and loaded!
# A UI for management and monitoring
In line with our current focus on data engineers comfortable with CLIs and version control, Meltano is optimized for usage through the `meltano` CLI and direct changes to the `meltano.yml` project file.
However, a web-based UI is also available for when you want to quickly check the status and most recent logs of your project's scheduled pipelines, or if you want to give less technical team members or clients the option to configure their extractors, loaders, and pipelines themselves.
Follow the project initialization instructions above and then...
# Start Meltano UI
meltano ui
Meltano UI is now available at http://localhost:5000!
Intrigued? Get started!