Advanced Tutorials

Now that you know how to create a new project, we recommend checking out our Carbon Emissions tutorial to explore what Meltano is capable of. If you are feeling more adventurous, feel free to skip it and move on!

Table of Content

Experience a taste of data extraction and loading using Meltano CLI followed by analysis using Meltano UI with its automatic SQL generation.

  • SQLite database
  • No authentication
  • Public carbon emissions dataset

Experience the full extract, load, and transorm pipeline with authentication using Meltano CLI followed by analysis using Meltano UI with its automatic SQL generation.

  • Postgres database
  • Authentication
  • Your private Salesforce dataset

Experience building custom transforms and models.

  • Postgres database
  • Authentication
  • Your private Salesforce dataset
  • Custom transforms
  • Custom models

Starter - Carbon Emissions

This is the Carbon Intensity API (carbon emissions/footprint) and SQLite tutorial. This datasource was chosen as it is public, free, and does not require credentials to access. It guides you through data extraction from the Carbon Intensity API, loading extracted entities to a SQLite database, and analyzing the results.


This tutorial is perfect if your goal is to get Meltano up and running as quickly as possible.


Initialize Your Project

Navigate to the directory in your terminal where you want your Meltano project to be installed. Then run the following commands:


Run source venv/bin/activate to leverage the meltano installed in your virtual environment (venv) if you haven't already.

Initialize a new project with a folder called carbon:

meltano init carbon

Change directory into your new carbon project:

cd carbon

Let's see what extractors and loaders are available

meltano discover all

Run the extractor (tap) and loader (target)

meltano elt tap-carbon-intensity target-sqlite

Ensure Meltano UI will know how to use data from ELT:

meltano add model model-carbon-intensity-sqlite

Congratulations! You have just extracted all the data from the Carbon Intensity API and loaded it into your local SQLite database.


Meltano extracts data from various sources like Salesforce, Zendesk, and Google Analytics and then loads that data into the database of your choice. You can use community extractors and loaders or write your own too.

Meltano's ELT pipeline empowers you to aggregate data from various sources and then gather insights from them using Meltano UI with its automatic SQL generation.

Analyze with Meltano UI

Now that your data is extracted and loaded, it is ready to be analyzed. Time to start up the web app! Go back into your terminal and run the following command:

meltano ui

This will start a local web server at http://localhost:5000.

When you visit the URL, you should see:

Meltano UI with Carbon API initial loading screen


Having issues with Meltano? Help us help you. Here is a pre-baked form to streamline us doing so.


With Meltano UI up and running, we can automatically generate queries with as little as a single click and then explore the query results:

  • In the top navigation, hover over Analyze and click on Region
  • Under the Region section on the left navigation:
    • Toggle at least one aggregate button to generate SQL
    • Toggle any number of column buttons to generate SQL
    • Click the Run Query button in the upper right to query using the generated SQL
  • Open the Charts accordion to visualize the data!

Screenshot of Meltano UI with Carbon API charts

Intermediate - Salesforce

This is the Salesforce API and Postgres database tutorial. It guides you through data extraction from your Salesforce account, loading extracted entities to a Postgres DB, transforming the raw data, and analyzing the results.


Initialize Your Project

To get started, navigate to a directory, in your terminal, where you want your Meltano project to be installed and run the following commands:


Run source venv/bin/activate to leverage the meltano installed in your virtual environment (venv) if you haven't already.

Initialize a new project with a folder called sfdc-project

meltano init sfdc-project

Change directory into your new sfdc-project project

cd sfdc-project

Start docker postgres instance

docker-compose up -d warehouse_db

Let's see what extractors and loaders are available

meltano discover all

Add tap-salesforce - to select which Salesforce entities will be extracted before running the meltano elt command and set the credentials for your Salesforce instance

meltano add extractor tap-salesforce

Add target-postgres - to set the credentials for your Postgres DB

meltano add loader target-postgres

Set Your Credentials

Update the .env file in your project directory (i.e. sfdc-project) with the SFDC and Postgres DB credentials.





You can leave SFDC_URL and SFDC_CLIENT_ID as they are in the example above, but you have to set SFDC_USERNAME, SFDC_PASSWORD and SFDC_SECURITY_TOKEN and SFDC_START_DATE according to your instance and preferences.

Select The Entities to Export from Salesforce

A Salesforce account may have more than 100 different entities. In order to see the list of available entities, please run

meltano select tap-salesforce --list --all

In this tutorial, we are going to work with a couple of the most common ones and show you how to select entities to extract from a specific API: Account, Contact, Lead, User, Opportunity and Opportunity History:

meltano select tap-salesforce "User" "*"
meltano select tap-salesforce "Account" "*"
meltano select tap-salesforce "Lead" "*"
meltano select tap-salesforce "Opportunity" "*"
meltano select tap-salesforce "OpportunityHistory" "*"
meltano select tap-salesforce "Contact" "*"

Run ELT (extract, load, transform)

Run the full Extract > Load > Transform pipeline:

meltano elt tap-salesforce target-postgres --transform run

Depending on your Account, the aforementioned command may take from a couple minutes to a couple hours. That's why we propose to set the SFDC_START_DATE not too far in the past for your first test.

You could also extract and load the data and then run the transformations at a later point (examples below):

Only run the Extract and Load steps:

meltano elt tap-salesforce target-postgres

Only run the Transform Step:

meltano elt tap-salesforce target-postgres --transform only

The transform step uses the dbt transforms defined by Mavatar's Salesforce dbt package. When meltano elt tap-salesforce target-postgres --transform run is executed, both default and custom dbt transformations in the transform/ directory (a folder created upon project initilization) are being performed.

In order to visualize the data with existing transformations in the UI, the final step would be to add models:

Add existing models:

meltano add model model-salesforce

Setup incremental ELT

Per default, Meltano will pull all data in the ELT process. This behavior is perfect to get started because of its simplicity. However, some datasets are too big to query as a whole: the solution is incremental ELT.

Incremental ELT will persist the extraction cursor (named state) to make sure any subsequent ELT only pull the data that changed after this cursor. This feature is currently implemented by the extractors and is pretty simple to setup in your Meltano project.


Support for incremental ELT varies from extractor to extractor.

To enable it, Meltano must know which cursor to use for the ELT, which is set using the --job_id parameter on the meltano elt command. Alternatively, one can use the MELTANO_JOB_ID environmental variable. For each subsequent ELT, Meltano will look for a previous cursor to start from.

The first run will create a cursor state:

meltano elt --job_id=gitlab tap-gitlab target-postgres

Subsequent runs will start from this cursor:

meltano elt --job_id=gitlab tap-gitlab target-postgres


Schedules currently only support the MELTANO_JOB_ID environment variable, which need to be set manually in the meltano.yml.

  - name: gitlab_postgres

Interact with Your Data in The Web App

In order to start the UI, where you can interact with the transformed data, please go back to your terminal and execute the following command:

This will start a local web server at http://localhost:5000

meltano ui

When you visit the URL, you will be using the default connection to Meltano's SQLite database. In order to allow the UI to access your postgres DB instance, please follow the steps below:

  1. Navigate to the Postgres Loader Configuration (Configuration > Loaders > target-postgres > Configure)
  2. Enter connection settings
  • Name = postgres_db (important to use that name if you are following the tutorial)
  • Dialect = PostgresSQl
  • Host = localhost
  • Port = 5502
  • Database, Username, Password = warehouse
  • Schema = analytics
  1. Click "Save Connection"

You can now query and explore the extracted data:

  • Navigate to Analyze > sf opportunity history joined (under SFDC in the drop-down)
  • Toggle Columns and Aggregates buttons to generate the SQL query.
  • Click the Run button to query the transformed tables in the analytics schema.
  • Check the Results or Open the Charts accordion and explore the data.

Advanced - Create a Custom Extractor

As much as we'd like to support all the data sources out there, we'll need your help to get there. If you find a data source that Meltano doesn't support right now, it might be time to get your hands dirty.

We aim to make Meltano as thin as possible on top of the components it abstracts, so adding a new plugin should be straightforward.

How to Create a Extractor

First things first, you'll need a data source to integrate: in this example, let's say we want to create a tap to fetch data from GitLab.


If you are looking to integrate GitLab's data into your warehouse, please use tap official

Create the Plugin's Package

Meltano uses Singer taps and targets to extract and load data. For more details about the Singer specification, please visit


cookiecutter is a python tool to scaffold projects quickly from an existing template.

pip install cookiecutter
cookiecutter gh:singer-io/singer-tap-template
> project_name: tap-gitlab-custom

Add the Plugin to Your Meltano Project (--custom)

Now that your plugin is part of your Meltano project, you need to add your plugin configuration in the meltano.yml's plugin definition.


Using -e will install the plugin as editable so any change you make is readily available.

meltano add --custom extractor tap-gitlab-custom
> namespace: gitlab 
> pip_url: -e tap-gitlab-custom
> executable: tap-gitlab-custom

Meltano exposes each plugin configuration in the plugin definition, located in the meltano.yml file.


Meltano manages converting the plugin's configuration to the appropriate definition for the plugin. You can find the generated file in .meltano/run/tap-gitlab-custom/tap.config.json.

Looking at the tap-gitlab-custom definition, we should see the following (notice the settings section is missing):


  - executable: tap-gitlab-custom
    name: tap-gitlab-custom
    namespace: gitlab
    pip_url: -e tap-gitlab-custom

Let's include the default configuration for a sample tap:


  - executable: tap-gitlab-custom
    name: tap-gitlab-custom
    namespace: gitlab
    pip_url: -e tap-gitlab-custom
    - name: username
    - name: password
      kind: password
    - name: start_date
      value: "2015-09-21T04:00:00Z"

Plugin Setting

When creating a new plugin, you'll often have to expose some settings to the user so that Meltano can generate the correct configuration to run your plugin.

To expose such a setting, you'll need to define it as such

  • name: Identifier of this setting in the configuration.
    The name is the most important field of a setting, as it defines how the value will be passed down to the underlying component.
    Nesting can be represented using the . separator.

    • foo represents the { foo: VALUE } in the output configuration.
    • foo.a represents the { foo: { a: VALUE } } in the output configuration.
  • kind: Represent the type of value this should be, (e.g. password or date_iso8601).


We are currently working on defining the complete list of setting's kind. See issue (#739) for more details.

  • env (optional): Define the environment variable name used to set this value at runtime. Defaults to NAMESPACE_NAME.
  • value (optional): Define the default value for this variable. It should also be used as a placeholder for UX purposes.

Once the settings are exposed, you can use any of the following to set the proper values (in order of precedence):

  • Environment variables
  • config section in the plugin
  • Meltano UI
  • value of the setting's definition


Due to an outstanding bug (#521) you must run meltano install after modifying the settings section of a plugin.

Interacting with your new plugin

Now that your plugin is installed and configured, you are ready to interact with it using Meltano.

use meltano invoke to run your plugin in isolation:

meltano invoke tap-gitlab-custom --discover

Use meltano select to parse your catalog:

meltano select --list tap-gitlab-custom '*' '*'

Run an ELT using your new tap:

meltano elt tap-gitlab-custom target-sqlite


Advanced - Adding Custom Transformations and Models


We assume that you have also run the ELT steps in the SalesForce tutorial. Nothwithstanding, you can follow this tutorial in order to create entirely new transformations and run the whole ELT process in the end.


As an example, let's create transformations, which will allow us to look at closed opportunities (i.e. actual sales) by year, quarter and the following categorical dimensions:

  • Size of the deal, i.e. Small (<5k$), Medium (5k$ - 25k$), Big (25k$ - 100k$), Jumbo (>100k$)
  • Type of deal, i.e. New Business, Renewal, Add-On Business
  • Client's location
  • Client's industry
  • Client's size, i.e. Small (<100 employees), Medium (100 - 999 employees), Large (1k - 20k employees), Strategic (>20k employees)

Adding Custom Transforms

Let's create two additional tables:

  • A table that includes won opportunities only and a custom category column for deal_size (opportunity_won.sql)
  • A table that includes account categories, clients' countries, industries and a custom category column for company_size (account_category.sql).

These tables must be added as dbt models (.sql files) under the sfdc-project/transform/models/my_meltano_project/ directory or any of its subdirectories.

# opportunity_won.sql
with source as (
    -- Use the base sf_opportunity model defined by Meltano's 
    --  prepackaged tap_salesforce model
    select * from {{ref('sf_opportunity')}}


opportunity_won as (
        -- Attributes directly fetched from the Opportunity Table



        -- Additional Calculated Fields

        -- Add a deal size categorical dimension
        CASE WHEN
          amount :: DECIMAL < 5000
          THEN '1 - Small (<5k)'
        WHEN amount :: DECIMAL >= 5000 AND amount :: DECIMAL < 25000
          THEN '2 - Medium (5k - 25k)'
        WHEN amount :: DECIMAL >= 25000 AND amount :: DECIMAL < 100000
          THEN '3 - Big (25k - 100k)'
        WHEN amount :: DECIMAL >= 100000
          THEN '4 - Jumbo (>100k)'
        ELSE '5 - Unknown'
        END                         AS deal_size,

        -- Add Closed Date, Month, Quarter and Year columns
        CAST(closed_date AS DATE) as closed_date, 
        EXTRACT(MONTH FROM closed_date) closed_month,
        EXTRACT(QUARTER FROM closed_date) closed_quarter, 
        EXTRACT(YEAR FROM closed_date) closed_year  

    from source

    where is_won = true and is_closed = true

select * from opportunity_won
# account_category.sql
with source as (
    -- Use the base sf_opportunity model defined by Meltano's 
    --  prepackaged tap_salesforce model
    select * from {{ref('sf_account')}}


account_category as (



        -- Set NULL values to 'Unknown'
        COALESCE(company_country, 'Unknown') as company_country,

        -- Set NULL values to 'Unknown'
        COALESCE(industry, 'Unknown') as industry,
        -- Add a company size categorical dimension
        CASE WHEN
          number_of_employees < 100
          THEN '1 - Small (<100)'
        WHEN number_of_employees >= 100 AND number_of_employees < 1000
          THEN '2 - Medium (100 - 999)'
        WHEN number_of_employees >= 1000 AND number_of_employees < 20000
          THEN '3 - Large (1k - 20k)'
        WHEN number_of_employees >= 20000
          THEN '4 - Strategic (>20k)'
        ELSE '5 - Unknown'
        END                         AS company_size

    from source

select * from account_category

In order to have the results of the transformations materialized in the analytics schema, we need to update sfdc-project/transform/dbt_project.yml. For more details on materialization options, please check dbt's documentation.

# Update `my_meltano_project: null` to `my_meltano_project: materialized: table`
    materialized: table
      livemode: false
      schema: '{{ env_var(''PG_SCHEMA'') }}'

We are now ready to run the required ELT steps again.

Runs transformation step only

meltano elt tap-salesforce target-postgres --transform only

Adding Custom Models

In order to access the newly transformed data in the UI, 2 additional types of files must be created:

  • A table.m5o file, which defines the available columns and aggregates for each table
  • A topic.m5o file, which represents the connections between tables, i.e. what they can be joined on.

These files must be added as .m5o files under the sfdc-project/model/ directory.

# opportunity_won.table.m5o
  version = 1
  sql_table_name = opportunity_won
  name = opportunity_won
  columns {
    opportunity_id {
      primary_key = true
      hidden = true
      type = string
      sql = "{{table}}.opportunity_id"
    owner_id {
      label = Owner ID (User)
      hidden = yes
      type = string
      sql = "{{TABLE}}.owner_id"
    account_id {
      label = Account ID
      hidden = yes
      type = string
      sql = "{{TABLE}}.account_id"
    opportunity_type {
      label = Oportunity Type
      description = Oportunity Type
      type = string
      sql = "{{table}}.opportunity_type"
    lead_source {
      label = Lead Source
      description = Lead Source
      type = string
      sql = "{{table}}.lead_source"
    deal_size {
      label = Deal Size
      description = Deal Size
      type = string
      sql = "{{table}}.deal_size"
    closed_date {
      label = Closed Date
      description = Date the Opportunity Closed
      type = string
      sql = "{{table}}.closed_date"
    closed_month {
      label = Closed Month
      description = Month the Opportunity Closed
      type = string
      sql = "{{table}}.closed_month"
    closed_quarter {
      label = Closed Quarter
      description = Quarter the Opportunity Closed
      type = string
      sql = "{{table}}.closed_quarter"
    closed_year {
      label = Closed Year
      description = Year the Opportunity Closed
      type = string
      sql = "{{table}}.closed_year"
  aggregates {
    total_opportunities {
      label = Total Opportunities
      description = Total Opportunities
      type = count
      sql = "{{table}}.opportunity_id"
    total_amount {
      label = Total Amount
      description = Total Amount
      type = sum
      sql = "{{table}}.amount"
    avg_amount {
      label = Average Amount
      description = Average Amount
      type = avg
      sql = "{{table}}.amount"
# account_category.table.m5o
  version = 1
  sql_table_name = account_category
  name = account_category
  columns {
    account_id {
      primary_key = true
      hidden = true
      type = string
      sql = "{{table}}.account_id"
    company_country {
      label = Company Country
      description = Company Country
      type = string
      sql = "{{table}}.company_country"
    company_size {
      label = Company Size
      description = Company Size
      type = string
      sql = "{{table}}.company_size"
    industry {
      label = Industry
      description = Industry
      type = string
      sql = "{{table}}.industry"
# custom_sfdc.topic.m5o
  version = 1
  name = custom_sfdc
  connection = postgres_db
  label = Salesforce (Custom)
  designs {
    opportunity_won {
      label = Opportunities Won
      from = opportunity_won
      description = SFDC Opportunities Won
      joins {
        account_category {
          label = Opportunity
          sql_on = "opportunity_won.account_id = account_category.account_id"
          relationship = many_to_one

Interact with Your Data in The Web App

Interact with Your Data in The Web App

Advanced - Using tap-postgres with Meltano

This is a tutorial on how to run tap-postgres with target-postgres in Meltano.


tap-postgres is not currently officially supported by Meltano, so you have to add it as a custom tap. For more details, check the documentation on adding a custom extractor.

Project Initialization

Let's start by initializing a new Meltano Project and add the supported loader target-postgres:

meltano init tap-postgres --no_usage_stats
cd tap-postgres
meltano add loader target-postgres

Adding a Custom Extractor

Next step is to add tap-postgres as a custom extractor. We'll use the tap-postgres provided by the community:

meltano add --custom extractor tap-postgres

  (namespace): tap-postgres
  (pip_url): tap-postgres
  (executable) [tap-postgres]: tap-postgres

We should then update meltano.yml and add the configuration parameters this tap needs in order to run:


  - name: sqlite
  - name: postgresql
  - executable: tap-postgres
    name: tap-postgres
    namespace: tap-postgres
    pip_url: tap-postgres
      - name: dbname
        env: TAP_PG_DATABASE
      - name: host
        env: TAP_PG_ADDRESS
      - name: password
        env: TAP_PG_PASSWORD
      - name: port
        env: TAP_PG_PORT
      - name: user
        env: TAP_PG_USERNAME
      default_replication_method: FULL_TABLE
      include_schemas_in_destination_stream_name: true
  - name: target-postgres
    pip_url: git+
send_anonymous_usage_stats: false
version: 1.0

And finally update the project's .env to add the proper settings for the source and the target databases. The TAP_PG_* variables are used by the Tap (i.e. they define the source DB where the data are extracted from), while the PG_* variables are used by the Target (i.e. they define the target DB where the data will be loaded at)


export FLASK_ENV=development
export SQLITE_DATABASE=meltano

export TAP_PG_DATABASE=my_source_db
export TAP_PG_ADDRESS=localhost
export TAP_PG_PORT=5432
export TAP_PG_USERNAME=source_username
export TAP_PG_PASSWORD=source_password

export PG_DATABASE=my_target_db
export PG_PASSWORD=target_password
export PG_USERNAME=target_username
export PG_ADDRESS=localhost
export PG_PORT=5432
export PG_SCHEMA='test_tap_postgres'

Let's make sure that everything has been set correctly:

meltano config tap-postgres

  {'default_replication_method': 'FULL_TABLE', 'include_schemas_in_destination_stream_name': True, 'dbname': 'my_source_db', 'host': 'localhost', 'password': '***', 'port': '5432', 'user': '***'}

meltano config target-postgres

  {'user': '***', 'password': '***', 'host': 'localhost', 'port': '5432', 'dbname': 'my_target_db', 'schema': 'test_tap_postgres'}

Filtering out data

This step is required if you don't want to export everything from the source db. You can skip it if you just want to export all tables.

We can use meltano select to select which entities will be exported by the Tap from the Source DB. You can find more info on how meltano select works on the Meltano cli commands Documentation.

In the case of tap-postgres, the names of the Entities (or streams as they are called in the Specification) are the same as the table names in the Source DB, prefixed by the DB name and the schema they are defined into: {DB NAME}-{SCHEMA NAME}-{TABLE NAME}.

For example, assume that you want to export the users table and selected attributes from the issues table that reside in the tap_gitlab schema in warehouse DB. The following meltano select commands will only export those two tables and data for the selected attributes:

meltano select tap-postgres "warehouse-tap_gitlab-users" "*"
meltano select tap-postgres "warehouse-tap_gitlab-issues" "id"
meltano select tap-postgres "warehouse-tap_gitlab-issues" "project_id"
meltano select tap-postgres "warehouse-tap_gitlab-issues" "author_id"
meltano select tap-postgres "warehouse-tap_gitlab-issues" "assignee_id"
meltano select tap-postgres "warehouse-tap_gitlab-issues" "title"
meltano select tap-postgres "warehouse-tap_gitlab-issues" "state"

Finally, you can use meltano select <tap_name> --list command to make sure that everything has been set correctly:

meltano select tap-postgres --list

Enabled patterns:

Selected properties:
  [selected ] warehouse-tap_gitlab-issues.title
  [selected ] warehouse-tap_gitlab-issues.assignee_id
  [selected ] warehouse-tap_gitlab-issues.state
  [selected ] warehouse-tap_gitlab-issues.project_id
  [selected ] warehouse-tap_gitlab-issues.author_id
  [selected ] warehouse-tap_gitlab-users.username
  [selected ] warehouse-tap_gitlab-users.avatar_url
  [selected ] warehouse-tap_gitlab-users.web_url
  [selected ]
  [selected ] warehouse-tap_gitlab-users.state

Figuring out the names of the streams

In case you are not sure what the names of the streams are, you can use meltano invoke to run tap-postgres in isolation and generate a catalog file:

meltano invoke tap-postgres --discover > .meltano/run/tap-postgres/

You can then check that file and decide which Streams (tables in this case) should be exported and use their tap_stream_id property when running meltano select.

Run Meltano ELT

Finally run meltano elt to export all the selected Entities and load them to the schema of the target DB defined by PG_SCHEMA (test_tap_postgres in this example)

meltano elt tap-postgres target-postgres 

Next Steps

If you want to add custom Transforms and explore the extracted data using Meltano UI, you should check the advanced tutorial on Adding Custom Transformations and Models

Using Jupyter Notebooks

Once the meltano elt pipeline has successfully completed and data extracted from an API or a Data Source have been transformed and loaded to the analytics schema of your Data Warehouse, you can use Meltano UI or any data exploration tool to analyze and generate reports.

In this tutorial, we are going to present how to connect Jupyter Notebook to a Meltano Project that uses Postgres to store the transformed data.


  • Meltano's minimum and optional requirements installed
  • Docker started
  • You have successfully extracted and loaded data from an API by following the steps described in the previous Tutorials.

Jupyter Notebook Installation

If you have Jupyter already installed in your system, you can skip this step.

The most common options for installing Jupyter Notebook are by either using Anaconda or pip. We are going to use pip in this tutorial, as Meltano also uses pip for its installation.


If you used a virtual environment (venv) to install and run Meltano, don't forget to first navigate to the directory with your venv and run source venv/bin/activate to enable it.

The following commands will install Jupyter Notebook and the most common python libraries required to connect to a Database (psycopg2, sqlalchemy), manipulate data (pandas) and generate some ad hoc plots (matplotlib):

pip install jupyter
pip install numpy
pip install pandas
pip install psycopg2
pip install sqlalchemy
pip install matplotlib

Once the installation is completed, you are set to use Jupyter Notebooks with Meltano.

Running Jupyter Notebook

(Optional) Navigate to your Meltano Project and make the credentials you used with Meltano available to the environment the Jupyter Notebook will run:

cd /path/to/my/meltano/project
set +a
source .env
set -a 

This is an optional step, but allows us to use the same credentials (e.g. for connecting to Postgres) from inside Jupyter Notebook without entering them again and, more importantly, without exposing any sensitive information inside the Notebook in case you want to share the Notebook with others.

You can now navigate to Meltano's directory for storing your notebooks and start Jupyter Notebook:

cd notebook/
jupyter notebook

This will print some information about the notebook server in your terminal, including the URL of the web application (by default, http://localhost:8888):

$ jupyter notebook
[I 13:18:36.606 NotebookApp] Serving notebooks from local directory: /home/iroussos/work/code/my-projects/jupyter-tutorial/notebook
[I 13:18:36.609 NotebookApp] The Jupyter Notebook is running at:
[I 13:18:36.610 NotebookApp] http://localhost:8888/
[I 13:18:36.612 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

It will then open your default web browser to this URL.

When the notebook opens in your browser, you will see the Notebook Dashboard, which will show a list of the notebooks, files, and subdirectories in the directory where the notebook server was started.

If this is the first time you start jupyter notebook from the notebook directory of your Meltano project, the list will be empty. Let's start a new Python notebook!

Notebook Basics

While on the Notebook Dashboard, you can start a new Notebook by selecting Python 3 from the New drop down menu.

We are going to showcase the most simple and straightforward way to connect to your analytics schema, fetch some transformed data and generate some plots.

The first step for a data exploration Notebook is to import the proper libraries required for data exploration and manipulation and then setup the connection to the Database (Postgres in our case) so that we can fetch data:

Cell 1

# Import required libraries
import pandas as pd
import psycopg2
import sqlalchemy
import matplotlib as plt 
import os

from sqlalchemy import create_engine

%matplotlib inline

# Get the Postgres username, password, and database name from the Environment
# You can also set them directly here, but it's better not to include passwords
#  or parameters specific to you inside the Notebook
POSTGRES_PORT = os.getenv("PG_PORT")

# Connect to the analytics schema, not one of the schemas with the raw data extracted
PG_SCHEMA = 'analytics'

# A long string that contains the necessary Postgres login information
postgres_str = ('postgresql://{username}:{password}@{ipaddress}:{port}/{dbname}'

# Create the connection
cnx = create_engine(postgres_str)

Click |>| Run and we are set to start exploring the data in brought in with Meltano.

You can then write queries and generate plots at will.

As an example, assume that you have loaded data from your Zendesk Account by using tap_zendesk. You can then check the most important Ticket statistics by month:

Cell 2

# Query to send to the Database
sql_query = f'''
  created_year || '-' || created_month as month, 
  COUNT(*) as total_tickets,
  SUM(ticket_unsolved_counter) as unsolved_tickets,
  SUM(ticket_solved_counter) as solved_tickets,
  SUM(ticket_one_touch_counter) as one_touch_tickets,
  SUM(ticket_reopened_counter) as reopened_tickets,
  SUM(replies) as total_replies,
  ROUND(AVG(replies), 2) as avg_replies,
  ROUND(AVG(full_resolution_time_in_minutes_business), 2) as avg_res_time_mins
FROM {PG_SCHEMA}.zendesk_tickets_xf

GROUP BY created_year, created_month

ORDER BY created_year, created_month;

# Execute the query and store the result in a pandas dataframe
result = pd.read_sql_query(sql_query, cnx)

# Print the result to the output to check what the query brought in

Or generate a bar plot:

Cell 3

plt.rcParams['figure.figsize'] = [15, 8]'month', y=['total_tickets','unsolved_tickets','solved_tickets','one_touch_tickets','reopened_tickets'])
Last Updated: 8/15/2019, 6:16:04 PM