How to connect to ANY data source using Meltano
There is one big problem with all data integration tools on the market: The data space is evolving faster than ever. New data “sources” grow like weeds, and no data integration tool is able to provide total coverage of them.
Even worse for you as a data engineer, your data stack is likely unique enough that one or two connectors will always be missing, whichever tool you choose.
That’s why Meltano keeps investing in making it as easy as possible to build your own connectors, such as taps and targets, and even other kinds of extensions. Developing your own tap or target with Meltano can take as little as 10 minutes.
Let’s look at the two ways people extend Meltano to fit their unique data stacks.
Making connector development fast and testable
Data engineers shouldn’t have to write a lot of custom code to connect to data sources. And with more than 550 connectors available for Meltano, you are unlikely to need to customize more than a few of them.
To keep even that customization simple, data engineers need proper wrappers around common functionality. We provide these in the form of the Meltano Singer SDK, along with a cookiecutter template that jumpstarts connector development.
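To make “common functionality” concrete: at its core, a Singer tap is a program that writes newline-delimited JSON messages (SCHEMA, RECORD, and STATE) to stdout, and that plumbing is what the SDK takes care of for you. The sketch below hand-rolls the protocol with only the Python standard library. It is not SDK code. The stream name, schema, and bookmark layout are hypothetical, and a real SDK-based tap also handles configuration, discovery, and incremental state for you.

```python
import json
import sys
from typing import Iterable, List

# Hypothetical stream name and schema, for illustration only.
STREAM = "repositories"
SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "full_name": {"type": "string"},
    },
}


def build_messages(records: Iterable[dict]) -> List[dict]:
    """Wrap raw rows in the three core Singer message types."""
    # SCHEMA describes the stream before any of its records are sent.
    messages = [{"type": "SCHEMA", "stream": STREAM, "schema": SCHEMA, "key_properties": ["id"]}]
    last_id = None
    for record in records:
        # One RECORD message per row.
        messages.append({"type": "RECORD", "stream": STREAM, "record": record})
        last_id = record["id"]
    # STATE lets a later run resume; this bookmark layout is an assumption.
    messages.append({"type": "STATE", "value": {"bookmarks": {STREAM: {"last_id": last_id}}}})
    return messages


def main() -> None:
    rows = [{"id": 1, "full_name": "meltano/sdk"}, {"id": 2, "full_name": "MeltanoLabs/tap-github"}]
    # Singer messages are newline-delimited JSON written to stdout.
    for message in build_messages(rows):
        sys.stdout.write(json.dumps(message) + "\n")


if __name__ == "__main__":
    main()
```

Keeping message construction separate from writing to stdout makes the logic easy to test without capturing output.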
(1) Modify an existing connector to fit your purposes
Our internal data team collects data from GitHub about repositories that publish “taps” and “targets”. That means we want to search for such repositories while filtering out our own.
Interestingly, when we started doing this, the existing connector, “tap-github”, didn’t support that kind of filtering. So we took it and made our own version with the filtering capabilities we needed.
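As a hypothetical sketch (not the actual tap-github change), this kind of filtering can be as small as dropping search results owned by your own organizations. It assumes records shaped like items from GitHub’s repository search API, which include an owner.login field; OWN_ORGS and exclude_own_repos are illustrative names.

```python
from typing import Iterable, Iterator, Set

# Hypothetical set of organizations whose repositories we want to skip.
OWN_ORGS = {"meltano", "MeltanoLabs"}


def exclude_own_repos(items: Iterable[dict], own_orgs: Set[str]) -> Iterator[dict]:
    """Yield search results whose owner is not one of our own organizations.

    GitHub logins are case-insensitive, so compare lowercased values.
    """
    own = {org.lower() for org in own_orgs}
    for item in items:
        if item["owner"]["login"].lower() not in own:
            yield item


results = [
    {"full_name": "meltano/sdk", "owner": {"login": "meltano"}},
    {"full_name": "someone/tap-example", "owner": {"login": "someone"}},
    {"full_name": "MeltanoLabs/tap-github", "owner": {"login": "MeltanoLabs"}},
]
print([r["full_name"] for r in exclude_own_repos(results, OWN_ORGS)])  # ['someone/tap-example']
```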
Your new connector doesn’t have to be open source; you can keep it as private as you want. The process is simple:
- You fork any connector, e.g. singer-io/tap-github.
- You modify it to fit your purposes. We made our version available at MeltanoLabs/tap-github.
- You use your own connector inside your Meltano project.
Because the pip_url in a Meltano project file accepts anything pip install accepts, you can include any variant of a connector you want. In our case, the entry in extract.meltano.yml looks like this (the pip_url now points to the publicly available version):
- name: tap-github
  namespace: tap-github
  pip_url: git+https://github.com/MeltanoLabs/tap-github@d99378778c0cebc446c12b552ee4fd386fdc2610
  config:
    organizations:
    - MeltanoLabs
    - meltano
    stream_maps:
      issues:
        __filter__: "record['type'] == 'issue'"
  select:
  - repositories.*
  - pull_requests.*
  - issues.*
  - '!issues.body'
  - '!issues.title'
  - '!pull_requests.body'
  - '!pull_requests.title'
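The stream_maps and select entries above do two different things to the issues stream: __filter__ drops whole records for which the expression is false, while entries such as '!issues.body' and '!issues.title' deselect individual properties. The toy model below mirrors only that effect on records, using a Python lambda in place of the expression string. It is not how the SDK evaluates stream maps or applies catalog selection, and apply_stream_map is a hypothetical name.

```python
from typing import Callable, Dict, Iterable, Iterator, Set

Record = Dict[str, object]


def apply_stream_map(
    records: Iterable[Record],
    keep: Callable[[Record], bool],
    excluded: Set[str],
) -> Iterator[Record]:
    """Toy model of a __filter__ expression plus '!stream.property' deselection."""
    for record in records:
        # __filter__: drop the whole record when the expression is false.
        if not keep(record):
            continue
        # '!issues.body' / '!issues.title': drop deselected properties.
        yield {key: value for key, value in record.items() if key not in excluded}


issues = [
    {"id": 1, "type": "issue", "title": "Bug", "body": "Details"},
    {"id": 2, "type": "pull_request", "title": "Fix", "body": "Patch"},
]
# Equivalent of: __filter__: record['type'] == 'issue'
kept = list(apply_stream_map(issues, lambda record: record["type"] == "issue", {"title", "body"}))
print(kept)  # [{'id': 1, 'type': 'issue'}]
```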
There are many other reasons to modify an existing connector: maybe it doesn’t do what you want, maybe you need to add some internal requests, or maybe you need to adapt it to your environment. Whatever the reason, modifying an existing connector is even easier than writing one yourself, and writing one can take as little as 10 minutes.
(2) Build your own connector because it doesn’t exist yet
When our data team decided to use data about all our connectors to publish a selected aggregation back to Meltano Hub, they realized the right target was still missing. So they took matters into their own hands and created target-yaml.
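A target is the mirror image of a tap: it reads Singer messages on stdin and writes the records somewhere. The sketch below shows only the core of that loop, grouping RECORD messages by stream; it is not the target-yaml implementation, and collect_records is a hypothetical name. Because the Python standard library has no YAML writer, it prints JSON instead, and in a real run you would pass sys.stdin as the lines.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def collect_records(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group RECORD messages by stream, ignoring other message types."""
    streams: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        # SCHEMA and STATE messages matter to real targets (validation,
        # checkpointing); this sketch only keeps the records.
        if message["type"] == "RECORD":
            streams[message["stream"]].append(message["record"])
    return dict(streams)


sample = [
    '{"type": "SCHEMA", "stream": "connectors", "schema": {"type": "object"}, "key_properties": ["name"]}',
    '{"type": "RECORD", "stream": "connectors", "record": {"name": "tap-github"}}',
    '{"type": "STATE", "value": {}}',
]
# A real target would call collect_records(sys.stdin) and write a file instead.
print(json.dumps(collect_records(sample)))
```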
If you want to create a new connector to extract data from a source yourself, the process is simple. With your Python environment set up:
- Launch the cookiecutter template by running cookiecutter https://github.com/meltano/sdk --directory="cookiecutter/tap-template"
- Answer the prompts you are given.
- Follow our detailed tap creation guide, or consult the SDK reference, to implement the functions that extract your data.
- Install your newly created connector into your Meltano project for testing and running.
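The functions you write in the third step mostly boil down to fetching pages from your source and yielding one dict per record, while the SDK turns those dicts into Singer messages. As a standard-library-only illustration of that shape (not the SDK’s own classes), here is a paginated extraction loop against a hypothetical in-memory API; FAKE_API, fetch_page, and iter_records are made-up names.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical in-memory "API": page token -> (rows, next page token).
FAKE_API: Dict[Optional[str], Tuple[List[dict], Optional[str]]] = {
    None: ([{"id": 1}, {"id": 2}], "page-2"),
    "page-2": ([{"id": 3}], None),
}


def fetch_page(token: Optional[str]) -> Tuple[List[dict], Optional[str]]:
    """Stand-in for an HTTP call; a real tap would request the source API here."""
    return FAKE_API[token]


def iter_records() -> Iterator[dict]:
    """Yield every record, following next-page tokens until none is returned."""
    token: Optional[str] = None
    while True:
        rows, token = fetch_page(token)
        yield from rows
        if token is None:
            break


print([row["id"] for row in iter_records()])  # [1, 2, 3]
```

Yielding records one at a time, rather than building a list, keeps memory flat no matter how many pages the source returns.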
And that’s it! You can find detailed instructions in our Meltano Singer SDK documentation. With more than 550 connectors available to everyone and powerful customization options, Meltano can cover your entire data stack, no matter how niche it is.
We’re keen on making development of custom and modified connectors as easy as possible, which is why we keep investing in testing, templating, and documentation.