Today the Meltano team is excited to announce a milestone for the Singer community: the v0.1.0 launch of our Singer Tap Software Development Kit! The SDK is a framework that makes it easier than ever to build high quality data extractors, aka taps. With the SDK, tap developers can take full advantage of the Singer spec without being experts in it, and can focus on the code unique to the API or database they are extracting data from.
This is one of the many things we’re working on in the coming months to lift up the entire Singer and data integration ecosystem. In addition to continued development of easy-to-use SDKs, we’re also working to simplify the spec so it’s easier to understand for people new to the community. We’re also building a MeltanoHub for Singer so developers can easily find, use, and contribute to high quality taps and targets. There’s also been huge interest from the community around extending the Singer protocol and we expect our efforts with the SDK and within the ecosystem to make it easy for the spec to grow and evolve.
Genesis of the SDK
As you can see, we’re bullish on the potential of Singer and its community, but we recognized the real challenges of building and maintaining high quality taps and targets. We saw that standardization across taps could be dramatically improved – some would implement all parts of the Singer spec while others did the bare minimum. We also saw how challenging it was to build a new tap with all of the great features of the protocol.
AJ Steers, who joined the Meltano team last month, saw these problems too and started work on the SDK as a community member. As a consultant, AJ experienced the challenge of building taps that implement all parts of the Singer spec without any sort of scaffolding to help him. “Optional” features such as incremental replication and stream/property selection were difficult to add.
The SDK solves all of these problems. Developers no longer have to become experts in the spec to write a high quality tap. They can focus on the code that’s unique to the API or database and will get all of the good stuff for free.
Meltano SDK In the Wild
Our goal with the SDK is to decrease the amount of code developers have to write by over 70%. To that end, we have a cookiecutter template which will ask you a few questions on the command line:
> cookiecutter https://gitlab.com/meltano/singer-sdk --directory="cookiecutter/tap-template"
source_name [MySourceName]: mytap
tap_id [tap-mytap]:
library_name [mytap]:
Select stream_type:
1 - REST
2 - GraphQL
3 - Other
Choose from 1, 2, 3 [1]:
Select auth_method:
1 - Simple
2 - OAuth2
3 - JWT
4 - Custom or N/A
Choose from 1, 2, 3, 4 [1]:
And then within the code we’ve marked multiple `#TODO`s to highlight where you need to add the custom code for your source. It’s that easy!
Even before this official release, we had many community members interested in the project who started to use it before it was “ready”.
Edgar Ramirez from SpotOn is developing a tap for the Confluence Content API. Edgar said even before the v0.1.0 release, “I’m focusing on extracting data from REST APIs at the moment and it’s amazing how quickly you can get your tap to output Singer spec messages. [It’s just] a few lines of code, 90% of which deal with idiosyncratic pagination and authentication, the rest are just stream (name, schema, primary keys) and tap (config schema) declarations”.
Derek Visch from AutoIDM needed a quick proof of concept to show he could make a connection work between BambooHR and ActiveDirectory. “Using the SDK, I had BambooHR data coming through in 2 hours, and that’s with no Python experience.”
John Timeus from Slalom developed a tap for PowerBI Metadata, and at our first Demo Day even our own Douwe Maan shared the tap he built to pull data from investing.com.
But don’t just take their word for it – check out our dev guide to get started making your own tap!
The Future of the SDK
v0.1.0 of the SDK is just the beginning. While it supports the basics you’d expect, such as data sync, catalog discovery, and bookmark tracking, we have a plethora of features we want to add to the SDK. These include support for stream/property selection, database type streams, auto-generation of documentation, and handling of unsorted streams. We aim to have the SDK support everything the Singer spec currently allows for, but we also want to build support for extensions to the spec, including the `ACTIVATE_VERSION` and `BATCH` message types. You can see these issues and more in the issue tracker.
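To make “data sync” and “bookmark tracking” concrete, here is a minimal sketch of the Singer messages an incremental sync produces. It uses only the Python standard library and is not the SDK’s API: the `sync_stream` function, the sample rows, and the `bookmarks` layout inside the `STATE` value are assumptions made for illustration (the spec leaves the shape of `STATE` values up to the tap).

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Optional

# Hypothetical sample data; a real tap would fetch this from an API or database.
SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "updated_at": {"type": "string", "format": "date-time"},
    },
}
ROWS = [
    {"id": 1, "updated_at": "2021-03-30T12:00:00Z"},
    {"id": 2, "updated_at": "2021-04-02T08:30:00Z"},
    {"id": 3, "updated_at": "2021-04-05T17:45:00Z"},
]


def sync_stream(
    stream: str,
    schema: Dict[str, Any],
    key_properties: List[str],
    rows: Iterable[Dict[str, Any]],
    replication_key: str,
    bookmark: Optional[str] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield SCHEMA, RECORD, and STATE messages for rows newer than the bookmark."""
    yield {
        "type": "SCHEMA",
        "stream": stream,
        "schema": schema,
        "key_properties": key_properties,
    }
    latest = bookmark
    # Same-format UTC ISO-8601 strings sort chronologically as plain strings.
    for row in sorted(rows, key=lambda r: r[replication_key]):
        if bookmark is not None and row[replication_key] <= bookmark:
            continue  # already extracted in a previous run
        yield {"type": "RECORD", "stream": stream, "record": row}
        latest = row[replication_key]
    # Assumed state layout: one bookmark per stream, keyed by replication key.
    yield {"type": "STATE", "value": {"bookmarks": {stream: {replication_key: latest}}}}


if __name__ == "__main__":
    for message in sync_stream(
        "users", SCHEMA, ["id"], ROWS, "updated_at", bookmark="2021-04-01T00:00:00Z"
    ):
        print(json.dumps(message))  # Singer messages are one JSON object per line
```

On the next run, the tap would read the saved `STATE` value and pass its bookmark back in, so only newer rows are emitted. This is the kind of bookkeeping the SDK is meant to take off the developer’s hands.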
We’re also planning to expand the SDK to cover Targets, as well as plugins that can both ingest and export valid Singer data. This latter use case enables on-the-fly stream transformations like hashing or aggregation.
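As a rough illustration of such an on-the-fly transformation, the sketch below reads Singer messages, replaces selected fields in each `RECORD` with a SHA-256 digest, and passes every other message through unchanged. It is a standard-library sketch, not a planned SDK interface: the `hash_fields` function and the `email` field are hypothetical.

```python
import hashlib
import json
from typing import Iterable, Iterator, Set


def hash_fields(lines: Iterable[str], fields: Set[str]) -> Iterator[str]:
    """Hash the named fields in RECORD messages; pass other messages through."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between messages
        message = json.loads(line)
        if message.get("type") == "RECORD":
            record = message["record"]
            for name in fields:
                if name in record:
                    raw = str(record[name]).encode("utf-8")
                    record[name] = hashlib.sha256(raw).hexdigest()
        yield json.dumps(message)


if __name__ == "__main__":
    # Hypothetical input; in a pipeline these lines would come from a tap's stdout.
    incoming = [
        json.dumps({"type": "RECORD", "stream": "users",
                    "record": {"id": 1, "email": "a@example.com"}}),
        json.dumps({"type": "STATE", "value": {}}),
    ]
    for outgoing in hash_fields(incoming, {"email"}):
        print(outgoing)
```

Because the function accepts any iterable of lines, it could sit between a tap and a target by passing `sys.stdin` in place of the sample list. A fuller version would also rewrite the matching `SCHEMA` message so hashed fields are declared as strings.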
We hope you can spend some time building taps with the SDK! Our expectation is that all existing taps will eventually be ported over to the SDK. When this happens, the community will benefit from taps that are easier to maintain, are more consistent across different sources, are under a more permissive open source license, and can immediately benefit from enhancements to the SDK and Singer spec.
What’s Next
Keep an eye out later this week for two more blog posts: the next one will be about the updates we’re making to the Singer spec to make it more understandable and the other will be about our long-term vision for Meltano.
On Wednesday, April 7th, we’ll have our weekly Office Hours, which will focus on the SDK, and on April 9th we’ll have our Demo Day, where community members and the core team share what they’ve been working on. In the meantime, join us on Slack to chat more about the project, or file an issue if you have feature ideas or run into any problems.
Happy building!