Meltano will enable everyone to realize the full potential of their data. To get there, we are building open source DataOps platform infrastructure to be the foundation of every team’s ideal data stack.
A lot has happened at Meltano since we spun out of GitLab and began the transition from a small project inside a large company (which recently went public — congrats to the team!) to an independent startup in our own right!
In just a few months, our all-remote team has grown from just 3 to 10 today. It has been amazing to see how much better we can serve our users and how much more we can accomplish now that we have dedicated people focused on engineering, community, and marketing. In August, eight of us had the chance to get together in Mexico City, and we have 11 more job openings across all departments that we intend to fill before the end of the year. If anything you read below resonates and you think you could make a difference, please consider joining our team!
As a newly independent company and rapidly growing team, we knew it was a top priority to figure out a shared sense of what we’re all about: values to inform how we work with each other and the community, a mission to describe the high-level impact we want to have on the world, a vision of our place in that world, and a strategy to lay out how we’ll get there. Strong internal alignment on these topics lets us work more efficiently, since everyone can test their priorities and actions against the overarching goal and feel empowered and confident to make decisions independently.
In this blog post, I will lay out Meltano’s brand-new mission and vision to show the wider world what motivates us, what the data community can expect from us now and in the future, and how anyone can help to make our shared dreams a reality as our inner Melty spreads his wings and reaches for the sky.
A Brief Meltano History Lesson
Our goals for Meltano have always been ambitious. GitLab’s data team founded the project in 2018. The endgame was to make “analytics accessible to everyone, not just professional data wranglers.” Meltano itself was to become a “complete solution for data teams” that spanned the entire data lifecycle, as evidenced by the name: an acronym for Model, Extract, Load, Transform, Analyze, Notebook, Orchestrate.
While we’ve always had grand aspirations for Meltano, the vision and focus have evolved over time. Eighteen months ago, we realized that adoption would come easier with a narrower vision that delivered immediate value. As a consequence, we temporarily redirected our focus to just the first stage of the data lifecycle, where better tooling was most urgently needed.
Specifically, we made it our mission to make the power of data integration available to all by delivering a robust open source ELT (Extract—Load—Transform) platform built around software development best practices, as an alternative to inflexible and inaccessible legacy SaaS solutions.
With the help of our growing community (almost 1800 people in Slack!), we have made significant progress towards that goal:
- Meltano itself makes it easier than ever to build, run, and deploy ELT pipelines made up of open source Singer connectors and dbt models
- Meltano SDK for Singer Taps and Targets enables you to build high-quality connectors for new sources and destinations in a matter of hours
- MeltanoHub lets you discover over 300 open source connectors maintained by the ever-growing Meltano and Singer communities
- MeltanoLabs provides a new scalable model for community maintenance of these connectors
- Our Singer Working Group initiative brings together all the major players in the Singer ecosystem — including the original creators of the standard at Talend/Stitch and the team behind PipelineWise — to collaborate on the evolution of the Singer standard and ecosystem
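For a concrete flavor of how these pieces fit together, here is a sketch of a minimal `meltano.yml` project file that wires a Singer tap to a target, with dbt handling transformation. The plugin names and configuration keys below are illustrative, not a definitive reference:

```yaml
# meltano.yml -- illustrative sketch of a minimal Meltano project
version: 1
plugins:
  extractors:
  - name: tap-gitlab          # Singer tap: extracts data from GitLab
    config:
      projects: meltano/meltano
  loaders:
  - name: target-jsonl        # Singer target: loads data as JSONL files
  transformers:
  - name: dbt                 # dbt models transform the loaded data
```

With a project file like this in place, the whole pipeline can be run from the CLI, and the file itself lives in version control alongside the rest of the stack’s configuration.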
These efforts will require continued investment from us and our ecosystem partners to reach their full potential, and it’s too early to claim “mission accomplished”. Still, it’s clear that our and the wider community’s hard work is paying off, and that the future of open source data integration in general, and Singer in particular, is brighter than ever. The barrier to extracting, loading, and transforming data has never been lower.
We couldn’t be prouder of what we’ve accomplished together already, and are eager to see where Singer can go from here through the combined efforts of our team and the wider community. At the same time, we’ve never lost sight of the bigger picture.
Encouraged by our success in data integration and the resources we’ve been afforded in response, we think the time has come for us to return our focus to the entire data lifecycle instead of just the first step. As such, the mission we’ve set for ourselves is
to enable everyone to realize the full potential of their data.
When we say “everyone”, we mean it. It includes massive enterprises and tiny startups as well as individuals. It includes those who may one day pay us, as well as those who likely never will. It includes solo founders, one-person IT/dev teams, and curious hobbyists, as well as multidisciplinary data teams and seasoned veterans of the data profession.
Our core convictions fuel our mission:
- Data is a key ingredient for success in whatever goal one is trying to accomplish, both to identify the right problem to solve and to know you’re building the best solution,
- More accessible data tools that adapt to and scale with the unique needs of every team will accelerate innovation and progress in every aspect of the world, and
- The awesome power of data should be available to small upstarts with pure intentions just as much as it already is to massive organizations with their often muddled incentives and goals.
It All Starts with DataOps
Our mission doesn’t end at “enabling everyone,” of course. We believe that the current state of data tooling is holding data professionals and teams back from realizing the full potential of their data. Further, we have some ideas on what’s missing, and what is needed to level up.
Meltano came out of the data team at GitLab, which builds the leading open source DevOps platform for the software development lifecycle. Working closely with software development teams inside GitLab and among its users, the GitLab Data team saw first-hand the value that DevOps brings to these teams. DevOps best practices such as version control, code review, automated end-to-end testing, and isolated development, staging, and production environments allow teams to collaborate more effectively across disciplines, give them the ability to experiment and rapidly iterate without fear of breaking things, and increase their confidence in the result of their work.
Before working on Meltano, I was an early software engineer and engineering leader at GitLab. New to the world of data, I was surprised to find that these DevOps best practices — that I had helped popularize and then started taking for granted — were nowhere to be found in a profession that, to me, looked as close to software development as any. The concept of DataOps—DevOps applied to the data lifecycle—had slowly been gaining ground, but had not yet become ubiquitous in data teams and their tools in the same way that DevOps had caused a complete paradigm shift in software development.
Through my software developer glasses, an organization’s data stack doesn’t look like a collection of isolated products and purchasing decisions. Rather, the entire data stack can be seen as a product in its own right, tailor-made to the needs of the specific organization, that happens to be made up of various components, both off-the-shelf and custom. Exactly like a software development team, the data team is responsible for maintaining and continuously improving their product to the benefit of its users: their colleagues who interact with the data through “product features” like dashboards and notebooks. A great article by our Head of Product & Data Taylor Murphy (aka “Databae”) and ex-GitLabber and friend-of-Meltano Emilie Schario lays out the advantages of running your data team like a product team in more detail.
We are confident that data teams can benefit as much from these DevOps best practices as software development teams, and that making them an integrated part of the data lifecycle in the form of DataOps is key to enabling these teams and the organizations they serve to unlock the full potential of their data. It’s been hugely validating to see how over the past few years, dbt has inspired many teams to adopt version control, add testing, and use different environments for production and development work. It is our goal to expand these best practices to the entire data stack.
Imagine how much more confident you and your team would feel experimenting and moving quickly if anyone could propose a change to a data connector’s configuration or some transformation, and trust that the system would warn you if it would break a dashboard or notebook downstream, instead of having to wait to hear from a frustrated end-user who ran into the issue in the middle of a presentation.
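The downstream-impact check described here can be sketched as a simple traversal over a declared lineage graph. This is a toy illustration of the idea, not Meltano’s actual API; every name below is hypothetical:

```python
from collections import deque

# Toy lineage graph for a data stack: each node lists the nodes
# immediately downstream of it. (Illustrative only -- not Meltano's API.)
LINEAGE = {
    "tap-gitlab": ["raw.issues"],
    "raw.issues": ["dbt.issue_metrics"],
    "dbt.issue_metrics": ["dashboard.velocity", "notebook.forecast"],
    "dashboard.velocity": [],
    "notebook.forecast": [],
}

def downstream_impact(lineage, changed):
    """Return every node reachable downstream of a changed node."""
    seen, queue = set(), deque(lineage.get(changed, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(lineage.get(node, []))
    return seen

def warn_if_breaking(lineage, changed):
    """Flag user-facing artifacts a proposed change could break."""
    affected = downstream_impact(lineage, changed)
    return sorted(n for n in affected
                  if n.startswith(("dashboard.", "notebook.")))
```

In a real DataOps setup, a check like this would run in CI against every proposed change, failing the pipeline before a broken dashboard ever reaches production.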
In a recent newsletter, Tristan Handy from dbt Labs shares similar thoughts on what data can learn from software development.
DataOps and Open Source go Hand in Hand
A key ingredient of making DataOps a reality and letting data teams take control of their data stack is Open Source. It ties clearly into our mission and conviction that great data tools need to be readily available to anyone who has a use for them, but the benefits don’t end there.
Two of the main components of DevOps are:
- Automated end-to-end testing — which helps you quickly identify the potential negative downstream impact of any change.
- Isolated environments — separating development and production to limit the impact of new changes until they’re reviewed, tested, and confirmed not to have any unintentional consequences.
Together, they enable experimentation and rapid iteration by anyone on the team, while drastically increasing confidence that whatever is live for users (colleagues) to see and use actually works as intended.
Isolated environments require that the entire product (data stack), with all its components and dependencies, can run in multiple places at the same time: on anyone’s local machine during development, in dynamically generated testing environments, and in production. This way, the full impact of any change on any part of the stack — from the data integration pipeline to the dashboard and notebook—can be validated in isolation, both manually and automatically as part of end-to-end testing, before it is unleashed on the world in production.
Open source components thrive under these requirements because their packages are readily available from public sources to be installed on any machine, their configuration can typically be managed programmatically, and they can be deployed as many times as needed on one’s self-managed infrastructure. None of this is true of proprietary SaaS tools that live in the browser, typically only support a single environment (production) where any change goes live immediately, and don’t allow you to test the downstream impact of a change before that impact is already real.
A big advantage of open source software is the flexibility to tune a tool to your own needs and take ownership of your data stack, instead of being limited by a vendor’s time and priorities. If things go wrong and your dashboard is down, you can debug the issue yourself without having to wait on vendor support. If the tool doesn’t quite meet your expectations, you can extend and improve it today, rather than waiting for the improvement to make it onto the product roadmap. As you come to think of your data stack as a product you are responsible for improving, the ability to do so without having to defer to or wait on anyone becomes crucial.
Finally, history shows the most useful and effective tools are those that are created in close collaboration with their end-users. There’s no way to get closer than to work on the actual issue tracker, product roadmap, and code base together. We are building data tools for data people, by data people, that bring together the best ideas and perspectives of all teams, instead of relying on the wisdom of just one team of product managers, no matter how good they may be, and their indirect customer feedback channels.
In his recent talk at the Open Source Data Stack Conference, Taylor lays out the advantages of open source software in the data stack in more detail.
Bringing DataOps to the Modern Data Stack
In the original vision, Meltano was to bring DataOps to the entire data lifecycle by becoming the single tool that does it all, like GitLab for the software development lifecycle, and like the vertically integrated data platforms of yesteryear. This one-tool-to-rule-them-all approach seemed like a natural fit, because DevOps best practices such as isolated environments and end-to-end testing expect to work with a single “product” unit that can be installed, configured, run, and deployed on-demand.
However, as the data space has evolved, there has been a shift towards horizontal integration. Multiple narrowly focused tools now compete in every step of the lifecycle, and while we talk about the “modern data stack” as if it has a clear definition, every team’s actual ideal data stack will look different based on the tools they’ve chosen and exactly how they’re all hooked up. With all the competition and rapid iteration, data teams have gained amazing abilities, and it’s become clear that no one-size-fits-all tool will be able to compete with the pick-and-choose approach that allows teams to use the best tool for the job at every stage and finely tune their stack to their own unique needs.
But as tools’ focus has narrowed and their number has increased, something has gone missing that can’t easily be restored through bilateral integrations between every combination of tools. As Benn Stancil from Mode pointed out in a recent article on the modern data stack experience, there is no longer a common user experience for the different disciplines in a data team. People in different roles increasingly stick to “their” tools and throw anything that doesn’t fit in their silo over the wall for others to pick up, in large part because they worry about accidentally breaking things in tools they don’t intimately understand. At the same time, there is an increasing need for functionality—such as DataOps—that serves the entire data team, acts on their data stack as a whole, and requires visibility into every part of it; yet with no single tool spanning the stack, there is no longer a place for such meta-level functionality to be implemented.
The many tools and choices have also made it a daunting task to set up a new data stack from scratch, integrate the various components, and manage configuration and deployment, especially for small teams and people new to data.
DataOps and other end-to-end functionality like observability, governance, and lineage cannot effectively be implemented separately in each individual tool that makes up the stack. Nor is it realistic that they can be achieved through a one-size-fits-all platform that supplants everything. It is time for something new: something complementary that you can add to your stack to enable new functionality and fill in the gaps between components.
These frustrations, and the desire for something new that can offset the cost of decentralization, are echoed in Benn Stancil’s recent articles on data’s horizontal pivot and the opportunity for a Data OS.
Our Vision for Meltano
The way we see it, the modern data stack needs a new layer that underpins it, ties it all together, and gives data teams a single place to reason about and interact with their stack as a whole, with unified configuration and deployment across components. Teams deserve a stable foundation to build their stack on, that can stay with them for years to come, and lowers the barrier to try out new tools, swap out old ones, or use alternatives side-by-side.
Our vision for Meltano is to become the…
foundation of every team’s ideal data stack.
More concretely, we are building an open source DataOps infrastructure. You can also think of it as a package manager for data tools or “Terraform for data stacks”—different interpretations that focus on different qualities, but that we otherwise see as equivalent to the data OS framing.
From day one, Meltano has had a plugin-based architecture, offered package management functionality, and provided much-needed glue between best-in-class open source data tools and technologies. As explained in the brief history lesson above, our initial focus has been on bringing DataOps to the beginning of the data lifecycle. Today, Meltano is a great open source ELT solution because of the first set of plugins we’ve decided to support: Singer taps and targets for integration, dbt for transformation, and Airflow for orchestration.
Now that our mission once again encompasses the entire data lifecycle, we aim to incrementally add plugin support for all of the tools in the modern data stack that are compatible with DataOps best practices. Since we aim to streamline the configuration and deployment of the entire data stack and the integration between its components, our focus will primarily be on open source tools that can be installed and managed by Meltano directly, but we are also exploring integrations with SaaS tools through API connections.
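As a hypothetical sketch of what that incremental expansion could look like in a project file, additional lifecycle stages would appear as new plugin types alongside the existing ones (all names below are illustrative):

```yaml
# Illustrative plugin declarations spanning more of the lifecycle
plugins:
  extractors:
  - name: tap-gitlab          # integration: Singer tap
  loaders:
  - name: target-postgres     # integration: Singer target
  transformers:
  - name: dbt                 # transformation
  orchestrators:
  - name: airflow             # orchestration
```

Swapping a component out then becomes a small change to the project file rather than a re-architecture of the stack.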
We want to make the barrier for anyone to get started with DataOps as low as possible. To accomplish this, there will be a set of recommended plugins for each stage of the data lifecycle that serve as a great starting point, making Meltano a “data stack in a box.” We recognize, however, that there is no such thing as a one-size-fits-all data stack and that every team’s ideal setup will look different. We have no intention of locking users into any particular set of plugins. Offering choice and flexibility is the point, but people new to data deserve a great tour guide.
For Data People by Data People
Our background at GitLab makes us confident that we can bring this vision to life. The values that have made GitLab so successful and unique, like collaboration, iteration, and transparency, are deep in our DNA, and we have adopted them and augmented them with new ones like community and sustainability.
Our team has extensive experience building developer tools that users love, fostering thriving open source communities, and introducing teams around the world to the advantages of DevOps. In addition, we bring together decades of experience working with data at organizations ranging from startups to Amazon. Together, this makes us uniquely positioned to build next-generation open source data tooling for the DataOps era that we are confident will have the same monumental impact on the data profession that DevOps has had on software development. However, we cannot do it alone.
We believe that the best tools are those built in close collaboration with their users: for data people, by data people, with full-time developers and part-time community contributors working together every day in the same code base and issue tracker towards a common goal. Your ideas and unique perspectives are critical to our mission. We cannot build your team’s ideal data tool or support you in building your ideal data stack without hearing from you.
If this mission and vision resonate with you, we invite you to come make it happen with us. In the coming weeks, we’ll be sharing more details on the strategy and product roadmap that we believe will get us there, and the specific ideas that we’d love your feedback on. In the meantime, we encourage you to join our Slack community of over 1800 data professionals to be part of the conversation and be the first to learn about what we’re up to. Also, be sure to check out our job openings and values in case you’d like to work on these problems full-time!