Today is my first official day working on the Meltano team! In early February 2021 I announced on Twitter that I would be leaving the GitLab Data Team, where I’ve worked for the past 3 years, and joining Meltano. A short thread accompanied the announcement, and I promised to expand upon it. I want to dive deeper into a few points from that thread and share more about why I’m excited about the project and the role.
In May of 2020 Douwe shared the new focus of Meltano: make the power of data integration available to all by turning Meltano into a true open source alternative to existing proprietary hosted ELT solutions. He directly cited my comment from 2018 where I indicated this was a potential avenue Meltano could take to build a real community. (Let’s ignore the part where I said we should create our own Meltano spec! I’m now fully aligned with our embrace of the Singer Spec.)
This renewed focus on being an excellent way to run open source extractors and loaders is part of why I’m so excited. Much of what makes communities like dbt and Airflow great is that they are open source tools that people are using to get real work done. For dbt in particular, the ability to network within the community, achieve real results, and contribute back to the project is a big part of why it has seen such success.
Meltano can be an excellent tool within the data space for many years to come and I want to share why I believe that.
Open Source, Always
GitLab is an open core company that has a free open source (FOSS) version of its product and is committed to keeping it free forever (see the stewardship section of the handbook for more information). This model has served it well and is something we will emulate with Meltano. As Douwe stated in part 2 of the pivot blog posts: “there will always be a Community Edition, and data integration will forever be a commodity rather than pay to play.” This is a critical aspect of our mission and strategy, and it makes Meltano an exciting place to work.
I also believe that open source is essential to actually achieving our vision since one company can’t possibly manage and maintain the entire world of extractor and loader combinations. This reality leads to the next reason why I’m excited about the future of Meltano.
Facing the Challenge
Meltano is also an exciting project because it’s tackling a hard problem. As I said in a tweet a while ago, much of data engineering is digital plumbing – it’s critical work that often goes unnoticed and underappreciated until something breaks. It seems like it shouldn’t be as difficult as it is, but there is a long tail of data sources and data problems that are each slightly different.
Solving hard problems that not a lot of people want to touch has been a through-line in my career and it’s something that excites me. The process can be arduous but the feeling of accomplishment from crafting great solutions to challenging problems that help real people is immensely fulfilling. And as Meltano grows I think the scope of data problems it solves can continue to grow as well (keep in mind Meltano does a lot more out of the box than just extraction and loading even if we aren’t focused on that from a development perspective right now!).
But if we had to solve these problems alone then I don’t think I would be as excited. We can’t meet the needs of everyone if everything is closed source and internal only. Open source is part of the solution to the challenge, but so is collaborating with a large network of data professionals.
Building a Community
The data community has been wonderful for me personally and professionally. I’ve met and connected with so many great people via the dbt and Locally Optimistic communities (and Twitter!) that I never would have known had I just focused on my local Nashville community. It’s also enabled me to share my work from the GitLab Data Team in a transparent way that has helped numerous people in the community. There’s no better feeling than being able to answer someone’s question in Slack with “here’s how we did it and here’s a link to our code – let me know if you have more questions!”
I want Meltano to build on these communities with a focus on open source data integration. We see the project as working very well with dbt and other open source tools and I know it can be a great place for data engineers and other data professionals to collaborate on solving real problems. And it will help address that long (infinite) tail of data integration challenges by crowdsourcing the efforts via open source software.
I also share Douwe’s belief that Meltano can make world-class data engineering more inclusive of a broader audience. The “standard” data stack of Fivetran, Airflow, Snowflake, Census, and Looker can be expensive and much of it is pay to play. Much like how GitLab enables people to have a world-class DevOps experience with a free product, Meltano should do the same for DataOps.
Elevation of the Data Profession
I recently gave a talk with the inimitable Emilie Schario at Coalesce 2020. The gist of it was that Data Teams should view themselves as building a Data Product that serves their entire organization. This view is born out of the deep belief that most organizations aren’t realizing the full value of their data and that to close that gap a rethinking of how teams work is needed.
I see Meltano as being able to help enable this vision. The data profession and lifecycle tool chain is still in its early days with companies and startups everywhere getting early funding (dbt, Census, Airbyte, Hightouch, Fivetran, etc.). Many of these companies are addressing real pain points in the modern tool chain and will probably be successful businesses. But a proliferation of closed-source point solutions won’t enable the next level of data awareness that is required to close the data utility gap. None of these tools have the community or broad vision that I believe Meltano can and will have.
I want Meltano to follow the GitLab model: be truly excellent at a core piece of technology and then broaden to include more of the workflow in a way that makes sense for the community (and eventually customers). Meltano will be excellent at running Singer extractors and loaders while also enabling people to easily build their own (see our SDK effort). This will be the core of what Meltano does well.
The next level though is having a meta understanding of how data is flowing throughout your entire Data Product. It’s great that you can connect Fivetran to Snowflake and Snowflake to Census (this blog post is a compelling discussion on how they close the loop). But there is a not-so-hidden challenge in that diagram: it is difficult to have a holistic understanding of what’s happening throughout the stack. A single, metadata-focused view of how data is flowing requires a lot of data engineering work not pictured. While Meltano doesn’t solve this now, we’re building the tool in such a way that we can address this challenge better without requiring you to check multiple tools or manage the metadata yourself.
While good tools don’t solve all of your organizational challenges with data, I believe great tools can go a long way towards improving the current state of data operations.
Looking to the Future
I believe the next 10 years of the data profession have so much to offer and I want to be a part of building the tooling for that journey. Meltano can be a central player in the data community, helping people build their careers and get real work done. GitLab has shown the power of building a true community and enabling collaboration on a scale never before seen. It has also built an amazing business that enables it to support the open source project and drive digital transformation in thousands of organizations.
Meltano will do the same, but first we want to focus on building a strong community. A strong community means:
- It’s easy to get up and running with Meltano to get real work done
- There are real people working on the project who are kind and helpful (Douwe, myself, and 1 more soon!)
- There are tools that exist to make it easy to contribute to the project and to build new taps and targets (see our Singer SDK effort)
We’re making great progress on these fronts and will continue to expand our efforts as the community grows.
There are also some open questions that I’m excited to be thinking about:
- How do we enable the metadata-first view on Meltano so that data about your data flow is easy to use? (See this issue if you have thoughts!)
- How can we help build trust in community taps and targets with an open testing and validation framework, with the goal of having a central place to learn about the behavior, supported features, and maintenance status of all taps and targets in the ecosystem?
- How do we build Meltano as a business in a way that’s a win-win for everyone?
And more! I don’t have the full answers and we need the community to help us answer these questions. I’m excited to continue bringing the GitLab values to the project and build upon the great foundation that Douwe and many others have started.