“DataOps Evangelist”: It’s not a common role within most organizations, at least not yet. A lot of people might even wonder what it is. Here’s how Meltano defined it when we put out the call for one a couple of months ago: “You’ll evangelize DataOps best practices to data teams around the world and advocate for new features or fixes that help these teams be successful.”
When we found Sven Balnojan, a passionate DataOps thought leader on a mission to champion data professionals everywhere, we quickly realized we’d found the person we needed. He concurred and joined our team in August 2022. He’s enthusiastic about engaging with our community, and you’ll be hearing a lot from him. We wanted to share his ideas and vision for the DataOps space as he expressed them during our pre-hire discussions.
Give us a “Sven 101” about where you’ve been and what you’ve done in your career.
Five years ago, I was finishing my Ph.D. thesis in singularity theory. I was working at a B2B marketing agency and realized I loved working with data more than anything. I started really digging into DataOps research, especially the various tools and workflows that are available. I realized that only a small subset of people actually used them and that knowing about them would simplify the DataOps process and make data teams’ lives easier and more productive.
So from there, I spent a couple of years on a data team as a data engineer, a DevOps engineer, and a machine-learning engineer. My next step up was to product management as a product owner of a data team.
How do you define DataOps?
There are a lot of different definitions out there, and I like a lot of them. Personally, my pragmatic definition of DataOps is a set of tools and practices we take from other fields, like DevOps, lean manufacturing, and software engineering, to improve the speed and quality of our data output. We take those tools and principles into the data space to be more productive and to deliver more accurate data and higher-quality code, faster.
In theory, that sounds great. So, why doesn’t it translate to practice for many DataOps teams?
When I was focused on transferring the practices I mentioned from the software engineering world to the data engineering world, at first I was so excited to talk to other data engineers about tools they could use. (“Hey, version your code, use CI/CD,” and so on.) But then I realized that in the data world, there’s a huge difference between the day-to-day work of data engineers, data scientists, and machine-learning engineers and the day-to-day work of software engineers.
Software engineers have only one pipeline to work on: they’re mainly focused on optimizing their code. So, when stuff breaks or doesn’t work, they have to fix their code quickly. It’s different for data engineers, who have two pipes: code and data. Two things can happen to them: either code breaks or data breaks.
The combination of data and code is unique to DataOps, and I don’t think we’ve put enough emphasis or focus on it. But the trend toward data observability is a step forward because it’s about the data pipeline: essentially, trying to test the data while it’s flowing through this pipe. It’s a tall order and one reason data quality is often an issue for DataOps teams. On top of that, many data teams spend tons of time going back and checking data when it seems “off.” In my experience, that takes at least half a day, and sometimes more.
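To make the idea of testing data while it flows through the pipe a little more concrete, here is a minimal Python sketch. It is an illustration only, not Meltano's or any observability tool's actual API: the `observe` function, the `CheckReport` class, the check names, and the sample order records are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = dict  # assumption: each row in the pipe is a plain dict


@dataclass
class CheckReport:
    """Collects results while records stream through the pipe."""
    passed: int = 0
    failures: List[Tuple[int, str]] = field(default_factory=list)  # (row index, check name)


def observe(records: Iterable[Record],
            checks: Dict[str, Callable[[Record], bool]],
            report: CheckReport) -> Iterator[Record]:
    """Yield every record unchanged, noting which checks each one fails."""
    for index, record in enumerate(records):
        failed = [name for name, check in checks.items() if not check(record)]
        if failed:
            report.failures.extend((index, name) for name in failed)
        else:
            report.passed += 1
        yield record  # the data keeps flowing; the report is a side channel


# Hypothetical checks for a made-up orders feed.
checks = {
    "amount_is_non_negative": lambda r: r.get("amount", -1) >= 0,
    "customer_id_present": lambda r: bool(r.get("customer_id")),
}

rows = [
    {"customer_id": "c1", "amount": 20.0},
    {"customer_id": "", "amount": 5.0},
    {"customer_id": "c3", "amount": -3.0},
]

report = CheckReport()
loaded = list(observe(rows, checks, report))
```

In this sketch the checks never block the pipeline. They only record what looked wrong, which is one possible design choice; a team could just as well decide to stop the load when a check fails.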
Here’s just one common scenario I’ve encountered several times. A stakeholder comes to the data team and tells them the numbers in a dashboard look entirely wrong—that they looked completely different yesterday.
Now, the team must first dig into the code to see if a logic or programming change affected the data. Then, after they go down that road, they have to go down the data road. They typically have a lot of new data flowing into their reports every day, every hour, and now they have to check it, which takes a lot of time. Then, they have to combine the information they’ve uncovered about the code and the data to get an answer to the problem.
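The two-road investigation described here can be sketched as a small triage helper. This is a hedged illustration under stated assumptions, not a tool mentioned in the interview: the `RunSnapshot` fields, the `triage` function, the 20% tolerance, and the sample numbers are all made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunSnapshot:
    """What a team might record after each pipeline run (hypothetical fields)."""
    code_version: str          # for example, a git commit hash
    input_row_count: int       # rows received from the source
    metrics: Dict[str, float]  # values shown on the dashboard


def triage(previous: RunSnapshot, current: RunSnapshot,
           row_tolerance: float = 0.2) -> List[str]:
    """List which road (code or data) shows a change between two runs."""
    findings = []
    # Code road: did the logic change between the two runs?
    if previous.code_version != current.code_version:
        findings.append(
            f"code changed: {previous.code_version} -> {current.code_version}")
    # Data road: did the input volume shift by more than the tolerance?
    if previous.input_row_count:
        change = (abs(current.input_row_count - previous.input_row_count)
                  / previous.input_row_count)
        if change > row_tolerance:
            findings.append(f"input volume changed by {change:.0%}")
    # Finally, note which dashboard metrics actually moved.
    for name, old in previous.metrics.items():
        new = current.metrics.get(name)
        if new != old:
            findings.append(f"metric moved: {name} {old} -> {new}")
    return findings


# Made-up numbers: same code version, but far fewer rows arrived today.
yesterday = RunSnapshot("a1b2c3", 10000, {"revenue": 5200.0})
today = RunSnapshot("a1b2c3", 4100, {"revenue": 2100.0})
findings = triage(yesterday, today)
```

With these sample snapshots the code version is identical, so the findings point at the data road. Real investigations are rarely this tidy, but recording both code and data facts per run is what makes the comparison possible at all.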
“I came to work for Meltano because I wanted to work with a company that was a pie maker, not a pie taker.” (Sven Balnojan, DataOps Evangelist at Meltano)
All that said, when the data and code pipes intersect, data teams get enormous value from the work they do, and so do their stakeholders! And that’s what I want to focus on in my work at Meltano.
That’s a worthy goal. Do you have a roadmap for achieving it?
I don’t have a list of specific topics yet, but for now, I’m focused on helping data teams answer three questions:
- How do I develop proper code as a data developer or engineer? How do I version, test, and deploy things?
- How do I observe and test my data? How do I trace it to see that it actually delivers the accurate results I need?
- As a team, how do we focus on delivering end-user value?
That last question is the most important one to answer, but I think it’s often neglected because, in the past, we didn’t really have data teams. Usually, there was just a unit that wasn’t set up close to the product organization. So DataOps teams aren’t necessarily exposed to ways of working, like agile, that involve a product manager. Data engineers who have been in the business for five to seven years didn’t even have a product manager. But I’ve seen that change quite a bit in the last couple of years.
When DataOps teams can see themselves as producing products—in the form of valuable data—for end users, I believe the companies they work for will elevate their roles. And right now, we’re in a moment where the data is there, the technology is there, and we have a lot of educated data developers out there. The amount of data they have is massive, and every single company can start today to use its data in a really meaningful way.
I liken the way DataOps is evolving to how the software engineer role grew and how quickly it became product-focused. That same thing is happening in the DataOps world right now. If you trace the software engineering world’s history, you can see how quickly it became focused on practices like CI/CD, which came out years ago and helped software engineers work much faster and be way more productive. That’s what will happen in the DataOps space, too. Teams will get more great tools they can use every day.
We know why we chose you, but why did you choose Meltano?
I didn’t come to Meltano just because the company had an opening. I liked their vision. I’d read their vision posts about enabling the whole data lifecycle, which really clicked for me. I came to work for Meltano because I wanted to work with a company that was a pie maker, not a pie taker. And I think that’s Meltano’s unique position in the DataOps space—its mission to enable the data for every person and organization. I want to be a part of turning that vision into a reality.