Why the Stigma Exists (And Why It’s Wrong)
The resistance to spreadsheets in data infrastructure comes from a real place. Most data teams have inherited horrifying Excel files: macros held together with hope, formulas referencing other workbooks across three different shared drives, and “final_v2_ACTUAL_final.xlsx” naming conventions that haunt your dreams.
But conflating “spreadsheets as a single source of truth for critical business logic” with “spreadsheets as a data source” is a category error.
The question isn’t whether spreadsheets belong in your data infrastructure. It’s what they should be used for, and how they should be integrated.
What Google Sheets Is Actually Good For
Not everything belongs in a database. Some datasets sit in an uncomfortable middle ground: too dynamic for hardcoded values, too simple for dedicated infrastructure, too business-context-heavy for engineering to own.
This is where Google Sheets excels.
Reference Data That Changes Frequently
Currency exchange rates are a perfect example. They need daily updates. They affect revenue reporting across regions. But building a dedicated service to fetch, validate, and load exchange rates from a third-party API adds complexity, cost, and another vendor dependency.
Resident Advisor faced exactly this challenge. They needed to update currency rates at least daily to report sales correctly from different regions in different countries. Rather than integrate yet another third-party service, they used Google Sheets. Finance updates the rates. Meltano syncs them. dbt models consume them. The reporting stays accurate without engineering involvement.
Simple. Reliable. Maintainable.
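The consuming logic stays trivially simple too. A minimal Python sketch of the idea, assuming a rates table synced from the Sheet (the currency codes and rate values here are hypothetical, not Resident Advisor's actual figures):

```python
# Hypothetical rates as they might arrive from a Sheet-synced table:
# one row per currency, updated daily by finance.
rates_to_gbp = {
    "GBP": 1.00,
    "EUR": 0.85,
    "USD": 0.78,
}

def to_reporting_currency(amount: float, currency: str) -> float:
    """Convert a regional sale into the reporting currency (GBP here)."""
    try:
        rate = rates_to_gbp[currency]
    except KeyError:
        # An unknown currency means the Sheet is missing a row --
        # fail loudly rather than silently mis-report revenue.
        raise ValueError(f"No exchange rate for {currency!r}")
    return round(amount * rate, 2)

print(to_reporting_currency(100.0, "EUR"))  # 85.0
```

In production the lookup would come from a warehouse table rather than a dict, but the shape of the logic is the same: finance owns the numbers, the pipeline owns the plumbing.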
Mapping Tables That Live in Business Context
Product-to-general-ledger category mappings. Customer segment definitions. Campaign taxonomy. These datasets require business expertise to maintain correctly.
When you force these into engineering-owned systems, one of two things happens: either engineers become the bottleneck for every business logic change, or business users start requesting changes they don’t fully understand because they’ve lost direct control.
Google Sheets keeps ownership where it belongs. Business analysts maintain the mappings. The data warehouse consumes them. Everyone works in their area of expertise.
Parameters That Control Data Warehouse Behaviour
Feature flags. Processing thresholds. Date ranges for historical analysis. These aren’t “data” in the traditional sense; they’re configuration that controls how your warehouse processes data.
Storing them in Sheets means business users can adjust parameters without touching infrastructure, without creating tickets, and without waiting for the next sprint.
The boundary between “configuration” and “data” is blurrier than most teams acknowledge. Sheets lets you put parameters where they’re easiest to manage.
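One way to picture this: the synced parameters Sheet is just key-value rows, and a thin loader turns them into typed configuration for the pipeline. A sketch under assumed parameter names (everything here is illustrative):

```python
from datetime import date

# Rows as they might land in the warehouse after a Sheets sync:
# (parameter, value, type) -- all hypothetical names.
raw_parameters = [
    ("enable_backfill", "true", "bool"),
    ("anomaly_threshold", "3.5", "float"),
    ("history_start_date", "2022-01-01", "date"),
]

# Sheets deliver strings; the loader is responsible for typing them.
CASTS = {
    "bool": lambda v: v.strip().lower() == "true",
    "float": float,
    "date": date.fromisoformat,
}

def load_config(rows):
    """Coerce string values from the Sheet into typed configuration."""
    return {name: CASTS[kind](value) for name, value, kind in rows}

config = load_config(raw_parameters)
# config["anomaly_threshold"] is now a float the warehouse jobs can use
```

Business users edit the Sheet; the pipeline gets typed, validated settings on the next sync.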
The Integration Pattern That Actually Works
The reason Google Sheets gets dismissed isn’t because it’s inherently problematic. It’s because most teams use it badly.
The anti-pattern looks like this: someone emails a CSV to an engineer, who manually uploads it to the warehouse, forgets to document it, and three months later no one remembers where the numbers came from or who owns them.
That’s not a Sheets problem. That’s a process problem.
The pattern that works treats Google Sheets as a first-class data source:
Automated syncing. Sheets connect directly to your data warehouse through proper ETL pipelines. Changes sync automatically on a schedule you control: hourly, daily, whatever the use case requires.
Version control through Sheet history. Google Sheets maintains its own change log. Every edit is tracked with user attribution and timestamps. You don’t need to recreate Git for spreadsheets.
Clear ownership. Each Sheet has a defined owner, usually the business analyst or domain expert who understands the data best. Engineering maintains the pipeline, not the content.
Proper monitoring. Schema changes, unexpected values, or sync failures trigger alerts just like any other data source. Sheets aren’t exempt from data quality standards.
dbt models as the interface. Downstream consumers never query Sheets directly. They query dbt models that reference Sheets tables, which means you maintain the same transformation logic, testing, and documentation as any other source.
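In dbt this layering is a staging model plus tests; the same shape can be sketched in Python. Raw Sheet rows pass through one interface function that renames, validates, and documents, and downstream consumers only ever see its output (the column names and category list are hypothetical):

```python
# Raw rows exactly as synced from the Sheet -- messy headers and all.
raw_sheet_rows = [
    {"Product Name": "Widget", "GL Category ": "Hardware"},
    {"Product Name": "Support Plan", "GL Category ": "Services"},
]

VALID_CATEGORIES = {"Hardware", "Services", "Software"}

def stg_product_gl_mapping(rows):
    """The only sanctioned interface to the Sheet data: tidy the
    column names and enforce basic quality rules, the way a dbt
    staging model plus schema tests would."""
    cleaned = []
    for row in rows:
        category = row["GL Category "].strip()
        if category not in VALID_CATEGORIES:
            # A typo in the Sheet fails the build, not the dashboard.
            raise ValueError(f"Unexpected GL category: {category!r}")
        cleaned.append({
            "product_name": row["Product Name"].strip(),
            "gl_category": category,
        })
    return cleaned
```

The point of the layer is the contract: the Sheet can be messy at the edges, but everything downstream sees clean, tested columns.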
This isn’t theoretical. This is how Resident Advisor runs their production data stack. Google Sheets sits alongside their event streaming and transactional databases, synced through Meltano, consumed by dbt models, powering dashboards that drive business decisions.
When NOT to Use Google Sheets
Let’s be clear: Google Sheets isn’t appropriate for everything.
Don’t use Sheets for high-volume transactional data. If you’re processing thousands of rows per minute, you need a proper streaming or database solution.
Don’t use Sheets as your system of record for critical business entities. Customer data, product catalogues, and financial transactions belong in systems designed for data integrity and audit trails.
Don’t use Sheets when you need complex access control. If your data requires role-based permissions at the row or column level, you need a proper database with authentication and authorisation.
Don’t use Sheets when the dataset will definitely grow beyond spreadsheet scale. If you’re starting with 1,000 rows but expect 1 million next year, start with the right architecture from the beginning.
The test is simple: if you’re constantly fighting the tool’s limitations, you’re using the wrong tool.
But if your dataset is small, changes frequently, requires business user input, and doesn’t fit neatly into existing systems, Sheets might be exactly right.
The Real Cost of Overengineering
Here’s what happens when teams reject Google Sheets on principle:
Engineers build custom admin interfaces for what should be simple data entry. A two-day spreadsheet solution becomes a two-week development project, plus ongoing maintenance.
Business analysts create Jira tickets to update mapping tables, then wait three sprints for a five-minute change. The lag between “we need to add a new category” and “the category is available in reporting” stretches from minutes to weeks.
Teams integrate expensive third-party services for functionality they could handle with a spreadsheet. Currency conversion APIs. Taxonomy management platforms. Configuration services that cost thousands per year to do what Sheets does for free.
The irony is that overengineering often creates more technical debt than the “quick fix” would have.
A properly integrated Google Sheet with clear ownership, automated syncing, and dbt models on top is more maintainable than a hastily built custom interface that no one remembers how to modify.
What “Production-Grade” Actually Means
When data teams say Google Sheets “isn’t production-grade,” what they usually mean is that the way most people use Sheets isn’t production-grade.
Manual uploads aren’t production-grade. Emailing CSVs around isn’t production-grade. Mystery spreadsheets with unknown owners aren’t production-grade.
But automated syncing is production-grade. Clear ownership is production-grade. Monitoring and alerting are production-grade. Version history is production-grade.
The tool matters less than the process around it.
Resident Advisor’s Google Sheets integration is more production-grade than many teams’ Snowflake implementations. Why? Because it’s automated, monitored, owned, and properly integrated into their transformation layer.
“Production-grade” isn’t about the technology you use. It’s about the reliability, observability, and maintainability of how you use it.
The Architecture That Makes This Work
If you’re wondering how to actually implement this, here’s the technical pattern:
Your Google Sheets live in your organisation’s Google Workspace, with proper access controls and defined owners.
Meltano’s Google Sheets connector syncs specified Sheets into your data warehouse on whatever schedule you define. Daily for currency rates. Hourly for feature flags. Weekly for less time-sensitive mappings.
The connector handles schema detection automatically. If you add a column to your Sheet, the warehouse table updates to match. If you rename a column, downstream dbt models will catch the breaking change through testing.
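A hedged sketch of what “catch the breaking change” can look like in practice: compare the columns that arrived against the columns downstream models expect, and fail the run on any removal or rename (the column names are illustrative, not the connector’s actual output):

```python
# Columns the downstream dbt models depend on -- hypothetical example.
EXPECTED_COLUMNS = {"currency", "rate", "effective_date"}

def check_schema(synced_columns, expected=EXPECTED_COLUMNS):
    """Fail fast if a Sheet edit removed or renamed a required column.
    Columns that were merely added are tolerated and reported."""
    missing = expected - set(synced_columns)
    if missing:
        raise RuntimeError(
            f"Sheet schema drift, missing columns: {sorted(missing)}"
        )
    return sorted(set(synced_columns) - expected)  # new, unexpected columns

# A new column is tolerated and surfaced for review...
extra = check_schema(["currency", "rate", "effective_date", "notes"])
# ...but a rename (rate -> fx_rate) would stop the pipeline here instead.
```

Whether this lives in a dbt test or a pre-sync hook matters less than the principle: a spreadsheet edit should break a build, not a dashboard.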
dbt models sit on top of the raw Sheets data, providing the same transformation, testing, and documentation layer as any other source. Downstream users query the dbt models, never the raw Sheets tables directly.
Your orchestration layer (whether that’s Meltano’s built-in scheduler, dbt Cloud, or Airflow) handles dependencies. If your revenue reporting model depends on yesterday’s currency rates, the sync runs first.
Monitoring catches issues before they become problems. Schema changes trigger alerts. Unexpected null values fail data quality tests. Sync failures page whoever’s on call.
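The same discipline applies to values. A minimal sketch of the kind of row-level checks that could gate a Sheets sync (the field names and plausibility thresholds are assumptions for illustration):

```python
def quality_issues(rows):
    """Return a list of human-readable problems, one per bad row.
    An empty list means the sync is safe to publish downstream."""
    issues = []
    for i, row in enumerate(rows):
        if row.get("rate") is None:
            issues.append(f"row {i}: null rate")
        elif not (0 < row["rate"] < 1000):
            issues.append(f"row {i}: implausible rate {row['rate']}")
    return issues

rows = [{"currency": "EUR", "rate": 0.85}, {"currency": "USD", "rate": None}]
problems = quality_issues(rows)
if problems:
    # In production this is where the alert fires and the on-call
    # engineer gets paged; here we just surface the failures.
    print("\n".join(problems))
```

In a real stack these would be dbt tests or pipeline assertions rather than hand-rolled loops, but the contract is identical: bad values stop the sync before they reach a report.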
This isn’t complicated. It’s just treating Sheets with the same engineering discipline you apply to any data source.
Why This Matters Now
The modern data stack promised to make data teams more efficient. But for many organisations, efficiency gains got eaten by complexity costs.
Teams now manage dozens of tools, each with its own interface, authentication, and billing model. Every new data source becomes a vendor evaluation, a procurement process, a security review, and ongoing maintenance overhead.
Meanwhile, Google Sheets is already approved, already understood, already being used across the organisation. The marginal cost of integrating it properly is tiny compared to adding another vendor.
This is particularly true for mid-sized organisations. If you’re spending £250k to £1M per year on data tools, every unnecessary service is budget that could go toward headcount, better tooling, or cost savings you can show to leadership.
The best data teams aren’t the ones with the most sophisticated architecture. They’re the ones that solve business problems efficiently with appropriate tools.
Sometimes that’s Kafka and Flink. Sometimes it’s Google Sheets and dbt.
Where Most Teams Get This Wrong
The failure mode isn’t using Google Sheets. It’s using Google Sheets without treating it as infrastructure.
No clear ownership means no one knows who to ask when values look wrong. No automated syncing means someone has to remember to manually update the warehouse. No monitoring means broken pipelines go unnoticed until a dashboard breaks. No dbt layer means business logic gets embedded directly in BI tools where it’s impossible to test or version control.
Technology isn’t the problem. The process around it is.
If you’re going to use Google Sheets in your data stack, and you probably should, integrate it properly. Own it. Monitor it. Document it. Test it.
Treat it like production infrastructure, because that’s what it is.
FAQs
Doesn’t using Google Sheets create version control problems?
Google Sheets maintains a complete change history with user attribution and timestamps. Every edit is tracked automatically. You don’t need to recreate Git for spreadsheets; the version control is already built in. The real version control problem happens when people email CSVs around without any tracking at all.
What happens if someone accidentally deletes critical data from a Sheet?
Google Sheets stores complete revision history. You can restore any previous version with a few clicks. Additionally, your ETL pipeline typically maintains historical snapshots in your warehouse, so even if something goes wrong in the Sheet, your warehouse has the previous state.
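One common way pipelines provide that second safety net is to write each sync as a dated snapshot rather than overwriting state in place. A sketch of the idea, with an in-memory dict standing in for dated warehouse tables or partitions:

```python
from datetime import date

snapshots = {}  # in practice: dated tables or partitions in the warehouse

def store_snapshot(rows, as_of=None):
    """Keep every sync's state instead of overwriting the last one."""
    snapshots[as_of or date.today().isoformat()] = list(rows)

def restore_latest_before(day):
    """Recover the most recent snapshot prior to a bad edit."""
    candidates = sorted(k for k in snapshots if k < day)
    if not candidates:
        raise LookupError(f"No snapshot before {day}")
    return snapshots[candidates[-1]]

store_snapshot([{"currency": "EUR", "rate": 0.85}], as_of="2024-03-01")
store_snapshot([{"currency": "EUR", "rate": 999}], as_of="2024-03-02")  # bad edit
good = restore_latest_before("2024-03-02")  # yesterday's correct state
```

Between the Sheet’s own revision history and dated snapshots in the warehouse, an accidental deletion becomes an inconvenience, not an incident.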
How do we prevent unauthorised changes to production Sheets?
Google Sheets has proper access controls. You can set Sheets to read-only for most users, with edit access limited to designated owners. You can also enable “suggestion mode” where changes require approval before taking effect. Combine this with monitoring in your data pipeline and you have robust change management.
Isn’t this just creating spreadsheet hell in our data warehouse?
Only if you do it badly. The key is treating Sheets as a first-class data source with proper integration, not as ad hoc files that bypass your normal processes. Automated syncing, clear ownership, dbt models on top, and proper monitoring make Sheets just another managed data source.
What about performance when Sheets get large?
Google Sheets is appropriate for datasets up to roughly 10,000 rows. Beyond that, performance degrades and you should migrate to a proper database. But for reference data, mapping tables, and parameters (the use cases where Sheets excels), you rarely exceed this limit. If you do, that’s your signal to graduate to different tooling.
Final Takeaway
The data infrastructure you build should serve your organisation, not impress other data engineers.
Google Sheets isn’t technical debt when it’s integrated properly. It’s pragmatic architecture that puts data maintenance in the hands of people who understand it best.
Resident Advisor didn’t eliminate spreadsheets when they rebuilt their data stack with Meltano. They integrated them properly: automated, monitored, and treated as production infrastructure alongside their databases and event streams.
That’s not a workaround. It’s sophisticated system design.
The best data teams don’t avoid simple tools. They use the right tool for each job, integrated properly, with appropriate engineering discipline applied regardless of the technology.
Sometimes the right tool is a distributed message queue. Sometimes it’s a spreadsheet.
Additional Resources
- Matatika’s Google Sheets Connector – Technical details on how we handle Google Sheets data
- Resident Advisor Case Study – See the full transformation story
- Learn more about Google Sheets – Google’s cloud-based spreadsheet platform
Ready to Stop Manually Uploading Spreadsheets?
Book a 30-minute discovery call. We’ll help you assess whether Google Sheets automation fits your use case and show you exactly how it would work for your data.
Next in this series: We’ll show you exactly how to turn ad hoc Sheet uploads into production-grade pipelines, the pattern Resident Advisor used for their daily currency rate updates.