From REST API to Custom Singer Tap in 40 Minutes: Meltano Office Hours Recap

GitHub Actions Security: The pull_request_target Problem

Before the community spotlight, Ruben from the Meltano team kicked things off with a disclosure that every team using GitHub Actions should read carefully.

A security researcher reported a vulnerability in Meltano Hub’s CI/CD pipeline. The crux: Meltano Hub was using the pull_request_target workflow trigger in a way that allowed an attacker to perform command injection, extract the repository’s GitHub token, and (because Meltano Hub deploys on push to main with no branch protection) push malicious changes directly to the hub.

The Vulnerability Stack

This wasn’t a single flaw. It was a chain:

pull_request_target semantics: Unlike the standard pull_request trigger, pull_request_target always runs in the context of the base repository, even when triggered by a fork. That means it has access to secrets and the repo token regardless of who opened the PR. A common misconception is that requiring maintainer approval for external contributors mitigates this; it does not. The workflow still runs in the privileged context.

Unsafe use of workflow variables in shell commands: The pipeline was interpolating workflow variables (PR titles, branch names, or other user-controlled inputs) directly into shell commands. This is the classic injection vector. An attacker can craft a branch name or PR title that, when evaluated by the shell, runs arbitrary commands inside the CI runner.

No branch protection on main: With the token extracted, an attacker would have write access to the repository. Combined with the auto-deploy-on-push setup, this meant arbitrary changes to Meltano Hub’s plugin registry could be deployed, including swapping out a plugin’s source URL for a malicious package.

What This Means in Practice

An attacker exploiting this chain could have replaced a trusted plugin’s source URL with a malicious PyPI package, injected a backdoor into a connector used by thousands of Meltano pipelines, and gone completely undetected, since the deployed plugin would look legitimate on the hub.

A security advisory has been filed and a CVE has been requested. If you find a vulnerability in Meltano, reach out to us on Slack.

How to Protect Yourself

Stop using pull_request_target if you can. It’s rarely necessary. Most CI workflows that use it do so to access secrets for things like posting comments back to PRs; there are safer patterns for this using workflow_run instead.

If you must use pull_request_target, never interpolate any user-controlled data into shell commands. Keep user-provided values inside action inputs only, never inside run: blocks where the shell will evaluate them.
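The difference between interpolating a value into a command and passing it as data is easy to see outside of GitHub Actions too. The following is a minimal Python analogue, not the Meltano Hub workflow itself: the PR title is a made-up attacker string, the function names are hypothetical, and a POSIX shell is assumed.

```python
import os
import subprocess

# Hypothetical attacker-controlled value, e.g. a PR title.
PR_TITLE = 'fix typo"; echo INJECTED; echo "'


def unsafe_echo(title: str) -> str:
    # The title is pasted into the command text, so the shell parses it as code.
    result = subprocess.run(
        f'echo "{title}"', shell=True, capture_output=True, text=True
    )
    return result.stdout


def safe_echo(title: str) -> str:
    # The command text is constant; the title only travels as data in the environment.
    env = {**os.environ, "PR_TITLE": title}
    result = subprocess.run(
        'printf "%s\\n" "$PR_TITLE"',
        shell=True,
        env=env,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(unsafe_echo(PR_TITLE))  # the injected echo runs as its own command
    print(safe_echo(PR_TITLE))    # the title is printed back verbatim
```

In a workflow, the equivalent of safe_echo is to assign the user-controlled value to an environment variable on the step and reference that variable from the script, rather than placing the expression inside the run: text.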

Use zizmor for static analysis. Zizmor is a purpose-built static analysis tool for GitHub Actions that catches dangerous patterns like these before they reach production. You can run it standalone, or give it a GitHub token so it can scan your live workflow configurations. Meltano is already shipping the new SDK tap template with a zizmor check baked in, so if you scaffold a tap today, you’ll get this security gate for free.
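To give a feel for the kind of pattern such a tool looks for, here is a deliberately naive, text-based check. It is not how zizmor works and is no substitute for it: the list of untrusted expressions is illustrative rather than exhaustive, the indentation handling is a heuristic rather than a YAML parser, and the sample workflow is invented for the demo.

```python
import re

# Expression contexts an outside contributor can influence (illustrative only).
UNTRUSTED = re.compile(
    r"\$\{\{\s*(github\.event\.pull_request\.(title|body|head\.ref)"
    r"|github\.head_ref)\s*\}\}"
)
RUN_KEY = re.compile(r"^(\s*)(-\s+)?run:")


def find_risky_lines(workflow_text: str) -> list[int]:
    """Return 1-based line numbers where an untrusted expression sits in a run: step."""
    risky = []
    run_indent = None  # column of the current `run:` key, or None when outside one
    for number, line in enumerate(workflow_text.splitlines(), start=1):
        match = RUN_KEY.match(line)
        if match:
            run_indent = len(match.group(1)) + len(match.group(2) or "")
        elif run_indent is not None:
            indent = len(line) - len(line.lstrip())
            # A non-empty line at or left of the key's column ends the run block.
            if line.strip() and indent <= run_indent:
                run_indent = None
        if run_indent is not None and UNTRUSTED.search(line):
            risky.append(number)
    return risky


SAMPLE = """\
on: pull_request_target
jobs:
  greet:
    runs-on: ubuntu-latest
    steps:
      - name: Unsafe
        run: echo "${{ github.event.pull_request.title }}"
      - name: Safe
        env:
          TITLE: ${{ github.event.pull_request.title }}
        run: echo "$TITLE"
"""

if __name__ == "__main__":
    print(find_risky_lines(SAMPLE))
```

Only the first step is flagged: the second one uses the same expression, but in an env: mapping, where the shell receives it as a value instead of as script text.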

Pin plugin versions to immutable references. Ruben raised the broader supply chain angle: pinning to a Git tag is not truly immutable, since tags can be force-pushed. Pinning to a PyPI release or a specific commit SHA is safer. Meltano itself is exploring a lock-file mechanism so that all connector versions in a project are pinned to stable, auditable references.
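As a rough illustration of the tag-versus-SHA distinction, the sketch below checks whether a Git-based install URL ends in a full commit SHA. Everything in it is an assumption made for the example: the helper names, the example.com-style repository URLs, and the simplification that a pinned ref is a 40-character SHA-1 hash following the last "@" in the final path segment. It is not related to the lock-file mechanism Meltano is exploring.

```python
import re
from typing import Optional

# A full SHA-1 commit hash: 40 lowercase hex characters.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def git_ref(pip_url: str) -> Optional[str]:
    """Return the ref after '@' in the last path segment, or None if there is none."""
    tail = pip_url.rsplit("/", 1)[-1]
    if "@" not in tail:
        return None
    return tail.split("@", 1)[1]


def is_pinned_to_commit(pip_url: str) -> bool:
    """True only when the URL names a full commit SHA rather than a tag or branch."""
    ref = git_ref(pip_url)
    return ref is not None and FULL_SHA.fullmatch(ref) is not None


if __name__ == "__main__":
    # Hypothetical plugin URLs.
    tagged = "git+https://github.com/example/tap-example.git@v1.2.0"
    pinned = "git+https://github.com/example/tap-example.git@" + "0a1b2c3d4e" * 4
    print(is_pinned_to_commit(tagged))
    print(is_pinned_to_commit(pinned))
```

A check like this only tells you that a reference has the shape of a commit hash; it does not verify that the commit exists or that its contents are trustworthy.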


Community Spotlight: Building a Custom Singer Tap for SafetyCulture

The session’s main event: Harsh Smith, AI Data Engineer at XLR Infotec and Google Developer Group organizer in Rajkot, India, walked us through how he built a production data pipeline extracting SafetyCulture data into Snowflake, and the technical path he took to get there.

What is SafetyCulture?

SafetyCulture (formerly iAuditor) is an operations platform widely used in manufacturing and industrial environments. Think of it as a structured form-filling and audit system. A Harley-Davidson manufacturing unit, for example, might use it to track daily inspection checklists, log safety incidents, assign corrective actions to specific people, and record site-level issues. With 70,000+ organizations on the platform, there’s a massive amount of operational data sitting behind their REST API that teams want in their data warehouses.


The Problem Statement

The client needed their SafetyCulture data in Snowflake for analytical workloads. The core entities they cared about:

  • Audits / Inspections: The core form submissions
  • Actions: Assigned tasks with owners and due dates (e.g., “Fix machine guard by Friday”)
  • Issues: Flagged problems with status tracking
  • Users and Sites: Organizational metadata

The challenge: there’s no official Singer tap for SafetyCulture. You’re either building one or using a generic REST API tap and hoping it covers your edge cases.

Architecture Overview

The pipeline ran from the SafetyCulture REST API through a Singer tap, into a Postgres target for the demo (Snowflake in production). Orchestration was handled with a lightweight Python script. The meltano.yml configuration file was the central source of truth for the entire pipeline definition.

Phase 1: Prototyping with tap-rest-api-msdk

Harsh’s first pass used tap-rest-api-msdk, the community-built generic REST API tap. This is a solid prototyping tool: you configure it with your API’s base URL, auth scheme, and a list of streams defined in YAML. No custom Python required. Schema inference is automatic; records and columns are added dynamically as the tap reads responses from the API.

What you get for free with this approach: automatic schema inference, incremental loading via replication keys, and state management handled entirely by the framework. For many REST APIs, this is all you’ll ever need.
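For readers new to replication keys, the idea behind incremental loading can be sketched in a few lines of plain Python. This is a simplified illustration, not the code inside tap-rest-api-msdk or the SDK: the function name, the record fields, and the shape of the state dictionary are all hypothetical, and it assumes the replication key holds uniformly formatted ISO 8601 timestamps so that string comparison matches chronological order.

```python
def incremental_sync(records, replication_key, state):
    """Return records newer than the stored bookmark, plus the updated state."""
    bookmark = state.get("bookmark")
    emitted = []
    for record in sorted(records, key=lambda r: r[replication_key]):
        value = record[replication_key]
        # Skip anything already covered by a previous run.
        if bookmark is not None and value <= bookmark:
            continue
        emitted.append(record)
        bookmark = value  # advance the bookmark as records are emitted
    return emitted, {"bookmark": bookmark}


if __name__ == "__main__":
    # Hypothetical records from an "actions" endpoint.
    first_page = [
        {"id": "a1", "modified_at": "2024-01-01T00:00:00Z"},
        {"id": "a2", "modified_at": "2024-01-02T00:00:00Z"},
    ]
    emitted, state = incremental_sync(first_page, "modified_at", {})
    print(len(emitted), state)

    # A later run sees the old records plus one new one, and emits only the new one.
    later = first_page + [{"id": "a3", "modified_at": "2024-01-03T00:00:00Z"}]
    emitted, state = incremental_sync(later, "modified_at", state)
    print([r["id"] for r in emitted], state)
```

The strict comparison is a simplification: a real tap may choose to re-emit records equal to the bookmark so that rows sharing a timestamp are never missed, and rely on the primary key to deduplicate in the target.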

This worked well enough for the initial proof of concept. But as the requirements got more specific, the generic tap started showing its seams, particularly around the SafetyCulture server’s quirks and the need for more robust, client-specific error handling.


Phase 2: Building a Custom Singer Tap with the SDK

The production solution is a custom Singer tap built on the Meltano Singer SDK. Harsh couldn’t share the proprietary tap directly, but he replicated the key patterns in the demo.

The Singer SDK gives you a proper framework for defining streams, handling pagination, managing authentication, and emitting Singer-compliant messages. You define each stream with a schema, a primary key, and a replication key if you want incremental loading. The SDK handles the rest: writing bookmark values after each sync, generating a full discovery catalog, and emitting the SCHEMA, RECORD, and STATE messages that any Singer-compliant target (Snowflake, Postgres, BigQuery) can consume.
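To make the three message types concrete, here is a hand-rolled sketch of what a tap writes to stdout, one JSON object per line. It deliberately does not use the SDK, which builds these messages for you; the stream name, fields, and bookmark layout are hypothetical and not taken from Harsh’s tap.

```python
import json


def schema_message(stream, schema, key_properties, bookmark_properties):
    # Describes the shape of the records that follow for this stream.
    return {
        "type": "SCHEMA",
        "stream": stream,
        "schema": schema,
        "key_properties": key_properties,
        "bookmark_properties": bookmark_properties,
    }


def record_message(stream, record):
    # Carries one row of data.
    return {"type": "RECORD", "stream": stream, "record": record}


def state_message(value):
    # Carries the bookmark; the layout of "value" is chosen by the tap.
    return {"type": "STATE", "value": value}


def sync_actions(rows):
    """Build the message sequence for a hypothetical 'actions' stream."""
    schema = {
        "type": "object",
        "properties": {
            "id": {"type": "string"},
            "modified_at": {"type": "string", "format": "date-time"},
        },
    }
    messages = [schema_message("actions", schema, ["id"], ["modified_at"])]
    messages += [record_message("actions", row) for row in rows]
    if rows:
        latest = max(row["modified_at"] for row in rows)
        messages.append(state_message({"bookmarks": {"actions": {"modified_at": latest}}}))
    return messages


if __name__ == "__main__":
    rows = [
        {"id": "a1", "modified_at": "2024-01-01T00:00:00Z"},
        {"id": "a2", "modified_at": "2024-01-02T00:00:00Z"},
    ]
    for message in sync_actions(rows):
        print(json.dumps(message))
```

A target reads these lines in order: it uses SCHEMA to prepare the destination table, loads each RECORD, and the STATE value is what gets stored so that the next run can resume from the bookmark.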

For this client, the incremental streams were audit search, actions, and issues: the high-volume, constantly updated entities. Metadata streams like users and sites ran on full refresh.

The meltano.yml config ties the whole thing together: connector names, capabilities (state, catalog, discover), settings like the API token and start date, loader configuration, and a daily schedule. One file, the entire pipeline contract.


The AI-Assisted Development Workflow

Here’s the part that got the most discussion in the room. Harsh used Claude throughout the development process, and the team dug into exactly how.

The workflow went like this: Harsh started with tap-rest-api-msdk to prototype streams with basic YAML config (no Claude at this stage). Once the REST API tap hit its limits, he opened Claude with the SafetyCulture API docs and the existing meltano.yml as context. He then prompted Claude to help design a proper custom Singer tap using the SDK for the same endpoints. From there it was an iteration loop, refining stream definitions, state handling, and auth logic.

Time to a working solution: 30 to 40 minutes.

That’s not 30 minutes to a scaffolded stub; that’s 30 minutes from prompt to a running tap emitting valid Singer messages into a Postgres target.

Ruben’s framing resonated: Harsh knew something about Singer. He knew something about Meltano. But filling the gap between “I know the concepts” and “I have a running production connector” is exactly where the AI iteration loop shines.

A few real issues Harsh ran into during development that Claude helped debug quickly:

  • Unpinned setuptools on Windows: A known footgun when building Python packages on Windows; switching to Linux resolved it
  • IPv6/IPv4 conflicts with PostgreSQL: The local Postgres target was binding to IPv6 by default, causing connection failures until forced to IPv4

Both are the kind of environmental noise that’s easy to lose hours to. With Claude in the loop, the diagnostic cycle was much faster.


The Broader Question: Where Does AI Fit in the Meltano Developer Loop?

Harsh’s demo sparked a wider conversation about how AI is changing the way people build with Meltano, and where the team should invest.

llms.txt and agent context files. There’s an emerging pattern of projects shipping an llms.txt or AGENTS.md at their documentation root: a machine-readable summary of how the project works, written specifically for LLM consumption rather than human readers. Edgar raised the idea of Meltano shipping a better project README during meltano init, structured so that an AI agent can orient itself in the project immediately. The command could generate this automatically and keep it opt-out.

Skill bundles for AI agents. The idea of a “Meltano skills bundle” was floated: a set of predefined instructions for building taps, targets, and configuring state that any agent could load. Harsh pointed to the “Everything Claude” repo in the Claude marketplace as an example of community-curated skill collections doing this well today.

The connector variant consolidation problem. Derek raised a long-standing pain point: Meltano Hub has multiple variants for many connectors, and when one doesn’t quite work, finding the right one is a research exercise. The new angle: an agent could theoretically scan every fork of a tap on GitHub, identify which ones have implemented specific features, and synthesize a superset variant, tested and license-filtered. The energy to do this manually was always prohibitive. With LLMs, it’s becoming plausible.


Release Notes

Connectors

tap-klaviyo: Fixed a pagination bug where the second page of campaign data was failing with a 400 Bad Request error due to malformed URL parameters. If you were only getting the first page of campaign results, update now.

target-mssql (Meltano Labs variant): Now supports ODBC as a connection driver in addition to pymssql. Benchmarking shows roughly 2x throughput improvement with the ODBC driver. If you’re loading into SQL Server, it’s worth switching.

tap-jira: Audit record stream pagination was broken, capping results at the first 1,000 records. Fixed. If your Jira instance has more than 1,000 audit events and you care about the full history, upgrade and run a full refresh.

tap-spreadsheets-anywhere 0.5.1: Fixed a memory exhaustion issue when loading large Azure Blob files into memory. Previously, the entire blob was buffered in-process; this is now streamed.

meltano-utilities-powerbi 1.1: Expanded Power BI support for pipeline-triggered dataset refreshes. More updates coming here as the team invests in the Power BI integration.

Singer SDK

singer-sdk 0.0.54 regression: If you updated to 0.0.54, there’s a bug in state partitioning key handling for child streams. When a stream uses state_partitioning_keys to reduce the keys stored in state (a common pattern for deeply nested child streams to avoid state bloat), the regression left those keys in place, meaning state payloads could grow without bound.

The impact: if you’re storing state in S3 or a Postgres-backed state store, payloads could balloon significantly. Even more critically, the oversized state message could exceed Meltano’s inter-process message buffer, breaking the tap→target pipe entirely.

Fix: upgrade to 0.0.54.2, which patches the regression. If you ran syncs on 0.0.54, inspect your state files for unexpected size growth.
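To see why the partitioning keys matter for state size, consider the toy model below. It is not the SDK’s implementation: it only counts how many distinct state partitions a child stream would produce when every context key is kept versus when the keys are reduced. The helper names and the site_id and audit_id context keys are hypothetical.

```python
from typing import Optional


def partition_key(context: dict, state_partitioning_keys: Optional[list]) -> tuple:
    """Identify a state partition by the chosen subset of context keys."""
    keys = sorted(context) if state_partitioning_keys is None else state_partitioning_keys
    return tuple((key, context[key]) for key in keys)


def count_state_partitions(contexts, state_partitioning_keys=None) -> int:
    """Number of separate bookmark entries the stream would keep in state."""
    return len({partition_key(c, state_partitioning_keys) for c in contexts})


if __name__ == "__main__":
    # Hypothetical child stream: 3 sites with 100 audits each.
    contexts = [
        {"site_id": site, "audit_id": f"{site}-{n}"}
        for site in ("s1", "s2", "s3")
        for n in range(100)
    ]
    print(count_state_partitions(contexts))               # one entry per parent record
    print(count_state_partitions(contexts, ["site_id"]))  # one entry per site
```

When the reduction is not applied, the number of entries tracks the number of parent records, which is why state payloads written under the regression are worth inspecting.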


Join the Next Office Hours

Meltano Office Hours runs every two weeks. If you’ve built something with Singer, Meltano, or have a war story about connector development, this is where to share it.

Community spotlights like Harsh’s are the best part of these sessions. If you’re interested in presenting, reach out on the Meltano Slack.
