Demo Day Recap: 2021-05-07

Starting the month with a fantastic Demo Day: does it get any better than that? We think not! If you missed it, the recording of the livestream is available on the Meltano YouTube channel.

Taylor started the session by demoing our brand-new MeltanoHub, which we soft-launched today. There was a lot of surface area to cover, so we highly recommend checking it out yourself. Let us know if you have ideas for improving it or want to contribute!

Next we had community member Nick Hamlin from GlobalGiving share how they’ve been using dbt exposures to increase their confidence in downstream reports within Metabase.

Following Nick, we had Derek Visch from Auto IDM share the work he’s been doing on target-mssql and on getting Meltano running well on Windows. Join him in the #windows and #singer-targets channels if you want to collaborate with him on either project!

After that, AJ demoed the latest release of the SDK for Singer Taps by walking us through his new version of tap-github and connecting its output to target-jsonl. This work will be used to feed data into the MeltanoHub, so we’re quite excited by this progress!

Lastly, we had Edward Smith share some of the improvements he’s been working on to enable tap-postgres to use timestamp-based incremental loads.

Overall, this was a great Demo Day and we look forward to hosting many more of these. We’re always listening to feedback and looking for ways to improve, so please contribute to any issue within the main project or chat with us on Slack if that’s more your style. Thank you to everyone who joined the livestream!

Join the conversation!

Our next Meltano Demo Day is on Friday, May 21 and we have our next office hours session on Wednesday, May 12. We hope to see you there!

Office Hours Recap: 2021-05-05

Today’s office hours session was jam-packed with content! We reviewed the latest preview of the Singer Target SDK, which is similar to the Tap SDK but for targets and will allow developers to write a Singer-compatible ELT destination without having to reimplement the full Singer spec by hand. We also demoed and discussed the new and improved Tap cookiecutter templates, and we presented Meltano’s plans to add Infrastructure-as-Code support with a new Meltano Terraform module.


As always, we had a number of other great questions and topics raised by the community. Check the YouTube video description for specific topics and timestamps!

As a friendly reminder, we have Office Hours every Wednesday and Demo Day this Friday! Find us on Slack and join the conversation.

Hope to see you soon!

AJ

Singer SDK v0.1.4 is now available!

We are excited to announce the latest version of the Singer SDK. With this release, we’re thrilled to add automatic support for stream, field, and schema selection, as well as a brand-new set of cookiecutter templates for new projects.

Here’s the full list of new features in this release:

  • Added selection rules support for record and schema messages (#7, !26)
  • Added support for GraphQL query variables (#115, !78)
  • Improved cookiecutter template coverage and resolved readability issues (#116, #119, !75)
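To illustrate what "query variables" means for GraphQL streams, here is a minimal, generic sketch of a GraphQL-over-HTTP request body in which the query declares typed variables and their values travel separately in a `variables` object. The query (modeled loosely on the GitHub GraphQL API) and the helper name `build_graphql_payload` are assumptions for illustration only, not the SDK’s API.

```python
import json
from typing import Any, Dict

# Illustrative query: $owner and $first are declared variables, so the same
# query text can be reused with different values instead of string formatting.
QUERY = """
query Repos($owner: String!, $first: Int!) {
  repositoryOwner(login: $owner) {
    repositories(first: $first) { nodes { name } }
  }
}
"""


def build_graphql_payload(query: str, variables: Dict[str, Any]) -> str:
    """Hypothetical helper: serialize a GraphQL request body as JSON.

    GraphQL servers conventionally accept a JSON object with a "query"
    string and an optional "variables" object.
    """
    return json.dumps({"query": query, "variables": variables})


if __name__ == "__main__":
    print(build_graphql_payload(QUERY, {"owner": "meltano", "first": 10}))
```

Keeping values out of the query text avoids hand-escaping strings and lets the server validate each value against its declared type.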

This release also includes two fixes:

  • Resolved tap failure when a stream is missing from the input catalog. (#105, !80)
  • Resolved bug where unsorted streams did not properly advance state bookmarks for incremental streams. (#118, !74)

Stream selection as a built-in feature

This release adds the powerful stream selection feature, which allows a tap user to filter down the tables and columns that should be sent to the target by setting selection logic inside their catalog file. With this feature, tap developers do not need to parse the catalog or perform any column filtering logic themselves. Column selection for both the RECORD messages and the SCHEMA messages is handled automatically by the SDK.
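To make the idea concrete, here is a minimal sketch of the kind of filtering the SDK now performs for you, based on the Singer catalog’s `metadata` entries (`breadcrumb`, `selected`, `inclusion`, `selected-by-default`). The function names and the simplified precedence rules are assumptions for illustration; this is not the SDK’s actual implementation, and real selection also has to handle nested properties.

```python
from typing import Any, Dict, List, Tuple

# Simplified Singer catalog metadata: each entry has a "breadcrumb" (empty for
# the stream itself, ["properties", <field>] for a field) and a "metadata" dict.
Metadata = List[Dict[str, Any]]


def _lookup(metadata: Metadata, breadcrumb: Tuple[str, ...]) -> Dict[str, Any]:
    """Return the metadata dict for a breadcrumb, or {} if there is none."""
    for entry in metadata:
        if tuple(entry.get("breadcrumb", [])) == breadcrumb:
            return entry.get("metadata", {})
    return {}


def is_selected(metadata: Metadata, breadcrumb: Tuple[str, ...]) -> bool:
    """Apply simplified selection rules to a stream or field breadcrumb."""
    md = _lookup(metadata, breadcrumb)
    inclusion = md.get("inclusion", "available")
    if inclusion == "automatic":
        return True  # e.g. key properties are always sent
    if inclusion == "unsupported":
        return False
    if "selected" in md:
        return bool(md["selected"])
    return bool(md.get("selected-by-default", False))


def filter_record(record: Dict[str, Any], metadata: Metadata) -> Dict[str, Any]:
    """Keep only the record properties whose field is selected."""
    return {k: v for k, v in record.items() if is_selected(metadata, ("properties", k))}


def filter_schema(schema: Dict[str, Any], metadata: Metadata) -> Dict[str, Any]:
    """Return a copy of a JSON schema that keeps only selected properties."""
    properties = {
        k: v for k, v in schema.get("properties", {}).items()
        if is_selected(metadata, ("properties", k))
    }
    return {**schema, "properties": properties}
```

For example, with field metadata marking `id` as `inclusion: automatic`, `name` as selected, and `email` as not selected, `filter_record({"id": 1, "name": "a", "email": "x"}, metadata)` keeps only `id` and `name`, and `filter_schema` drops `email` from the schema’s properties in the same way.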

If you’ve already built a tap with the SDK, simply run poetry update singer-sdk to upgrade your tap to the latest SDK version.

Join the conversation

For more information or to join the conversation, find us in the #singer-sdk channel on Slack!

Now Available: Meltano v1.73.0

Today, we are excited to release Meltano version 1.73.0!

Excited to try it out?

To upgrade Meltano and your Meltano project to the latest version, navigate to your project directory, activate the appropriate virtual environment, and run meltano upgrade. This will upgrade the meltano package and apply any necessary changes to your project.

What else is new?

The list below (copied from the changelog) covers all of the changes made to Meltano since the release of v1.72.0 on April 23:

New

  • #2621 Add twilio-labs variant of tap-zendesk

Changes

  • #2705 Speed up meltano install by installing plugins in parallel
  • #2709 Add support for setting kind in settings prompt when using meltano add --custom

Speedrun: from 0 to ELT in 90 seconds

Open source data integration has never been easier, or faster: with Meltano, extracting data from GitLab and loading it into PostgreSQL (or Snowflake, BigQuery, Redshift…) takes just 90 seconds, from initializing a new Meltano project to viewing the loaded data.

Resulting Meltano project repository: https://gitlab.com/meltano/speedrun

Tools

More sources: https://meltano.com/plugins/extractors/

More destinations: https://meltano.com/plugins/loaders/

Prerequisites

  1. Install Meltano: https://meltano.com/docs/getting-started.html#install-meltano
  2. Set up PostgreSQL locally and create a database named `speedrun`

Commands

meltano init speedrun
cd speedrun

meltano add extractor tap-gitlab
meltano config tap-gitlab set projects meltano/meltano
meltano config tap-gitlab set start_date 2021-04-01T00:00:00Z
meltano select tap-gitlab tags

meltano add loader target-postgres
meltano config target-postgres set postgres_username [username]
meltano config target-postgres set postgres_database speedrun

meltano elt tap-gitlab target-postgres

psql -d speedrun
SELECT * FROM tap_gitlab.tags ORDER BY _sdc_received_at;

Singer SDK 0.1.3 is now available!

We’ve been hard at work building new features into the SDK and responding to developer feedback. Version 0.1.3 is an important release that includes the following updates:

  • Added a new is_sorted stream property, which allows long-running incremental streams to be resumed if interrupted.
  • Added the signpost feature to prevent bookmarks from advancing beyond the point where all records have been streamed.
  • Added a get_replication_key_signpost() stream method, which defaults to the current time for timestamp-based replication keys.
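The interaction between is_sorted and the signpost can be sketched in a few lines. This is a simplified model under stated assumptions (a single timestamp replication key and the hypothetical functions `next_bookmark`, `sync`, and `default_signpost`), not the SDK’s actual code: a sorted stream can checkpoint its bookmark after every record, an unsorted stream only once all records are seen, and in both cases the bookmark is capped at the signpost so records that change during the sync are picked up on the next run.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List, Optional


def default_signpost() -> datetime:
    """Hypothetical default: the time the sync started (current UTC time)."""
    return datetime.now(timezone.utc)


def next_bookmark(current: Optional[datetime], value: datetime,
                  signpost: Optional[datetime]) -> Optional[datetime]:
    """Advance a replication-key bookmark, but never past the signpost."""
    if signpost is not None and value > signpost:
        value = signpost  # records newer than the signpost get re-read next run
    if current is None or value > current:
        return value
    return current


def sync(records: Iterable[Dict[str, Any]], replication_key: str,
         is_sorted: bool, signpost: Optional[datetime]) -> List[datetime]:
    """Return the bookmarks a hypothetical stream would emit as STATE."""
    bookmark: Optional[datetime] = None
    last_value: Optional[datetime] = None
    emitted: List[datetime] = []
    for record in records:
        value = record[replication_key]
        if is_sorted and last_value is not None and value < last_value:
            raise ValueError("stream declared is_sorted but records are out of order")
        last_value = value
        bookmark = next_bookmark(bookmark, value, signpost)
        if is_sorted:
            emitted.append(bookmark)  # safe checkpoint: a resumed sync skips nothing
    if not is_sorted and bookmark is not None:
        emitted.append(bookmark)  # only trustworthy once every record has been seen
    return emitted
```

In this sketch the signpost is captured once, before reading any records, so every bookmark comparison in a run uses the same cutoff.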

And the following bug fixes:

  • Fixed a scenario where unsorted incremental streams would generate incorrect STATE bookmarks. — Thanks, @Egi!
  • Fixed a problem where CI pipelines would fail when run from a fork. — Thanks, @Derek Visch!
  • Fixed a fatal error when running the cookiecutter shell script

We also have a newly updated Meltano Tutorial, “How to Create a Custom Extractor,” which walks you through using the SDK together with Meltano. If you have been curious about building with the SDK, this is a great place to get started.

For more information or to join the conversation, find us in the #singer-sdk channel on Slack!

Demo Day Recap: 2021-04-23

The last Demo Day of April had such energy and was incredibly fun! If you missed it, the recording of the livestream is available on the Meltano YouTube channel.

Taylor briefly shared some recent updates around the Community, including our 1000-member milestone and enhancements to the Slack channels. He also detailed some of the work on the SingerHub, including defining the spec and the live preview that is available.

We then had community member Reuben Frankel give a brief demo of Matatika, after which he shared the two merge requests he’s been working on. The first is the new meltano remove command, which removes an installed plugin from the project completely. The second adds the ability to specify the kind of a setting when adding a custom plugin. These are fantastic contributions and we really appreciate them!

Community member Derek Visch then shared his recent work on building a target for Microsoft SQL Server. As we build out the SDK to support database targets, this will become an even easier task. He’s actively looking for collaborators on this target, and we’re excited to work on creating useful abstractions for targets more generally.

AJ then shared details about the 0.1.3 release of the Singer SDK, which adds the is_sorted stream property, the signpost feature, and the get_replication_key_signpost() method, and fixes several bugs.

Lastly, Douwe walked through the 1.72.0 release of Meltano, which added out-of-the-box support for Slack as a data source and Redshift as a destination. It also included a fantastic contribution from Charles Julian Knight that enables command shortcuts to be defined and invoked on a per-plugin basis. Douwe gives an excellent demo in the video if you’re interested in this feature.

Overall, this was a great Demo Day and we look forward to hosting many more of these. We’re always listening to feedback and looking for ways to improve, so please contribute to any issue within the main project or chat with us on Slack if that’s more your style. Thank you to everyone who joined the livestream!

Join the conversation!

Our next Meltano Demo Day is on Friday, May 7 and we have our next office hours session on Wednesday, April 28. We hope to see you there!

Community Milestone: 1000 Slack Members

It’s official: the Meltano Slack community has reached 1000 members! This is a huge milestone for the community and shows us the excitement around open-source data integration and DataOps.


In May of 2020, Douwe’s blog post about how Meltano was pivoting to focus on open source data integration went live. There were about 250 people in the Slack group at that time. Since then, we’ve seen continuous growth in the community. Within Slack, well over a third of users are active on a weekly basis, surpassing the original Singer Slack space. We’ve also seen big growth in the usage of Meltano: by our latest estimate, over 12,000 projects were created in the past 12 months, 25% of which were used within the last month! And these are just the projects we know about, since users can easily disable usage statistics.

Since January, we’ve seen even faster growth in usage and community activity as the investment in Meltano and Singer has grown. AJ and I officially joined the team, we launched the Singer SDK, created a simplified interpretation of the Singer spec, and announced our larger roadmap and vision, including the development of the upcoming SingerHub. On the community side, we started hosting weekly Office Hours and fortnightly Demo Days, which have been a lot of fun for everyone involved (recordings available on our YouTube channel). To keep up this fantastic energy and growth, we’re looking to hire more people soon in development and community roles. (Interested? Reach out to Douwe on Slack!)

The SDK in particular has been very well received by the community, with the launch featured in Tristan Handy’s Data Science Roundup and in an upcoming episode of the Data Engineering Podcast (stay tuned for more details, and check out our previous episode!). More importantly, though, we’re hearing from the community that it’s been incredibly easy to build high-quality taps. My favorite quote is from Stephen Bailey:

The SDK was able to trim my tap code down by about 70% from when I wrote it previously. This is everything I’ve wanted from Singer from the start.

This is a fantastic endorsement and a testament to AJ’s and the community’s hard work and collaboration.

We’re incredibly grateful for the support the Meltano and larger Singer community have shown us. We believe strongly in the potential of Singer and the open source DataOps ecosystem and we’re committed to helping it become what it should be. There is a lot of work still to do and we hope to grow the community and project as more data professionals understand the power of open source for data integration and DataOps.

We’d love for you to join us and help build this project and community! Collaborate with us on issues, in Slack, and on Twitter!

Now Available: Meltano v1.72.0

Today, we are excited to release Meltano version 1.72.0!

Excited to try it out?

To upgrade Meltano and your Meltano project to the latest version, navigate to your project directory, activate the appropriate virtual environment, and run meltano upgrade. This will upgrade the meltano package and apply any necessary changes to your project.

What else is new?

The list below (copied from the changelog) covers all of the changes made to Meltano since the release of v1.71.0 on March 23:

New

  • #2560 Add support for shortcut commands to invoke
  • #2560 Add support for sqlfluff utility for linting SQL transforms
  • #2613 Add mashey variant of tap-slack
  • #2689 Add documentation for using a custom Python Package Index (PyPI)
  • #2426 Add transferwise variant of target-redshift

Changes

  • #2082 Updated database_uri documentation to show how to target a PostgreSQL schema

Fixes

  • #2526 When target process fails before tap, report target output instead of BrokenPipeError or ConnectionResetError

Office Hours Recap: 2021-04-21

In office hours this week, we had a robust discussion on “SaaS Targets”, aka “Reverse ETL”, and specifically on how developers can create non-traditional Singer targets that load data into applications and systems with strict data shape constraints. For instance, Salesforce and Google Directory are two targets we discussed in some detail.
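As a rough illustration of that discussion, here is a minimal sketch of a Singer-style target that reads Singer messages as JSON lines and maps RECORD messages onto a destination with strict shape constraints. The constraints (a required `email` field, an 80-character `name` limit), the function names, and the choice to reject incomplete records while truncating long strings are all hypothetical; they are not the rules of Salesforce, Google Directory, or any specific API.

```python
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Hypothetical constraints for a SaaS destination, e.g. a CRM "contact" object.
REQUIRED_FIELDS = {"email"}
MAX_LENGTHS = {"name": 80}


def to_destination_payload(record: Dict[str, Any]) -> Dict[str, Any]:
    """Validate and reshape one record; raise ValueError if it cannot be loaded."""
    missing = REQUIRED_FIELDS - set(record)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    payload: Dict[str, Any] = {}
    for key, value in record.items():
        limit = MAX_LENGTHS.get(key)
        if limit is not None and isinstance(value, str):
            value = value[:limit]  # truncate instead of failing the whole batch
        payload[key] = value
    return payload


def process_messages(lines: Iterable[str]) -> Tuple[List[Dict[str, Any]],
                                                     List[Tuple[Dict[str, Any], str]],
                                                     Optional[Dict[str, Any]]]:
    """Consume Singer messages; return (payloads, rejected records, last STATE)."""
    payloads: List[Dict[str, Any]] = []
    rejects: List[Tuple[Dict[str, Any], str]] = []
    state: Optional[Dict[str, Any]] = None
    for line in lines:
        if not line.strip():
            continue
        message = json.loads(line)
        if message.get("type") == "RECORD":
            try:
                payloads.append(to_destination_payload(message["record"]))
            except ValueError as err:
                rejects.append((message["record"], str(err)))
        elif message.get("type") == "STATE":
            state = message["value"]
        # SCHEMA messages could be checked against the destination up front;
        # this sketch simply ignores them.
    return payloads, rejects, state
```

Collecting rejects separately, rather than crashing, is one way to cope with destinations whose shape rules are stricter than the source data.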

To join future office hours sessions and keep up-to-date with weekly topics, join the #office-hours channel on Slack.

See you next week!

AJ