Earlier this week, it was my turn to host a GitLab Group Conversation on Meltano: a publicly livestreamed Q&A on the GitLab Unfiltered YouTube channel.
Just like last time, I used the opportunity to provide a recap of the progress that’s been made since the last update 6 weeks ago, and to share our priorities for the upcoming weeks.
If you’re curious, check out the presentation on Google Slides and the Q&A on YouTube. The presentation content is also reproduced below, as is an embedded video of the Q&A!
Group Conversation Presentation
6 releases since the last GC (2020-07-21)
- V1.41.1 lets you easily override Singer stream schema descriptions for specific attributes (e.g. `{"type": ["string", "null"], "format": "date-time"}`) using a new `schema` extractor extra.
- V1.42.0 makes `meltano elt` and its state management more resilient to failure and interruption.
- V1.43.0 improves the output of `meltano elt` in three significant ways.
- V1.44.0 fixes a bug that caused `meltano elt` to fail when an extractor (Singer tap) emits a RECORD message larger than 64 KB in size.
- V1.45.0 adds a new Loaders tab to Meltano UI that lets you easily manage and configure your project’s loaders, as an alternative to using the `meltano discover`, `meltano add`, and `meltano config` CLI commands.
- V1.46.0 brings out-of-the-box support for extractor tap-spreadsheets-anywhere (docs), which lets you pull data from CSV and Excel files stored locally or in the cloud (S3, SFTP, HTTP, etc.).
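To make the schema override from V1.41.1 a little more concrete, here is a minimal Python sketch of what replacing one attribute's schema description in a Singer-style catalog amounts to. This is not Meltano's implementation: the `override_schema` helper, the `orders` stream, and the shape of the `overrides` mapping are all hypothetical and exist only for illustration.

```python
import copy


def override_schema(catalog, overrides):
    """Return a copy of a Singer-style catalog with selected property schemas replaced.

    `overrides` maps a stream ID to {attribute name: replacement JSON schema}.
    Hypothetical helper for illustration; not part of Meltano or Singer.
    """
    result = copy.deepcopy(catalog)  # leave the caller's catalog untouched
    for stream in result.get("streams", []):
        stream_overrides = overrides.get(stream.get("tap_stream_id"), {})
        properties = stream.setdefault("schema", {}).setdefault("properties", {})
        for attribute, schema in stream_overrides.items():
            properties[attribute] = schema
    return result


# Hypothetical catalog in which the tap describes `created_at` as a plain string.
catalog = {
    "streams": [
        {
            "tap_stream_id": "orders",
            "schema": {
                "type": "object",
                "properties": {
                    "id": {"type": "integer"},
                    "created_at": {"type": "string"},
                },
            },
        }
    ]
}

# Describe `created_at` as a nullable date-time instead.
overrides = {
    "orders": {
        "created_at": {"type": ["string", "null"], "format": "date-time"},
    }
}

patched = override_schema(catalog, overrides)
print(patched["streams"][0]["schema"]["properties"]["created_at"])
```

The override replaces the whole description of the attribute rather than merging with the existing one; that is a simplifying assumption of this sketch.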
A lot of work was also done on reorganizing the documentation, which brought us:
- A brand-new Getting Started guide
- Detailed concept guides on Projects and Plugins
- A more clearly organized docs sidebar with dedicated guides for topics of interest
- A less content-heavy homepage, with more links to these dedicated guides
16 community contributions since the last GC (2020-07-21)
Done
- List installed and available extractors separately on Extractors page by Dmitry Stadnik (RFA)
- Update Dockerfile for faster builds by Paul Blankley
- Add Jobs resource to tap-gitlab by Charles Julian Knight
- Add Loaders page to UI by Dmitry Stadnik (RFA)
- Properly handle columns with reserved names in target-snowflake by Eric Simmerman
- Add oauth_credentials to tap-google-analytics settings in discovery.yml by Zachary Wynegar
- Use `singer-io/target-csv` instead of our own fork by Zachary Wynegar
- Add tap-spreadsheets-anywhere by Eric Simmerman
In review
- Create a JSON schema for `meltano.yml` and publish it on schemastore.org by Zachary Wynegar
- Improve docker-compose file template by Nevin Morgan
In development
- Allow system database schema to be overridden when PostgreSQL is used by Charles Julian Knight
- Upgrade `facebook_business` to 8.0.0 in tap-facebook by Zachary Wynegar
- Add support for `<ENV>_FILE` config env vars to support Docker Secrets by Zachary Wynegar
- Add Google Cloud Composer tutorial by Paul Tiplady
- Override auth check when using a shared embed link by Allan Whatmough
- Allow extractor entities to be selected in UI by Dmitry Stadnik (RFA)
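For readers unfamiliar with the `<ENV>_FILE` convention mentioned in the list above, the general idea is that a setting can be supplied either directly in an environment variable or as a path to a file that holds the value, which is how Docker Secrets are mounted. The Python sketch below illustrates that convention only. It is not the code from the contribution, which is still in development: the `read_setting` helper and the `EXAMPLE_DB_PASSWORD` name are hypothetical, and letting the `_FILE` variant take precedence is an assumption.

```python
import os
import tempfile


def read_setting(name, environ=None):
    """Resolve a setting from `<name>_FILE` (a path to a file) or from `<name>`.

    Hypothetical helper. Assumes the `_FILE` variant wins when both are set.
    """
    environ = os.environ if environ is None else environ
    file_path = environ.get(f"{name}_FILE")
    if file_path:
        with open(file_path, encoding="utf-8") as handle:
            # Secret files often end with a trailing newline; strip it.
            return handle.read().strip()
    return environ.get(name)


# Simulate a mounted secret with a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    secret_path = os.path.join(tmp, "db_password")
    with open(secret_path, "w", encoding="utf-8") as handle:
        handle.write("s3cret\n")
    from_file = read_setting(
        "EXAMPLE_DB_PASSWORD", {"EXAMPLE_DB_PASSWORD_FILE": secret_path}
    )

# Without the `_FILE` variant, the plain variable is used.
plain = read_setting("EXAMPLE_DB_PASSWORD", {"EXAMPLE_DB_PASSWORD": "plain"})
missing = read_setting("EXAMPLE_DB_PASSWORD", {})

print(from_file, plain, missing)
```

Passing the environment in as a dictionary keeps the sketch easy to try out without touching the real process environment.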
Upcoming milestone priorities
Features
- Option for extracting only a specific stream
- Have `meltano add` print license and author/copyright details
Bugs
- Append schema properties to catalog
- Autogenerated `job_id` is not the same in the system database and the folder structure
- target-snowflake fails when a `schema` is provided that’s not all-uppercase
Documentation
Epics for upcoming priorities
- Improved dbt integration
- Expand library of known extractors and loaders
- ETL using Python scripts or YAML-defined rules
- Plugin Configuration Profiles
(Exact order to be determined)