BeautifulSoup Connector

BeautifulSoup

tap-beautifulsoup (matatika variant)

Python library for pulling data out of HTML and XML files.

Settings

Download Recursively

Attempt to download all pages recursively into the output directory prior to parsing files. Set this to False if you've previously run wget -r -A.html https://sdk.meltano.com/en/latest/
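
If you fetch the pages yourself and then run the tap with Download Recursively set to False, the download step is a single wget invocation. The minimal Python sketch below assembles that command. The helper name build_wget_command is hypothetical. Using wget's -P (directory prefix) option to write into the Output Folder is an assumption about how you would line the files up with the tap's settings; the tap does not document it.

```python
def build_wget_command(site_url: str, output_folder: str) -> list[str]:
    """Return argv for a recursive, HTML-only wget download (hypothetical helper)."""
    return [
        "wget",
        "-r",                 # recurse through linked pages
        "-A.html",            # accept only files whose names end in .html
        "-P", output_folder,  # write under this folder instead of the current directory
        site_url,
    ]
```

To run it, pass the returned list to subprocess.run(cmd, check=True) with wget on your PATH.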

Exclude tags

List of tags to exclude before extracting text content of the page.
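
With BeautifulSoup itself, excluding tags is usually done by calling decompose() on each matching tag before calling get_text(). The sketch below avoids the dependency and uses the standard library's html.parser instead, so it approximates the effect rather than reproducing the tap's code. The names extract_text and ExcludingTextExtractor are hypothetical, and the tag list nav, script, style is only an example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the skip depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class ExcludingTextExtractor(HTMLParser):
    """Collect text chunks, skipping everything inside excluded tags."""

    def __init__(self, exclude_tags):
        super().__init__()
        self.exclude_tags = {tag.lower() for tag in exclude_tags}
        self._skip_depth = 0  # > 0 while inside at least one excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag names.
        if tag in self.exclude_tags and tag not in VOID_ELEMENTS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.exclude_tags and tag not in VOID_ELEMENTS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html, exclude_tags):
    parser = ExcludingTextExtractor(exclude_tags)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

For example, extract_text("<body><nav>Menu</nav><p>Hello <b>world</b></p><script>x=1</script></body>", ["nav", "script", "style"]) returns "Hello world". The navigation and script text never reach the output.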

Find All Kwargs

A dict of keyword arguments passed to the find_all call that extracts text from the pages.
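
BeautifulSoup's find_all accepts parameters such as name, attrs, recursive, string and limit. A dict like the one this setting holds can therefore be unpacked straight into it with **. The sketch below shows that idea without the dependency and assumes a {"name": ..., "attrs": {...}} shape for the setting. find_all_text is a hypothetical stand-in that supports only tag-name and attribute matching, with BeautifulSoup-style token matching for class. Unlike BeautifulSoup, it returns only the outermost of nested matches.

```python
from html.parser import HTMLParser

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class FindAllText(HTMLParser):
    """Toy stand-in for find_all(name=..., attrs=...) that returns element text."""

    def __init__(self, name=None, attrs=None):
        super().__init__()
        self.name = name
        self.attrs = attrs or {}
        self._depth = 0           # current element nesting depth
        self._match_depth = None  # depth of the matching element we are inside
        self.results = []         # one list of text chunks per match

    def _matches(self, tag, attrs):
        if self.name is not None and tag != self.name:
            return False
        found = dict(attrs)
        for key, wanted in self.attrs.items():
            value = found.get(key)
            if value is None:
                return False
            # Like BeautifulSoup, treat class as a space-separated set of tokens.
            if key == "class" and wanted not in value.split():
                return False
            if key != "class" and value != wanted:
                return False
        return True

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        self._depth += 1
        if self._match_depth is None and self._matches(tag, attrs):
            self._match_depth = self._depth
            self.results.append([])

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self._match_depth == self._depth:
            self._match_depth = None  # leaving the matching element
        self._depth = max(self._depth - 1, 0)

    def handle_data(self, data):
        text = data.strip()
        if text and self._match_depth is not None:
            self.results[-1].append(text)


def find_all_text(html, name=None, attrs=None):
    parser = FindAllText(name=name, attrs=attrs)
    parser.feed(html)
    parser.close()
    return [" ".join(chunks) for chunks in parser.results]


# Example setting value, unpacked with ** as it would be for find_all.
find_all_kwargs = {"name": "div", "attrs": {"class": "content"}}
```

With BeautifulSoup installed, the same dict would be used as soup.find_all(**find_all_kwargs).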

Flattening Enabled

Set to True to enable schema flattening and automatically expand nested properties.

Flattening Max Depth

The maximum depth to which schemas are flattened.
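
Flattening joins nested keys into a single column name and stops at the configured depth. The minimal sketch below is modeled on the Meltano Singer SDK's behavior. It assumes the SDK's "__" separator and assumes that anything still nested at the depth limit, or any list, is serialized to a JSON string. flatten_record here is an illustrative re-implementation, not the SDK's own function.

```python
import json
from collections.abc import Mapping


def flatten_record(record, max_depth, separator="__"):
    """Flatten nested dicts into prefixed keys, down to max_depth levels."""
    flat = {}

    def walk(node, prefix, level):
        for key, value in node.items():
            name = f"{prefix}{separator}{key}" if prefix else key
            if isinstance(value, Mapping) and level < max_depth:
                walk(value, name, level + 1)     # still within the depth budget
            elif isinstance(value, (Mapping, list)):
                flat[name] = json.dumps(value)   # too deep, or a list: keep as JSON text
            else:
                flat[name] = value

    walk(record, "", 0)
    return flat
```

With max_depth=1, the record {"id": 1, "meta": {"title": "Home", "tags": {"lang": "en"}}} becomes {"id": 1, "meta__title": "Home", "meta__tags": '{"lang": "en"}'}. Raising the depth to 2 expands meta__tags__lang as well.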

Output Folder

The directory path where the intermediate downloaded HTML files are written.
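
After the download step, the tap parses the files it finds in this folder. The sketch below shows one way to gather them. It assumes that every file ending in .html at any depth under the folder is a candidate; the helper list_html_files is hypothetical and not part of the tap.

```python
from pathlib import Path


def list_html_files(output_folder: str) -> list[Path]:
    """Return every .html file under output_folder, sorted for a stable order."""
    # rglob searches subdirectories too, matching wget's host/path layout.
    return sorted(Path(output_folder).rglob("*.html"))
```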

Parser

The BeautifulSoup parser to use.
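
BeautifulSoup takes the parser name as the second argument to its constructor, as in BeautifulSoup(markup, "html.parser"). Only html.parser ships with Python. The lxml parsers ("lxml", "lxml-xml"/"xml") and html5lib must be installed separately. The hypothetical pre-flight check below tests whether the module behind a chosen parser name is importable before you set this option. It assumes the tap forwards the value to BeautifulSoup unchanged.

```python
import importlib.util

# BeautifulSoup parser names mapped to the third-party module each needs
# (None means the parser ships with Python).
PARSER_MODULES = {
    "html.parser": None,
    "lxml": "lxml",
    "lxml-xml": "lxml",
    "xml": "lxml",
    "html5lib": "html5lib",
}


def parser_available(parser: str) -> bool:
    """Return True if the module behind a BeautifulSoup parser name is importable."""
    if parser not in PARSER_MODULES:
        raise ValueError(f"unknown BeautifulSoup parser: {parser!r}")
    module = PARSER_MODULES[parser]
    return module is None or importlib.util.find_spec(module) is not None
```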

Site URL

The site you'd like to scrape. The tap will download all pages recursively into the output directory prior to parsing files.

Source Name

The name of the source you're scraping. This will be used as the stream name.

Stream Map Config

User-defined config values to be used within map expressions.

Stream Maps

Config object for the stream maps capability. For more information, see Stream Maps.
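
Stream maps are keyed by stream name, which for this tap is the Source Name. Values from Stream Map Config are exposed to map expressions as config. The sketch below builds such a config as a Python dict and serializes it to JSON. The stream_maps and stream_map_config keys are standard Meltano SDK settings. The site_url and source_name keys, the stream name "sdk-docs" and the added site_label property are assumptions chosen for illustration.

```python
import json

SOURCE_NAME = "sdk-docs"  # hypothetical; becomes the stream name

tap_config = {
    "site_url": "https://sdk.meltano.com/en/latest/",  # assumed setting key
    "source_name": SOURCE_NAME,                        # assumed setting key
    # Map expressions, keyed by stream name and then by property name.
    "stream_maps": {
        SOURCE_NAME: {
            # Add a property whose value is read from stream_map_config.
            "site_label": "str(config['site_label'])",
        },
    },
    # Values made available to map expressions as `config`.
    "stream_map_config": {"site_label": "Meltano SDK docs"},
}

config_json = json.dumps(tap_config, indent=2)
```

Writing config_json to a file and passing it to the tap with --config would add a site_label column to every record in the sdk-docs stream.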

Meltano Cloud Connector

The BeautifulSoup connector is available on Meltano. It is built, maintained, supported, and tested by Meltano.


Why Meltano?

Expert support

Direct access to the team that built and maintains Meltano Cloud. Same-day responses during UK business hours. When something breaks, we fix it fast because we know exactly how it works.

Rigorously tested

Every connector goes through comprehensive testing and quality checks before production. Daily monitoring catches issues before they hit your pipelines. We don't just wrap open-source taps and hope for the best. We validate, we test, we maintain.

No maintenance overhead

API changes. Connector updates. Schema drift. Breaking changes from upstream sources. We handle it all. Your team focuses on using data. Our team focuses on making sure it's there when you need it.

Access to Meltano Slack community

Join 5,500+ data engineers and analytics practitioners. The community is active, helpful, and always on. Good for quick questions, sharing patterns, and learning what others are building.
