
BeautifulSoup
tap-beautifulsoup (matatika variant)
Python library for pulling data out of HTML and XML files.
Settings
Download Recursively
Attempt to download all pages recursively into the output directory prior to parsing files. Set this to False if you've previously run wget -r -A.html https://sdk.meltano.com/en/latest/
Exclude tags
List of tags to exclude before extracting text content of the page.
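As a rough illustration of what excluding tags before text extraction can look like, the sketch below removes unwanted elements with BeautifulSoup's decompose() and then collects the remaining text. The sample HTML, the exclude_tags value, and the strip_tags helper are all hypothetical; this is not necessarily how the tap implements the setting.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; not taken from the tap itself.
html = """
<html><body>
  <nav>Home | Docs</nav>
  <article><h1>Title</h1><p>Body text.</p></article>
  <script>console.log("tracking");</script>
</body></html>
"""

# Assumed example value for the "Exclude tags" setting.
exclude_tags = ["nav", "script"]


def strip_tags(markup, tags, parser="html.parser"):
    """Parse markup and remove every element whose tag name is in `tags`."""
    soup = BeautifulSoup(markup, parser)
    # find_all accepts a list of tag names and returns a list-like result,
    # so removing elements while looping over it is safe.
    for tag in soup.find_all(tags):
        tag.decompose()
    return soup


soup = strip_tags(html, exclude_tags)
# Join the remaining text fragments with single spaces.
text = soup.get_text(" ", strip=True)
print(text)
```

Removing the elements before calling get_text keeps navigation and script content out of the extracted text without any post-processing of the string.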
Find All Kwargs
This dict contains all the kwargs that should be passed to the find_all call in order to extract text from the pages.
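To make the idea of passing a kwargs dict to find_all concrete, here is a minimal sketch. The dict contents, the sample HTML, and the extract_text helper are assumptions for illustration; only find_all and get_text are real BeautifulSoup calls.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; not taken from the tap itself.
html = """
<html><body>
  <div class="sidebar">Navigation</div>
  <div class="content"><p>First paragraph.</p><p>Second paragraph.</p></div>
</body></html>
"""

# Assumed example value for the "Find All Kwargs" setting:
# match <div> elements whose class is "content".
find_all_kwargs = {"name": "div", "attrs": {"class": "content"}}


def extract_text(markup, kwargs, parser="html.parser"):
    """Return the text of every element matched by find_all(**kwargs)."""
    soup = BeautifulSoup(markup, parser)
    return [el.get_text(" ", strip=True) for el in soup.find_all(**kwargs)]


texts = extract_text(html, find_all_kwargs)
for t in texts:
    print(t)
```

Because the dict is unpacked directly into find_all, any keyword that method supports (such as name, attrs, or limit) can be supplied through configuration without code changes.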
Flattening Enabled
Set to 'True' to enable schema flattening, which automatically expands nested properties into top-level ones.
Flattening Max Depth
The max depth to flatten schemas.
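The effect of a flattening depth limit can be sketched in plain Python. The flatten_record function below is a hypothetical helper, not the tap's or SDK's own code; the double-underscore separator and the JSON-string fallback for objects beyond the depth limit are assumptions chosen for illustration.

```python
import json


def flatten_record(record, max_depth, separator="__", _parent="", _level=0):
    """Expand nested dicts into top-level keys, down to max_depth levels."""
    flat = {}
    for key, value in record.items():
        name = f"{_parent}{separator}{key}" if _parent else key
        if isinstance(value, dict) and _level < max_depth:
            # Still within the depth limit: expand the nested object.
            flat.update(
                flatten_record(value, max_depth, separator, name, _level + 1)
            )
        elif isinstance(value, (dict, list)):
            # Past the depth limit (or a list): keep it as a JSON string.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


# Hypothetical record with two levels of nesting.
record = {
    "url": "https://example.com",
    "meta": {"title": "Home", "og": {"type": "website"}},
}

flat_one = flatten_record(record, max_depth=1)
flat_two = flatten_record(record, max_depth=2)
print(flat_one)
print(flat_two)
```

With a max depth of 1, only the first level of nesting is expanded and the inner object stays serialized; with a max depth of 2, the inner object is expanded as well.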
Output Folder
The directory path where the intermediate downloaded HTML files are written.
Parser
The BeautifulSoup parser to use.
Site URL
The site you'd like to scrape. The tap will download all pages recursively into the output directory prior to parsing files.
Source Name
The name of the source you're scraping. This will be used as the stream name.
Stream Map Config
User-defined config values to be used within map expressions.
Stream Maps
Config object for the stream maps capability. For more information, see the Stream Maps documentation.
The BeautifulSoup connector is available on Meltano. It is built, maintained, supported, and tested by Meltano.