Extract Article Content
python-readability is a Python port of arc90’s readability tool.
from readability import Document

article = Document(content)
print(article.title())
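Besides the title, a python-readability `Document` can also return the cleaned article body through `summary()`. The following is a minimal sketch of collecting both in one place; the helper name `describe_article` and the dictionary keys are illustrative assumptions, not part of the library.

```python
def describe_article(article):
    """Collect the title and cleaned body from a readability Document.

    `article` is expected to behave like readability.Document, i.e. to
    provide title() and summary(). The function name and the returned
    keys are hypothetical, chosen only for this sketch.
    """
    return {
        # Title as detected by python-readability.
        "title": article.title(),
        # Cleaned-up HTML of the main article content.
        "html": article.summary(),
    }
```

With the `content` string from the browser session, this could be called as `describe_article(Document(content))`.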
Myles Braithwaite
December 15, 2017
A better web browser with Selenium and python-readability.
Using Selenium, we can extract the rendered content of a JavaScript-driven website.
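from contextlib import closing
from selenium.webdriver import Firefox
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

with closing(Firefox()) as browser:
    browser.get(URL)
    # Wait up to 10 seconds for the <body> element to appear.
    WebDriverWait(browser, timeout=10).until(
        lambda x: x.find_element(By.TAG_NAME, "body")
    )
    content = browser.page_source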
SessionNotCreatedException: Message: Expected browser binary location, but unable to find binary in default location, no 'moz:firefoxOptions.binary' capability provided, and no binary flag set on the command line
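This exception usually means that no Firefox binary could be found in a default location. One possible workaround, sketched here under the assumption that Firefox is installed somewhere on `PATH`, is to locate the binary and pass it to Selenium explicitly. The helper names `find_firefox_binary` and `make_firefox` and the candidate executable names are hypothetical; `Options.binary_location` is the Selenium setting that corresponds to the `moz:firefoxOptions.binary` capability named in the message.

```python
import shutil


def find_firefox_binary(candidates=("firefox", "firefox-esr")):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def make_firefox(binary_path):
    """Start Firefox with an explicit binary location (sketch only)."""
    # Imported here so the lookup helper above works without Selenium.
    from selenium.webdriver import Firefox
    from selenium.webdriver.firefox.options import Options

    options = Options()
    # Tell geckodriver where the browser executable lives.
    options.binary_location = binary_path
    return Firefox(options=options)
```

If `find_firefox_binary()` returns `None`, Firefox is probably not installed or not on `PATH`, and installing it is the more likely fix.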