Better Web Browser

python
Author

Myles Braithwaite

Published

December 15, 2017

A better web browser with Selenium and python-readability.

URL = "https://theoutline.com/post/2604/star-wars-the-last-jedi-hype"

Download Website Content

Using Selenium, we can extract the rendered content of a website that builds its pages with JavaScript.

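from contextlib import closing

from selenium.webdriver import Firefox
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

with closing(Firefox()) as browser:
    browser.get(URL)

    # Wait up to 10 seconds for the <body> element to be present.
    # find_element_by_tag_name was removed in Selenium 4; use find_element(By.TAG_NAME, ...).
    WebDriverWait(browser, timeout=10).until(
        lambda x: x.find_element(By.TAG_NAME, "body")
    )

    # Grab the HTML after the browser has rendered the page.
    content = browser.page_source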

The cell above needs a local Firefox installation. Without one, Selenium raises:

SessionNotCreatedException: Message: Expected browser binary location, but unable to find binary in default location, no 'moz:firefoxOptions.binary' capability provided, and no binary flag set on the command line
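
The WebDriverWait(...).until(...) call above polls a condition until it returns something truthy or the timeout runs out. To make that behaviour concrete, here is a minimal standard-library sketch of the same polling idea. It is not Selenium's implementation: `wait_until`, `body_is_present`, and the use of `LookupError` as a stand-in for "element not found yet" are all hypothetical names chosen for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except LookupError:
            # Stand-in for "element not found yet": swallow it and keep polling.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulate a page whose body only "appears" on the third check.
attempts = {"count": 0}


def body_is_present():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("body not rendered yet")
    return "<body>"


print(wait_until(body_is_present, timeout=2.0, poll_interval=0.01))
```

The point of the pattern is that the script proceeds as soon as the page is ready instead of sleeping for a fixed, pessimistic amount of time.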

Extract Article Content

python-readability is a Python port of arc90’s readability tool.

from readability import Document
article = Document(content)

print(article.title())
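
For a rough feel of what an article extractor has to do, here is a toy sketch using only the standard library. It is not python-readability's algorithm. It only illustrates one simple heuristic, assumed here for the example: credit the text of each paragraph to the container that holds it, then pick the container with the most text. `ParagraphScorer`, `best_container`, and `SAMPLE_HTML` are hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class ParagraphScorer(HTMLParser):
    """Toy scorer: credit each <p>'s text length to its nearest container ancestor."""

    CONTAINERS = {"div", "article", "section"}

    def __init__(self):
        super().__init__()
        self.stack = []    # labels of currently open containers, innermost last
        self.scores = {}   # container label -> total paragraph characters
        self.in_paragraph = False
        self.counter = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.CONTAINERS:
            self.counter += 1
            # Use the id attribute as a label when present, else a generated one.
            label = dict(attrs).get("id") or f"{tag}#{self.counter}"
            self.stack.append(label)
        elif tag == "p":
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if tag in self.CONTAINERS and self.stack:
            self.stack.pop()
        elif tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph and self.stack:
            label = self.stack[-1]
            self.scores[label] = self.scores.get(label, 0) + len(data.strip())


def best_container(html):
    """Return the label of the container holding the most paragraph text, or None."""
    scorer = ParagraphScorer()
    scorer.feed(html)
    return max(scorer.scores, key=scorer.scores.get) if scorer.scores else None


SAMPLE_HTML = """
<html><body>
  <div id="nav"><p>Home</p></div>
  <div id="story">
    <p>The first paragraph of the article, with a fair amount of text.</p>
    <p>A second paragraph that adds even more text to the same container.</p>
  </div>
  <div id="footer"><p>Contact us</p></div>
</body></html>
"""

print(best_container(SAMPLE_HTML))  # prints "story"
```

Real pages are far messier than this sample, which is why it is worth leaning on a library instead of hand-rolling the heuristics.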

Display Article Content

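# IPython.display is the public import path; importing display from IPython.core.display is deprecated.
from IPython.display import HTML, display

# Render the cleaned-up article HTML inline in the notebook.
display(HTML(article.summary()))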
