Loading web browser cookies

Sometimes when scraping a website I need my script to log in in order to access the data of interest. Usually reverse engineering the login form is straightforward, but some websites make this difficult, for example by requiring a CAPTCHA or by allowing only one simultaneous login session per account. For difficult cases such as these I have an alternative solution: manually log in to the website of interest in a web browser, and then have my script load and reuse that login session.

I have now packaged this solution as an open source Python module. Here is some example usage:

>>> from webscraping import common, xpath
>>> import requests
>>> import browser_cookie
>>> cj = browser_cookie.load()
>>> r = requests.get('https://bitbucket.org/', cookies=cj)
>>> common.normalize(xpath.get(r.content, '//title'))
'richardpenman / home — Bitbucket'

If you have a Bitbucket account and are logged in through a supported browser, then you should see your account name printed here. Currently Firefox (Linux/OSX/Windows) and Chrome (Linux/OSX) are supported, and I will add more platforms if I get the chance to test them.
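Under the hood, `browser_cookie.load()` returns a standard cookie jar (`cookielib` in Python 2, `http.cookiejar` in Python 3), which is why `requests` accepts it directly via its `cookies` argument. As a minimal sketch of that interface, here is a hand-built jar using only the standard library; the cookie name, value, and domain below are illustrative, not real Bitbucket cookies:

```python
from http.cookiejar import Cookie, CookieJar

def make_cookie(name, value, domain):
    # Build a Cookie with the minimal fields http.cookiejar requires.
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={}, rfc2109=False,
    )

cj = CookieJar()
cj.set_cookie(make_cookie('session_id', 'abc123', 'bitbucket.org'))

# Iterating a CookieJar yields Cookie objects, so the session cookies
# a given site would receive can be inspected like this:
site_cookies = {c.name: c.value for c in cj if c.domain == 'bitbucket.org'}
print(site_cookies)  # {'session_id': 'abc123'}
```

Any object with this `CookieJar` interface can be passed to `requests.get(..., cookies=cj)`, which is exactly how the example above reuses the browser's login session.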
