The Site Scraper module
Posted 01 Mar 2011 in opensource and sitescraper

A few years ago I developed the sitescraper library for automatically scraping website data based on example cases:

>>> from sitescraper import sitescraper
>>> ss = sitescraper()
>>> url = '
>>> data = [" python", ["Learning Python, 3rd Edition",   
  "Programming in Python 3: A Complete Introduction to the Python Language",
  "Python in a Nutshell, Second Edition (In a Nutshell (O'Reilly))"]]  
>>> ss.add(url, data)  
>>> # we can add multiple example cases,
>>> # but this is a simple example so one will do (I generally use 3)  
>>> # ss.add(url2, data2)   
>>> ss.scrape('
[" linux", [
    "A Practical Guide to Linux(R) Commands, Editors, and Shell Programming", 
    "Linux Pocket Guide", 
    "Linux in a Nutshell (In a Nutshell (O'Reilly))", 
    'Practical Guide to Ubuntu Linux (Versions 8.10 and 8.04), A (2nd Edition)', 
    'Linux Bible, 2008 Edition'
]]

See this paper for more info.

The library was designed for scraping the same websites repeatedly over time, where the layout may change between scrapes. Unfortunately I don't use it much these days because most of my projects are one-off scrapes.
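To make the "train from example cases" idea more concrete, here is a minimal sketch of one way it could work. This is not sitescraper's actual implementation: the `ExampleScraper` class, its helper functions, and the two tiny HTML pages are all made up for illustration, and it assumes well-formed markup so that the standard library XML parser can read it. The sketch records where each example value sits in the training page, then reads whatever sits at the same positions in a new page.

```python
import xml.etree.ElementTree as ET


def find_path(root, text):
    """Return the list of child indices leading to the first element
    whose stripped text equals `text`, or None if there is no match."""
    def walk(node, path):
        if (node.text or "").strip() == text:
            return path
        for i, child in enumerate(node):
            found = walk(child, path + [i])
            if found is not None:
                return found
        return None
    return walk(root, [])


def follow_path(root, path):
    """Follow a list of child indices from `root` and return the text
    found there, or None if the page has no element at that position."""
    node = root
    for i in path:
        children = list(node)
        if i >= len(children):
            return None
        node = children[i]
    return (node.text or "").strip()


class ExampleScraper:
    """Hypothetical toy model: learns element positions from one example."""

    def __init__(self):
        self.paths = []

    def add(self, html, values):
        # Remember the position of every example value in the training page.
        root = ET.fromstring(html)
        self.paths = []
        for value in values:
            path = find_path(root, value)
            if path is None:
                raise ValueError("example value not found: %r" % value)
            self.paths.append(path)

    def scrape(self, html):
        # Read the text at the remembered positions in a new page.
        root = ET.fromstring(html)
        return [follow_path(root, path) for path in self.paths]


# Made-up pages that share a layout but differ in content.
TRAIN = ("<html><body><h1>python</h1><ul><li>Learning Python</li>"
         "<li>Python in a Nutshell</li></ul></body></html>")
OTHER = ("<html><body><h1>linux</h1><ul><li>Linux Pocket Guide</li>"
         "<li>Linux in a Nutshell</li></ul></body></html>")

scraper = ExampleScraper()
scraper.add(TRAIN, ["python", "Learning Python", "Python in a Nutshell"])
result = scraper.scrape(OTHER)
print(result)  # ['linux', 'Linux Pocket Guide', 'Linux in a Nutshell']
```

This toy version learns from a single example and breaks as soon as the layout shifts. Coping with layout changes, as described above, would presumably require comparing several example cases and generalising the positions they have in common.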
