How to automatically find contact details
Posted 06 Nov 2011 in IR, example, and python

I often find businesses hide their contact details behind layers of navigation. I guess they want to cut down their support costs.

This wastes my time so I use this snippet to automate extracting the available emails:

import sys  
from webscraping import common, download  
  
def get_emails(website, max_depth):  
    """Returns a list of emails found at this website  
  
max_depth is how deep to follow links  
"""  
    D = download.Download()  
    return D.get_emails(website, max_depth=max_depth)  
  
if __name__ == '__main__':  
    try:  
        website = sys.argv[1]  
        max_depth = int(sys.argv[2])  
    except:  
        print 'Usage: %s <URL> <max depth>' % sys.argv[0]  
    else:  
        print get_emails(website, max_depth)

Example use:

>>> get_emails('http://webscraping.com', 1)  
['contact@webscraping.com']

blog comments powered by Disqus