I often find businesses hide their contact details behind layers of navigation. I guess they want to cut down their support costs.
This wastes my time so I use this snippet to automate extracting the available emails:
import sys
from webscraping import common, download
def get_emails(website, max_depth):
"""Returns a list of emails found at this website
max_depth is how deep to follow links
"""
D = download.Download()
return D.get_emails(website, max_depth=max_depth)
if __name__ == '__main__':
try:
website = sys.argv[1]
max_depth = int(sys.argv[2])
except:
print 'Usage: %s <URL> <max depth>' % sys.argv[0]
else:
print get_emails(website, max_depth)
Example use:
>>> get_emails('http://webscraping.com', 1)
['contact@webscraping.com']