When crawling websites, it can be useful to know what technology was used to build a site. For example, with an ASP.NET website I can expect the navigation to rely on POSTed data and sessions, which makes crawling more difficult. And for Blogspot websites I can expect the archive list to be in a certain location.
There is a useful Firefox / Chrome extension called Wappalyzer that will tell you what technology a website has been made with. However, I needed this functionality available from the command line, so I converted the extension into a Python script, now available on Bitbucket.
Here is some example usage:
>>> from builtwith import builtwith
>>> builtwith('http://webscraping.com')
{'Analytics': 'Google Analytics',
'Web server': 'Nginx',
'JavaScript framework': 'jQuery'}
>>> builtwith('http://wordpress.com')
{'Blog': 'WordPress',
'Analytics': 'Google Analytics',
'CMS': 'WordPress',
'Web server': 'Nginx',
'JavaScript framework': 'jQuery'}
>>> builtwith('http://microsoft.com')
{'JavaScript framework': 'Modernizr',
'Web framework': 'Microsoft ASP.NET'}
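
To give a feel for how this kind of detection can work, here is a minimal sketch of fingerprint matching against response headers and HTML. It is not the actual builtwith code or the Wappalyzer rule set: the RULES table, its patterns, and the detect() function are all made up for illustration, and a real rule set covers far more technologies and signals.

```python
import re

# Hypothetical fingerprint rules for illustration only. These are NOT the
# real Wappalyzer rules, just a few patterns in a similar spirit.
RULES = {
    'Web server': {
        'Nginx': {'headers': {'server': r'nginx'}},
    },
    'Web framework': {
        'Microsoft ASP.NET': {'headers': {'x-powered-by': r'asp\.net'},
                              'html': r'__VIEWSTATE'},
    },
    'JavaScript framework': {
        'jQuery': {'html': r'jquery[.-]?[\d.]*(?:\.min)?\.js'},
    },
    'CMS': {
        'WordPress': {'html': r'/wp-content/'},
    },
}


def detect(headers, html):
    """Return {category: [technology, ...]} for every rule that matches."""
    # Header names are case-insensitive, so normalise them before lookup.
    headers = {name.lower(): value for name, value in headers.items()}
    found = {}
    for category, apps in RULES.items():
        for app, rule in apps.items():
            header_hit = any(
                re.search(pattern, headers.get(name, ''), re.IGNORECASE)
                for name, pattern in rule.get('headers', {}).items()
            )
            html_hit = 'html' in rule and re.search(
                rule['html'], html, re.IGNORECASE)
            # A technology counts as detected if any of its signals match.
            if header_hit or html_hit:
                found.setdefault(category, []).append(app)
    return found


if __name__ == '__main__':
    sample_headers = {'Server': 'nginx/1.4.6'}
    sample_html = '<script src="/js/jquery-1.8.3.min.js"></script>'
    print(detect(sample_headers, sample_html))
```

Because detect() takes already-downloaded headers and HTML, a crawler could run it on the first response from a site and, for instance, switch to a POST and session aware strategy when ASP.NET turns up.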