When crawling websites, it can be useful to know what technology was used to build a site. For example, with an ASP.NET website I can expect the navigation to rely on POSTed data and sessions, which makes crawling more difficult. And for Blogspot websites I can expect the archive list to be in a certain location.

There is a useful Firefox / Chrome extension called Wappalyzer that will tell you what technology a website has been made with. However, I needed this functionality available from the command line, so I converted the extension into a Python script, now available on Bitbucket.
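
To give a rough idea of how this kind of detection can work: a tool matches a list of patterns against the response headers and the HTML of a page. The sketch below is a simplified illustration and not the actual code of the script or the extension. The `RULES` table and the `detect` function are hypothetical names, and the three patterns are toy examples only.

```python
import re

# Hypothetical, heavily simplified fingerprint table (an assumption for
# illustration; real rule sets are far larger and more precise).
RULES = {
    'Nginx': {'category': 'Web server', 'header': ('server', r'nginx')},
    'jQuery': {'category': 'JavaScript framework', 'html': r'jquery[.-][\w.-]*js'},
    'WordPress': {'category': 'CMS', 'html': r'/wp-content/'},
}


def detect(headers, html):
    """Return {category: technology} for every rule that matches."""
    # Header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value for name, value in headers.items()}
    found = {}
    for technology, rule in RULES.items():
        matched = False
        if 'header' in rule:
            name, pattern = rule['header']
            matched = re.search(pattern, headers.get(name, ''), re.I) is not None
        if not matched and 'html' in rule:
            matched = re.search(rule['html'], html, re.I) is not None
        if matched:
            found[rule['category']] = technology
    return found
```

Because `detect` takes headers and HTML that have already been downloaded, it can be tried on saved pages without any network access.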

Here is some example usage of the script:

>>> from builtwith import builtwith
>>> builtwith('http://webscraping.com')
{'Analytics': 'Google Analytics',
 'Web server': 'Nginx',
 'JavaScript framework': 'jQuery'}
>>> builtwith('http://wordpress.com')
{'Blog': 'WordPress',
 'Analytics': 'Google Analytics',
 'CMS': 'WordPress',
 'Web server': 'Nginx',
 'JavaScript framework': 'jQuery'}
>>> builtwith('http://microsoft.com')
{'JavaScript framework': 'Modernizr',
 'Web framework': 'Microsoft ASP.NET'}
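
A result like this can then drive how the crawler behaves, along the lines described at the top of the post. The following is only a sketch: `choose_strategy` and the strategy labels are hypothetical, and it assumes the result is a dictionary shaped like the output above.

```python
def choose_strategy(technologies):
    """Pick a crawl strategy label from a builtwith-style result dict.

    `technologies` is assumed to map a category name to a technology name,
    as in the example output above. The labels returned are made up for
    this illustration.
    """
    if 'Microsoft ASP.NET' in technologies.values():
        # Navigation is likely to depend on POSTed form data and sessions.
        return 'stateful'
    if 'Blog' in technologies:
        # A blog platform was detected, so look for the archive list first.
        return 'archive'
    return 'follow-links'


print(choose_strategy({'JavaScript framework': 'Modernizr',
                       'Web framework': 'Microsoft ASP.NET'}))
```

Keeping the decision in one small function makes it easy to add further cases as more site types turn up during crawling.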