Your web browser sends what is known as a “User Agent” with every page you request. It is a string, carried in the User-Agent HTTP header, that tells the server what kind of browser and device you are accessing the page with. Here are some common User Agent strings:
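Any HTTP client can set this header, not just a browser. The following is a minimal sketch using Python's standard urllib.request (an assumption, since the post names no language or library). It compares the default string a Python script sends with a browser-style string attached to a single request. The browser-style value below is purely illustrative.

```python
import urllib.request

# An opener built by urllib carries a default User-Agent that identifies
# the client as a Python script, e.g. "Python-urllib/3.12".
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)
print(default_headers["User-agent"])

# A browser-style string can be attached to an individual request instead.
# This value is illustrative only.
browser_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
req = urllib.request.Request("https://example.com/", headers={"User-Agent": browser_ua})

# urllib normalises header names with str.capitalize(), so the stored
# key is "User-agent"; nothing is sent over the network here.
print(req.get_header("User-agent"))
```

Because urllib stores header names via str.capitalize(), the header has to be looked up as "User-agent". When such a request is sent through an opener, a User-agent already set on the Request takes precedence over the opener's default.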
Blog

User agents
Tags: User-agent | July 20, 2011

How to crawl websites without being blocked
Tags: User-agent, Crawling, Proxies | February 08, 2010
Websites want users who will purchase their products and click on their advertising. They want to be crawled by search engines so that users can find them, but they generally don’t want to be crawled by anyone else. Ironically, Google is one such company.
Some websites actively try to stop scrapers, so here are some suggestions to help you crawl beneath their radar.
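One commonly used tactic in this area, not necessarily one this post recommends, is to avoid sending the same User-Agent string on every request. The following is a minimal sketch under stated assumptions: it uses Python's standard urllib.request, and USER_AGENTS and rotating_requests are hypothetical names with illustrative strings. It builds one request per URL while cycling through the pool. It only constructs the requests; nothing is sent.

```python
import itertools
import urllib.request

# Hypothetical pool of browser-style User-Agent strings (illustrative only).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
]


def rotating_requests(urls, user_agents):
    """Yield one urllib Request per URL, cycling through user_agents in order."""
    agents = itertools.cycle(user_agents)
    for url in urls:
        yield urllib.request.Request(url, headers={"User-Agent": next(agents)})


# Four placeholder URLs against a pool of three agents: the fourth request
# wraps around to the first agent again.
urls = [f"https://example.com/page/{n}" for n in range(1, 5)]
requests_to_send = list(rotating_requests(urls, USER_AGENTS))
for r in requests_to_send:
    print(r.full_url, "->", r.get_header("User-agent"))
```

Each Request could then be passed to urllib.request.urlopen. A simple round-robin through itertools.cycle keeps the order predictable. Choosing at random from the pool is an equally simple alternative.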