Taking advantage of mobile interfaces

Sometimes a website will have multiple versions: one for regular users with a modern browser, an HTML version for browsers that don’t support JavaScript, and a simplified version for mobile users.

For example, Gmail has a standard interface, a basic HTML interface, and a mobile interface.

All three of these interfaces will display the content of your emails but use different layouts and features. The main entrance at gmail.com is well known for its use of AJAX to load content dynamically without refreshing the page. This leads to a better user experience but makes web automation or scraping harder.

On the other hand, the static HTML interface has fewer features and is less efficient for users, but it is much easier to automate or scrape because all the content is available when the page loads.
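One way to tell whether a page is static is to take the raw HTML a plain HTTP client receives (no JavaScript is run) and check whether text you saw in the browser is already there. The following is only a minimal sketch using Python's standard library; the helper names and the sample HTML strings are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside script/style.
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text content of an HTML string, ignoring scripts and styles."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def is_in_static_html(html, expected):
    """True if `expected` appears in the page text without running JavaScript."""
    return expected in visible_text(html)


# Hypothetical examples: a static page and a page that loads content via script.
static_page = "<html><body><h1>Inbox</h1><p>Hello from Alice</p></body></html>"
dynamic_page = (
    "<html><body><div id='app'></div>"
    "<script>load('Hello from Alice')</script></body></html>"
)
```

If the text is found, the page can probably be scraped with a simple HTTP request and an HTML parser. If it is missing, the content is likely loaded later by JavaScript.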

So before scraping a website, check whether it has an HTML or mobile version. When one exists, it should be easier to scrape.

To find the HTML version, try disabling JavaScript in your browser and see what happens.
To find the mobile version, try adding the “m” subdomain (domain.com -> m.domain.com) or using a mobile user-agent.
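Both mobile tricks can be sketched in a few lines of Python. This is an illustration under assumptions, not a recipe that works for every site: the helper names and the user-agent string are examples chosen here, and many websites do not use an “m” subdomain at all.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request

# Example mobile user-agent string (an assumption; any current mobile UA would do).
MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 "
    "Mobile/15E148 Safari/604.1"
)


def mobile_url(url):
    """Guess the mobile URL by swapping in the "m" subdomain."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.startswith("www."):
        host = host[len("www."):]  # www.domain.com -> domain.com
    if not host.startswith("m."):
        host = "m." + host         # domain.com -> m.domain.com
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


def mobile_request(url):
    """Build a request that identifies itself as a mobile browser."""
    return Request(url, headers={"User-Agent": MOBILE_UA})
```

To try it, pass the result of mobile_request(mobile_url("http://domain.com/")) to urllib.request.urlopen and compare the response with the one from the regular URL. If the “m” subdomain does not exist, the request will fail, so it is worth also sending the mobile user-agent to the original URL.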
