I am often asked whether I can extract data from a particular website.
The answer is always yes: if the data is publicly available, it can be extracted. Most websites are straightforward to scrape, but some are more difficult and may not be practical to scrape if you have time or budget restrictions.
For example, if a website restricts how many pages each IP address can access, it could take months to download the entire website. In that case I can use proxies to spread the requests across multiple IP addresses and download the data faster, but this can get expensive if many proxies are required.
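To make this trade-off concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it is made up for illustration: the page count, the per-IP daily limit, and the proxy price are assumptions, as is the idea that proxies are billed per month. The function names are my own and do not come from any library.

```python
import math


def estimate_crawl_days(total_pages, pages_per_ip_per_day, num_ips=1):
    """Return the whole days needed when every IP is capped at a daily page limit."""
    if total_pages < 0 or pages_per_ip_per_day <= 0 or num_ips <= 0:
        raise ValueError("page count must be >= 0; limit and IP count must be > 0")
    daily_capacity = pages_per_ip_per_day * num_ips
    return math.ceil(total_pages / daily_capacity)


def estimate_proxy_cost(num_proxies, days, price_per_proxy_per_month):
    """Return the proxy bill, assuming (hypothetically) billing in whole 30-day months."""
    months_billed = math.ceil(days / 30)
    return num_proxies * months_billed * price_per_proxy_per_month


# Hypothetical scenario: a 1,000,000 page website that allows
# 5,000 pages per IP address per day, with proxies at 2 dollars a month each.
TOTAL_PAGES = 1_000_000
PAGES_PER_IP_PER_DAY = 5_000
PRICE_PER_PROXY_PER_MONTH = 2

for num_ips in (1, 10, 50):
    days = estimate_crawl_days(TOTAL_PAGES, PAGES_PER_IP_PER_DAY, num_ips)
    cost = estimate_proxy_cost(num_ips, days, PRICE_PER_PROXY_PER_MONTH)
    print(f"{num_ips:>2} IPs: about {days} days, proxy cost about ${cost}")
```

With these invented numbers a single IP address needs 200 days, while 50 proxies cut that to 4 days at five times the proxy bill of 10 proxies. Real limits and prices vary a lot between websites and proxy providers, so treat this only as a way to frame the estimate.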
Another difficulty arises when a website uses CAPTCHAs or stores its data in images. Then I would need to try parsing the images with OCR, or hire people (at a lower hourly cost) to interpret the images manually.
In summary, I can always extract publicly available data from a website, but the time and cost required will vary.