The internet contains a huge amount of data but most is not in a useful format. Web scraping is the process of extracting this data from websites into a structured format such as a CSV spreadsheet so it can be reused.
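As a minimal sketch of the idea, the snippet below turns an HTML table into CSV text using only the Python standard library. The sample HTML, the `TableParser` class, and the `html_table_to_csv` function are hypothetical names made up for this illustration. A real website would need its own parsing rules and the page would be downloaded rather than embedded.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page; a real scrape would download this HTML instead.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


csv_text = html_table_to_csv(SAMPLE_HTML)
print(csv_text)
```

The resulting text can be written to a `.csv` file and opened in any spreadsheet program.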
Yes - if the data is publicly available then it can be extracted, though it may not be practical for some websites. For example, if the website heavily restricts IP addresses then scraping its data would require renting a lot of proxies, which may make the project too expensive.
Scraping data from public websites is very common, and many businesses such as Google depend on it. In practice I find that scraping the data is not a problem; any potential problem depends on how you reuse the data. If the data is for private use and the web crawling does not impact other users, then it should be fine. I expand on this in this blog post.
I have collected many useful web scraping resources on my blog, which are a good place to start. I also published a book on web scraping with Packt, and my Bitbucket account hosts some open source web scraping scripts.
My name is Richard Penman and I am the founder of webscraping.com. I am originally from Australia but often travel while running this business, and have worked from over 50 countries so far. I have a B.E. from Melbourne University and an MSc in Computer Science from Oxford University. I have also trained a few others in web scraping, and I collaborate with them on larger projects.
These are the main factors that make a job more difficult, and therefore more expensive:
If the website is relatively small, well structured, and the data is embedded cleanly in the HTML then I would expect to quote ~$150 USD. Prices are discounted when ordering multiple website scrapes. Complete the automatic quote form to get an idea of cost.
I am not the cheapest because I am not the worst.
A simple website can be scraped within a few hours, while a larger one can take several weeks to download all the required data. Once I have received your project details I will give you an estimate of the time required.
No - that data will be just for you.
Just fill in the automatic quote form and I will look it over and get back to you within 1 business day.
These are the typical stages in each web scraping project:
Maybe not alone, so I have trained some other people in web scraping and we collaborate on the bigger projects.
Certainly - I can provide a quote for regular updates of a scraped dataset and handle any required maintenance.
Yes. I use Python for most projects and can provide the code used. Note that if the script is used without proxies for a large website then it may be blocked.
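To illustrate the note about proxies, here is a minimal sketch of rotating requests across a pool of proxies with the standard library's `urllib.request`. The proxy addresses are placeholders, and the helper names (`make_proxy_cycle`, `build_opener_for`, `fetch`) are hypothetical. The scripts delivered for a project may handle this differently.

```python
import itertools
import urllib.request

# Placeholder proxy addresses; replace with proxies you actually rent.
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
]


def make_proxy_cycle(proxies):
    """Return an endless round-robin iterator over the proxy list."""
    return itertools.cycle(proxies)


def build_opener_for(proxy_url):
    """Build a urllib opener that routes HTTP and HTTPS through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url, proxy_cycle, timeout=30):
    """Download `url` through the next proxy in the cycle and return the bytes."""
    opener = build_opener_for(next(proxy_cycle))
    with opener.open(url, timeout=timeout) as response:
        return response.read()


# Usage (needs working proxies, so it is left commented out):
# proxy_cycle = make_proxy_cycle(PROXIES)
# html = fetch("http://example.com/", proxy_cycle)
```

Spreading requests over several proxies means each IP address makes fewer requests, which makes it less likely that any single address is blocked.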
For a custom website scrape I will quote a fixed fee for the job, and if you are a new client I will request a deposit of half upfront - this deposit will be refunded if I cannot finish the project. Larger projects can be split into a number of milestones.
The invoice has payment options for PayPal, credit card, and bank transfer.
If you are not comfortable with paying part up front to a random guy over the internet (which is understandable) then we can use Upwork, which supports an Escrow system to hold payments until job completion. (Note that to cover Upwork's fee this will cost 8.75% extra.)
Yes - this is still text and can be extracted just like English. I also use Google Translate to help me understand how the website works.
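A small sketch of why the language does not matter: the downloaded bytes are decoded with the page's character encoding and then handled like any other string. The sample text, the `decode_page` helper, and the choice of Shift_JIS are assumptions for illustration. In practice the encoding comes from the page's HTTP headers or meta tags.

```python
def decode_page(raw_bytes, declared_charset="utf-8"):
    """Decode downloaded bytes using the charset the page declares."""
    # errors="replace" substitutes a marker for malformed bytes instead of failing.
    return raw_bytes.decode(declared_charset, errors="replace")


# Simulate a Japanese page served in the Shift_JIS encoding.
raw = "価格: 1000円".encode("shift_jis")
text = decode_page(raw, "shift_jis")

# Once decoded, the text is split and cleaned exactly like English text.
label, value = [part.strip() for part in text.split(":")]
print(label, value)
```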