

  • Luminati

    Business Proxies

    These days I am often contacted by businesses asking if I want to try a free trial of their service. A recent one was Luminati, which claimed to have access to millions of IP addresses. They weren’t willing to divulge much over email and their website had less information than it does now, so we set up a Skype call. My contact was a salesman so he wasn’t able to answer technical questions, but he gave me a good overview of what they are trying to do. Apparently they are an Israeli startup that built a peer-to-peer network called Hola, where users install a plugin to access content that is blocked in their region by downloading via other peers in the network. Now that they had millions of users they wanted to monetize this network by reselling it as a proxy service. Great idea, though when I signed up for a test account with Hola this was not clear, so I doubt most users are aware their bandwidth is being resold.

  • Phone calls


    I have noticed that if a new client cannot describe what they are after in an email and instead wants a quick phone call to clarify, the conversation will rarely develop into an actual project. I guess they want to pick my brain and then implement it internally or hire someone cheaper, so to limit the amount of time wasted I have started insisting that an overview of the project be sent before setting up a phone call.

  • Bitcoin

    Business Website

    A few years ago I opened a US bank account so that US clients who wanted to pay by bank transfer could avoid needing to make an international transaction. This worked well until last month when clients started reporting their transfers were being rejected. I rang the bank (Chase) and after being transferred between a few departments was told I needed to come into a US branch with my passport to discuss the problem, which couldn’t be handled over the phone. Quite inconvenient because I don’t live in the US and didn’t plan to visit in the near future.

  • Web Scraping or Web Scrapping?

    Business Website

    I searched my email and found that over the last few years I received 76 messages from clients containing the text “Web Scrapping” rather than the usual spelling “Web Scraping”. And this is not unique to my clients - currently Google has 122,000 results for “Web Scrapping” compared to 447,000 results for “Web Scraping”, so the correct spelling returns fewer than 4x the number of results. So in light of this common spelling mistake I registered the domain webscrapping.com and redirected it here.

  • The web services I use


    A few friends asked me what web services I use to run my business, so I am writing this post to point people to in future.

  • Working onsite in NYC


    For the next year I am going to be working onsite at a web scraping focussed startup in New York. Looking forward to the experience! I found that working in the US is straightforward for Australians because of the fantastic E3 visa. I just took my job offer letter to the US consulate along with some documentation, paid a few hundred dollars, and within a fortnight I had a 2 year work visa that can be extended indefinitely.

  • Web Scraping User Interface

    Crawling Business Web2py

    When scraping a website, typically the majority of time is spent waiting for the data to download. So to be efficient I work on multiple scraping projects simultaneously.

  • Useful business directories


    Business web directories are a great source of data, and scraping them is a common request from clients. Below is my list of the directories I know of for each country or region. I have noticed that directories for poorer countries often disappear, so let me know if a link no longer works.

  • Is Web Scraping legal?


    I am often asked whether web scraping is legal and I always respond the same - it depends what you do with the data.

  • Automated quote tool


    An ongoing problem for my web scraping work is how much to quote for a job. I prefer fixed fee to hourly rates so I need to consider the complexity upfront. My initial strategy was simply to quote low to ensure I got business and hopefully build up some regular clients.

    Through experience I found the following factors most affected the time required for a job:

  • Best website for finding freelance work

    Business Elance Freelancing

    When I started freelancing I created accounts on every freelance site I could find (oDesk, guru, scriptlance, etc) to get as much work as possible. However I found I got almost all work from just one source - Elance. How is Elance different?

    With most freelancing sites you create an account and immediately start bidding on jobs. There is no cost to bidding so people bid on many projects even if they don’t have the skill or time to complete them. This is obviously frustrating for clients who waste a lot of time sifting through bids.

    On the other hand Elance has a high barrier to entry: you have to pass a test to show you understand their system, then receive a phone call to confirm your identity, and once established pay money for each job you bid on. Often I see jobs on Elance with no bids because they require obscure experience - people aren’t willing to waste their money bidding for a job they can’t do. This barrier serves to weed out less serious freelancers so that the average bid is of higher quality.

    From my experience the clients are different on Elance too. On most freelancing sites the client is trying to get the job done for the smallest amount of money possible and is often willing to spend time sifting through dozens of proposals, hoping to get lucky. Elance seems to attract clients who consider their time valuable and are willing to pay a premium for good service. Often clients contact me directly through Elance because I am a native English speaker and they want to avoid potential communication or cultural problems. One client even asked me to double my bid because “we are not cheap!”

    After a year of freelancing I now get the majority of work directly through my website, but still get a decent percentage of clients through Elance.

  • Fixed fee or hourly?


    I prefer to quote per project rather than per hour for my web scraping work because it:

  • Typical web scraping job

    Big picture Business

    In this post I will try to clarify what web scraping is all about by walking through a typical (though fictional) project.