As mentioned previously, for the last year I have been studying for a Masters at Oxford. It was a great year - the best of my life - and I only regret it was too short. I would recommend doing a 4 year Bachelor there to anyone. I will now be taking on new clients again and also working on some web scraping related side projects.
For the last 9 months or so I have been working intermittently on a book covering the web scraping skills I picked up over the years for my work. It is now available on Amazon or directly from the publisher. Or if you are feeling cheap you can find it on BitTorrent…
Coincidentally another book on web scraping with Python was released at the same time by O'Reilly, available here.
This year I will be fulfilling a lifelong dream of studying at Oxford University. For the next 11 months I will be completing an MSc in Computer Science, and during this time I will work on projects for existing clients but will probably need to limit taking on new clients.
Google recently released the Arc Welder extension for Chrome, which allows an Android app to be run on the desktop. The aim of Arc Welder is to make testing Android apps easier, but conveniently it also makes scraping Android apps easier.
To run an app in Arc Welder we need to first download the app’s APK file. Google Play does not make this straightforward so I would suggest using an alternative such as androiddrawer.com. I often use the Exchange Rates app when travelling and was curious where they source their currency data. The APK is available for download here, and then when loaded in Arc Welder looks like this:
(Note that if you get an error saying WebGL is not supported then you need to force Chrome to support WebGL by enabling the Override software rendering list flag: chrome://flags/#ignore-gpu-blacklist)
Now that the Exchange Rates app is running on my computer I used Wireshark to track network traffic, which showed the following request was made:
This means that this app simply uses the Yahoo Finance API for its currency data, available at: http://finance.yahoo.com/webservice/v1/symbols/allcurrencies/quote?format=json. So scraping the data from the Exchange Rates app merely requires calling this API.
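As a rough illustration, calling this API from Python could look like the sketch below. The URL is the one above. The response layout (a `list` object holding `resources`, each with a `resource` whose `fields` include `name` and `price`) is an assumption, as are the helper names `parse_rates` and `fetch_rates`, so adjust the parsing to whatever the endpoint actually returns.

```python
import json
from urllib.request import urlopen

API_URL = "http://finance.yahoo.com/webservice/v1/symbols/allcurrencies/quote?format=json"


def parse_rates(payload):
    """Return a {name: price} dict from a JSON response string.

    Assumes the layout list -> resources -> resource -> fields, where each
    fields dict has 'name' and 'price' keys. This layout is an assumption.
    """
    data = json.loads(payload)
    rates = {}
    for item in data["list"]["resources"]:
        fields = item["resource"]["fields"]
        rates[fields["name"]] = float(fields["price"])
    return rates


def fetch_rates(url=API_URL):
    """Download the currency feed and parse it (requires network access)."""
    with urlopen(url) as response:
        return parse_rates(response.read().decode("utf-8"))
```

Keeping the parsing separate from the download makes it easy to test against a saved response without hitting the server each time.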
(If you want to learn more about how to use Wireshark check out their extensive documentation)
I chose a non-controversial app for this blog post that uses a public API, but this same technique can be applied to any Android app that loads its content from a backend server, which ought to be most apps with data of interest.
These days I am often contacted by businesses asking if I want to try a free trial of their service. A recent one was Luminati, which claimed to have access to millions of IP addresses. They weren’t willing to divulge much over email and their website had less information than it does now, so we set up a Skype call. My contact was a salesman so he wasn’t able to answer technical questions, but gave me a good overview of what they are trying to do. Apparently they are an Israeli startup that built a peer to peer network called Hola, where users install a plugin to access content that is blocked in their region by downloading via other peers in the network. Now that they had millions of users they wanted to monetize this network by reselling it as a proxy service. Great idea, though when I signed up for a test account with Hola this was not clear, so I doubt most users are aware their bandwidth is being resold.
Unfortunately I found Luminati’s costs are prohibitive for my typical usage:
A medium scale website requires roughly 50GB of downloading, which would work out at $1000 if downloaded through Luminati (about $20 per GB), so using this service would only be practical for small websites that quickly block IPs.
A few years ago I opened a US bank account so that US clients who wanted to pay by bank transfer could avoid needing to make an international transaction. This worked well until last month when clients started reporting their transfers were being rejected. I rang the bank (Chase) and after being transferred between a few departments was told I needed to come into a US branch with my passport to discuss the problem, which couldn’t be handled over the phone. Quite inconvenient because I don’t live in the US and didn’t plan to visit in the near future.
I expect the problem is that receiving transactions from multiple client bank accounts raised some automated red flag, but I won’t know for sure until I next visit the US. So my US bank account had been frozen without warning or explanation - as you can imagine I was not a happy camper.
This experience made me more sympathetic to the Bitcoin libertarian philosophy, where there is no centralized authority to get in the way of doing business. Consequently, I researched how the protocol works and added support for Bitcoin to my data store and invoice system. I originally budgeted a week to figure this all out but found the protocol much simpler than expected, and in the end it took just a weekend, certainly easier than an earlier integration I did with the PayPal API. The only complexity is that because transactions are anonymous I needed to generate a unique Bitcoin address for each client so that I know who the transaction is from. This is how it works:
- Find the current exchange rate between the client’s currency and Bitcoin
- Generate a Bitcoin address to receive this transaction
- When the expected transaction is received at this address, mark the invoice as paid
That’s it. Blockchain.info has a well documented API that can handle each of these steps. For exchange rates there is the Ticker API and for managing and monitoring addresses in steps 2 & 3 there is the Receive API.
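The three steps can be sketched in Python roughly as follows. This is not my production code, just an illustration under some assumptions: the `ticker` argument is taken to be the parsed Ticker API response, assumed to map currency codes to objects with a `last` price per bitcoin. `new_address` is a hypothetical callable standing in for whatever call to the Receive API creates a fresh address, and the invoice is a plain dict.

```python
from decimal import Decimal, ROUND_UP

SATOSHI = Decimal("0.00000001")  # smallest unit of a bitcoin


def amount_in_btc(amount, currency, ticker):
    """Step 1: convert an invoice amount into BTC.

    ticker is assumed to look like {"USD": {"last": 250.0}, ...}.
    """
    rate = Decimal(str(ticker[currency]["last"]))
    # Round up so the invoice is never under-charged by a fraction of a satoshi.
    return (Decimal(str(amount)) / rate).quantize(SATOSHI, rounding=ROUND_UP)


def create_invoice(client, amount, currency, ticker, new_address):
    """Steps 1 and 2: price the invoice in BTC and give it its own address."""
    return {
        "client": client,
        "btc": amount_in_btc(amount, currency, ticker),
        "address": new_address(),  # hypothetical wrapper around the Receive API
        "paid": False,
    }


def record_payment(invoices, address, btc_received):
    """Step 3: mark the invoice for this address as paid if enough arrived."""
    received = Decimal(str(btc_received))
    for invoice in invoices:
        if invoice["address"] == address and received >= invoice["btc"]:
            invoice["paid"] = True
            return invoice
    return None
```

Because each client gets a unique address, looking up the address in step 3 is enough to tell who paid.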
To display a QR code for the required transaction I used the Google Chart API:
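A minimal sketch of how such a QR code URL could be built in Python is shown here. It assumes the Google Chart image endpoint with its `cht=qr`, `chs` and `chl` parameters is still being served, and it encodes the payment as a `bitcoin:` URI with an `amount` parameter, which is an assumption about the format rather than exactly what my invoice system emits. The function name `bitcoin_qr_url` is made up for this example.

```python
from urllib.parse import urlencode

CHART_URL = "https://chart.googleapis.com/chart"


def bitcoin_qr_url(address, btc_amount, size=200):
    """Return an image URL for a QR code encoding a Bitcoin payment request."""
    # Payment request in bitcoin: URI form (assumed format for this sketch).
    payment_uri = "bitcoin:{}?amount={}".format(address, btc_amount)
    query = urlencode({
        "cht": "qr",                       # chart type: QR code
        "chs": "{0}x{0}".format(size),     # image size in pixels
        "chl": payment_uri,                # data to encode, URL-escaped by urlencode
    })
    return CHART_URL + "?" + query
```

The returned URL can be dropped straight into an `<img>` tag on the invoice page.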
If you experience any problems with this support for Bitcoin or have suggestions to make it more intuitive, please get in touch.