Google recently released the Arc Welder extension for Chrome, which allows an Android app to be run on the desktop. The aim of Arc Welder is to make testing Android apps easier, but conveniently it makes scraping Android apps easier too.
To run an app in Arc Welder we need to first download the app's APK file. Google Play does not make this straightforward so I would suggest using an alternative such as androiddrawer.com. I often use the Exchange Rates app when travelling and was curious where they source their currency data. The APK is available for download here, and then when loaded in Arc Welder looks like this:
(Note that if you get an error saying WebGL is not supported then you need to force Chrome to support WebGL by enabling the Override software rendering list flag: chrome://flags/#ignore-gpu-blacklist)
Now that the Exchange Rates app is running on my computer I used Wireshark to track network traffic, which showed the following request was made:
This means that the app simply uses the Yahoo Finance API for its currency data, available at: http://finance.yahoo.com/webservice/v1/symbols/allcurrencies/quote?format=json. So scraping the data from the Exchange Rates app merely requires calling this API.
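Here is a minimal sketch of what calling this API from Python could look like. The response layout it expects (a `list` object holding `resources`, each wrapping a `fields` dict with `symbol` and `price`) is an assumption on my part and not something shown in the capture above, and `parse_quotes` and `fetch_quotes` are hypothetical helper names.

```python
import json
import urllib.request

API_URL = "http://finance.yahoo.com/webservice/v1/symbols/allcurrencies/quote?format=json"


def parse_quotes(payload):
    """Map each currency symbol to its price as a float.

    ASSUMED layout (not confirmed by the post):
    {"list": {"resources": [{"resource": {"fields": {"symbol": ..., "price": ...}}}]}}
    """
    quotes = {}
    for item in payload.get("list", {}).get("resources", []):
        fields = item.get("resource", {}).get("fields", {})
        symbol, price = fields.get("symbol"), fields.get("price")
        if symbol is None or price is None:
            continue  # skip entries that do not match the assumed layout
        try:
            quotes[symbol] = float(price)
        except (TypeError, ValueError):
            continue  # skip prices that are not numeric
    return quotes


def fetch_quotes(url=API_URL):
    """Download the JSON document and parse it (requires network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_quotes(json.load(response))
```

Keeping the parsing separate from the download means the parser can be checked against a saved response without touching the network.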
(If you want to learn more about how to use Wireshark check out their extensive documentation)
I chose a non-controversial app for this blog post that uses a public API, but this same technique can be applied to any Android app that loads its content from a backend server, which ought to be most apps with data of interest.
These days I am often contacted by businesses asking if I want to try a free trial of their service. A recent one was Luminati, which claimed to have access to millions of IP addresses. They weren't willing to divulge much over email and their website had less information than it does now, so we set up a Skype call. My contact was a salesman so he wasn't able to answer technical questions, but he gave me a good overview of what they are trying to do. Apparently they are an Israeli startup that built a peer-to-peer network called Hola, where users install a plugin to access content that is blocked in their region by downloading it via other peers in the network. Now that they had millions of users they wanted to monetize this network by reselling it as a proxy service. Great idea, though when I signed up for a test account with Hola this was not clear, so I doubt most users are aware their bandwidth is being resold.
Unfortunately I found Luminati's costs are prohibitive for my typical usage:
A medium scale website requires roughly 50GB of downloading, which would work out at $1000 if downloaded through Luminati (about $20 per GB), so using this service would only be practical for small websites that quickly block IPs.
A few years ago I opened a US bank account so that US clients who wanted to pay by bank transfer could avoid needing to make an international transaction. This worked well until last month when clients started reporting their transfers were being rejected. I rang the bank (Chase) and after being transferred between a few departments was told I needed to come into a US branch with my passport to discuss the problem, which couldn't be handled over the phone. Quite inconvenient because I don't live in the US and didn't plan to visit in the near future.
I expect the problem is that receiving transactions from multiple client bank accounts raised some automated red flag, but I won't know for sure until I next visit the US. So now my US bank account was frozen without warning or explanation - as you can imagine I was not a happy camper.
This experience made me more sympathetic to the Bitcoin libertarian philosophy, where there is no centralized authority to get in the way of doing business. Consequently, I researched how the protocol works and added support for Bitcoin to my data store and invoice system. I originally budgeted a week to figure this all out but found the protocol much simpler than expected, and in the end it took just a weekend, certainly easier than an earlier integration I did with the PayPal API. The only complexity is that because transactions are anonymous I needed to generate a unique bitcoin address for each client so that I know who the transaction is from. This is how it works:
- Find the current exchange rate between the client's currency and Bitcoin
- Generate a bitcoin address to receive this transaction
- When the expected transaction is received at this address, mark the invoice as paid
That's it. Blockchain.info has a well-documented API that can handle each of these steps. For exchange rates there is the Ticker API, and for managing and monitoring addresses in steps 2 & 3 there is the Receive API.
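As a rough sketch of steps 1 and 3, the conversion and the payment check could look something like this in Python. It assumes the Ticker API returns a JSON object keyed by currency code whose entries contain a `last` price for one bitcoin. The helper names are hypothetical, and generating the address through the Receive API is deliberately left out because it depends on account-specific credentials.

```python
import json
import urllib.request
from decimal import Decimal, ROUND_UP

TICKER_URL = "https://blockchain.info/ticker"
SATOSHI_PER_BTC = Decimal(100_000_000)


def fetch_ticker(url=TICKER_URL):
    """Download the current exchange rates (requires network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def invoice_to_satoshi(ticker, currency, amount):
    """Convert an invoice amount in `currency` to satoshi, rounding up.

    ASSUMES ticker[currency]["last"] is the price of one bitcoin.
    """
    rate = Decimal(str(ticker[currency]["last"]))
    btc = Decimal(str(amount)) / rate
    return int((btc * SATOSHI_PER_BTC).to_integral_value(rounding=ROUND_UP))


def is_paid(expected_satoshi, received_satoshi):
    """Hypothetical check for step 3: enough has arrived at the address."""
    return received_satoshi >= expected_satoshi
```

Working in integer satoshi with `Decimal` avoids the rounding surprises that floats can introduce when comparing the expected and received amounts.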
To display a QR code for the required transaction I used the Google Chart API.
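A QR code image URL for a payment can be built roughly as follows. This is a sketch that assumes the Google Chart QR endpoint with its `cht=qr`, `chs` and `chl` parameters, and a `bitcoin:` payment URI with an `amount` parameter. The function names and the example address are hypothetical.

```python
from urllib.parse import quote, urlencode

CHART_URL = "https://chart.googleapis.com/chart"


def bitcoin_uri(address, amount_btc):
    """Build a bitcoin: payment URI for the given address and BTC amount."""
    return "bitcoin:{}?{}".format(quote(address), urlencode({"amount": amount_btc}))


def qr_code_url(data, size=200):
    """Return a chart URL that renders `data` as a square QR code image."""
    params = {"cht": "qr", "chs": "{0}x{0}".format(size), "chl": data}
    return CHART_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # "1ExampleAddr" is a placeholder, not a real address
    print(qr_code_url(bitcoin_uri("1ExampleAddr", "0.4")))
```

The resulting URL can be used directly as the `src` of an image on the invoice page.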
If you experience any problems with this support for Bitcoin or have suggestions to make it more intuitive, please get in touch.
A significant update to the Android Apps database is now ready, which contains over 2 million apps (2,130,732 to be exact). If you have purchased this database previously you can log in to your account to download the updated version for free.
The latest version of the UPC database now contains over 7.5 million products, which is over a million more than the previous version. If you have purchased this database previously you can log in to your account to download the updated version for free.
I searched my email and found that over the last few years I have received 76 messages from clients containing the text "Web Scrapping" rather than the usual spelling "Web Scraping". And this is not unique to my clients - currently Google has 122,000 results for "Web Scrapping" compared to 447,000 results for "Web Scraping", so the correct spelling returns fewer than 4x as many results. In light of this common spelling mistake I registered the domain webscrapping.com and redirected it here.