Often I need to reverse engineer how an Android app loads its data. If I can determine the relevant server endpoints then I can call them directly in future via a script and bypass the app. This process is more complex when the app uses HTTPS for network communication, which is becoming more common now that Let’s Encrypt provides SSL certificates for free.
To work with secure network traffic I use mitmproxy, which is an open source interactive HTTPS proxy written in Python. Installation steps are documented here. Once installed, you need to determine the local IP address assigned to your computer by the router - this is different from the external IP address seen from the internet. On Mac I use the following to determine the local IP:
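As a cross-platform alternative, the local IP can also be found with a few lines of Python. This is only a minimal sketch: the function name local_ip and the probe address 8.8.8.8 are my own choices for illustration, and it assumes the machine has a single default route.

```python
import socket


def local_ip(probe_host="8.8.8.8", probe_port=80):
    """Return the IP address of the interface used for outbound traffic."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only selects a route,
        # which reveals the local address the OS would use for that route.
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        # No route available (for example when offline): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


print(local_ip())
```

The printed address is the one to give to mitmproxy and later to the proxy settings on the Android device.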
Now start mitmproxy at this local IP and at a port of your choosing, such as 4444:
The next step is to set up your Android device to communicate with mitmproxy:
- Go to Wi-Fi settings and long press your connection
- Select Modify network
- Open advanced settings and change Proxy to Manual
- Set Proxy hostname and Proxy port to the details used in the earlier mitmproxy command
- Navigate the web browser to mitm.it and you should see a list of certificates to install - if not then the proxy settings are not working
- Download the Android certificate
If the proxy is working then you should start to see requests from the web browser in the mitmproxy console window:
The network activity for Android apps is also viewable. If we open the Exchange Rates app then the following network activity will be triggered, which loads the exchange rate data used by the app:
This is an HTTPS request, but we can still observe its contents because the device trusts the mitmproxy certificate installed earlier.
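When an app makes many requests it can be easier to log the endpoints than to read them off the console. mitmproxy can load a Python script as an addon, and the sketch below records the method and URL of each request. The class name EndpointLogger and the file name log_endpoints.py are hypothetical; the request and response hooks and the module-level addons list follow the mitmproxy addon convention.

```python
class EndpointLogger:
    """Collects the method and URL of every request passing through the proxy."""

    def __init__(self):
        self.seen = []

    def request(self, flow):
        # mitmproxy calls this hook once for each client request.
        entry = (flow.request.method, flow.request.pretty_url)
        self.seen.append(entry)
        print("%s %s" % entry)

    def response(self, flow):
        # Called once the full server response has been read.
        body = flow.response.content or b""
        print("  -> %d, %d bytes" % (flow.response.status_code, len(body)))


# mitmproxy looks for a module-level list named `addons`.
addons = [EndpointLogger()]
```

Assuming the script is saved as log_endpoints.py, it can be loaded with mitmdump -s log_endpoints.py, alongside the same address and port options used earlier.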
Note that this approach will only work up to Android Marshmallow - in Android Nougat the security model was changed so that user-installed certificates are no longer trusted by apps by default. Apps that use certificate pinning will also not work, since pinning makes the app accept only a hardcoded certificate and reject the one presented by the proxy. In these cases more advanced techniques would be needed, such as modifying the APK or tampering with the app at runtime with Frida.
In a previous post I covered a way to monitor network activity in order to scrape the data from an Android application. Sometimes this approach will not work, for example if the data of interest is embedded within the app or perhaps the network traffic is encrypted. For these cases I use UIautomator, which is a Python wrapper to the Android testing framework.
To set up UIautomator you will need to install the Android SDK and set the ANDROID_HOME environment variable.
On Mac I ran the following to install these dependencies:
To check whether your setup is working, connect your Android device by USB and try the info command to check its status:
If you get the exception EnvironmentError: Device not attached, then you need to enable USB debugging on your Android device in order to connect.
If the above command works then you are ready to start automating interactions with your Android device. Note that instead of a physical Android device you can use an emulator such as Genymotion; however, I have found that some popular apps will not run on emulators.
UIautomator provides a number of ways to select elements such as by text or class. Actions can then be performed on the selected elements such as click, drag, swipe, and keyboard input.
One of the most useful commands is dump(), which returns the XML layout of the current screen on your Android device:
This is equivalent to viewing the source of a web page and shows the attributes used by each element, which can be used by selectors to identify the correct element.
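To find candidate selectors quickly, the dumped XML can be parsed with the standard library. The sketch below assumes the dump uses node elements with text, resource-id, and class attributes; the sample XML is a hypothetical, heavily trimmed dump and the function name selectable_nodes is my own.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed dump used for illustration only.
SAMPLE_DUMP = """<hierarchy rotation="0">
  <node index="0" text="" resource-id="" class="android.widget.FrameLayout">
    <node index="0" text="Maps" resource-id="" class="android.widget.TextView" />
    <node index="1" text="" resource-id="com.example:id/search" class="android.widget.EditText" />
  </node>
</hierarchy>"""


def selectable_nodes(xml_text):
    """Return (class, text, resource-id) for nodes that have text or an id."""
    # Parsing bytes avoids problems with an XML encoding declaration in a str.
    root = ET.fromstring(xml_text.encode("utf-8"))
    found = []
    for node in root.iter("node"):
        text = node.get("text", "")
        res_id = node.get("resource-id", "")
        if text or res_id:
            found.append((node.get("class", ""), text, res_id))
    return found


for row in selectable_nodes(SAMPLE_DUMP):
    print(row)
```

Each returned tuple corresponds to a value that could be passed to a text, className, or resourceId selector.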
Here is an example script with UIautomator:
The script will:
- connect to a device with the given serial number (the shortcut from uiautomator import device should not be used if multiple devices are connected)
- navigate to the home screen
- open an app labelled Maps (in my case Google Maps)
- type NYC in the search bar
- and press the ENTER key to show results
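The steps above can be sketched as a small function that takes a connected device object. This is a reconstruction under assumptions rather than the original script: the function name search_in_maps and the serial number are hypothetical, and it assumes the search bar is the first EditText on screen, which may not hold for every version of Google Maps.

```python
def search_in_maps(d, query="NYC", app_label="Maps"):
    """Drive the device `d` through the steps listed above."""
    d.press.home()                 # navigate to the home screen
    d(text=app_label).click()      # open the app with this label
    # Assumption: the search bar is the first EditText on screen.
    d(className="android.widget.EditText").set_text(query)
    d.press.enter()                # press ENTER to show results


# Usage on a real device (requires the uiautomator package and adb):
#   from uiautomator import Device
#   d = Device("0123456789ABCDEF")   # hypothetical serial number
#   search_in_maps(d)
```

Passing the device in as an argument keeps the steps independent of how the connection is made, so the same function works with a physical device or an emulator.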
UIautomator also supports many other features such as event listeners and relative selectors, which are well documented on the GitHub page.
As mentioned previously, for the last year I was studying for a Master's at Oxford. It was a great year - the best of my life - and I only regret it was too short. I would recommend doing a 4 year Bachelor's there to anyone. I will now be taking on new clients again and also working on some web scraping related side projects.
For the last 9 months or so I have been working intermittently on a book covering the web scraping skills I picked up over the years for my work. It is now available on Amazon or directly from the publisher, or can be found on BitTorrent.
Coincidentally, another book on web scraping with Python was released at the same time by O'Reilly, available here.
This year I will be fulfilling a lifelong dream of studying at Oxford University. For the next 11 months I will be completing an MSc in Computer Science, and during this time I will work on projects for existing clients but will probably need to limit taking on new clients.
Google recently released the Arc Welder extension for Chrome, which allows an Android app to be run on the desktop. The aim of Arc Welder is to make testing Android apps easier, but conveniently it also makes scraping them easier.
To run an app in Arc Welder we first need to download the app’s APK file. Google Play does not make this straightforward so I would suggest using an alternative such as androiddrawer.com. I often use the Exchange Rates app when travelling and was curious where they source their currency data. The APK is available for download here, and when loaded in Arc Welder it looks like this:
(Note that if you get an error saying WebGL is not supported then you need to force Chrome to support WebGL by enabling the Override software rendering list flag: chrome://flags/#ignore-gpu-blacklist)
Now that the Exchange Rates app is running on my computer I used Wireshark to track network traffic, which showed the following request was made:
This means that this app simply uses the Yahoo Finance API for its currency data, available at: http://finance.yahoo.com/webservice/v1/symbols/allcurrencies/quote?format=json. So scraping the data from the Exchange Rates app merely requires calling this API.
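Calling the API directly could look something like the sketch below. The function names are hypothetical, and the JSON layout (a list of resources, each with name and price fields) is an assumption that should be checked against a real response; the download step also assumes the endpoint is still being served.

```python
import json
from urllib.request import urlopen

API_URL = ("http://finance.yahoo.com/webservice/v1/symbols/"
           "allcurrencies/quote?format=json")


def parse_rates(payload):
    """Map currency pair name -> price from a decoded JSON payload.

    Assumed layout (not verified here):
    {"list": {"resources": [{"resource": {"fields": {"name": ..., "price": ...}}}]}}
    """
    rates = {}
    for item in payload.get("list", {}).get("resources", []):
        fields = item.get("resource", {}).get("fields", {})
        if "name" in fields and "price" in fields:
            rates[fields["name"]] = float(fields["price"])
    return rates


def fetch_rates(url=API_URL, timeout=10):
    """Download and parse the rates; needs network access and a live endpoint."""
    with urlopen(url, timeout=timeout) as response:
        return parse_rates(json.load(response))
```

Keeping the parsing separate from the download makes it easy to adjust parse_rates once the real response format has been inspected.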
(If you want to learn more about how to use Wireshark check out their extensive documentation.)
I chose a non-controversial app for this blog post that uses a public API, but this same technique can be applied to any Android app that loads its content from a backend server, which ought to be most apps with data of interest.