Offline reverse geocode
Posted 01 Jun 2014 in efficiency, opensource, and python

I often use Google’s geocoding API to find details about a location like this:

>>> from webscraping import download
>>> D = download.Download()
>>> D.geocode('-37.81,144.96')
{'address': "127-141 A'Beckett Street",
 'country': 'Australia',
 'country_code': 'AU',
 'full_address': "127-141 A'Beckett Street, Melbourne VIC 3000, Australia",
 'lat': -37.810035,
 'lng': 144.959875,
 'number': '127-141',
 'postcode': '3000',
 'state': 'Victoria',
 'state_code': 'VIC',
 'street': "A'Beckett Street",
 'suburb': 'Melbourne'}

The drawback of this approach is that the Google API limits each user to 2,500 requests per 24 hours. So to geocode 1 million locations I would need to rent a lot of proxies, or else the API calls would take over a year to complete (1,000,000 / 2,500 = 400 days). To meet this use case I built a module that reverse geocodes a latitude / longitude coordinate offline, using a list of known locations from GeoNames.
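To make the quota arithmetic concrete, here is a small sketch. The `days_needed` helper and its `identities` parameter are hypothetical names for illustration, and the sketch assumes each proxy would get its own independent 2,500-request quota, which may not hold in practice.

```python
import math

DAILY_QUOTA = 2500      # requests per 24 hours, the limit quoted above
LOCATIONS = 1_000_000   # number of coordinates to geocode

def days_needed(locations, daily_quota, identities=1):
    """Days to finish, assuming every identity (e.g. proxy) has its own quota."""
    return math.ceil(locations / (daily_quota * identities))

print(days_needed(LOCATIONS, DAILY_QUOTA))       # 400 days with a single user
print(days_needed(LOCATIONS, DAILY_QUOTA, 400))  # 1 day would need 400 identities
```

Under that assumption, finishing within a day would take about 400 separate identities, which is why an offline lookup is attractive.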

Here is some example usage:

>>> import reverse_geocode
>>> coordinates = (-37.81, 144.96), (31.76, 35.21)
>>> reverse_geocode.search(coordinates)
[{'city': 'Melbourne', 'country_code': 'AU', 'country': 'Australia'},
 {'city': 'Jerusalem', 'country_code': 'IL', 'country': 'Israel'}]

Internally the module uses a k-d tree to efficiently find the nearest neighbour of each given coordinate. On my netbook, building the tree takes ~2.5 seconds and each location query then takes just ~1.5 ms. At that rate, 1 million lookups would take roughly 25 minutes.
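To illustrate the idea, here is a minimal pure-Python sketch of a 2-d tree with a nearest-neighbour search. It is not the module's actual implementation: the `build_kdtree`, `nearest` and `lookup` names are hypothetical, the `CITIES` list is a tiny hand-made sample with approximate coordinates, and distances are treated as planar (plain Euclidean distance on latitude / longitude), which is only an approximation on a sphere.

```python
# Tiny illustrative dataset: ((lat, lng), city). Coordinates are approximate.
CITIES = [
    ((-37.814, 144.963), 'Melbourne'),
    ((-33.868, 151.207), 'Sydney'),
    ((31.769, 35.216), 'Jerusalem'),
    ((32.081, 34.781), 'Tel Aviv'),
    ((51.507, -0.128), 'London'),
]

def build_kdtree(entries, depth=0):
    """Recursively build a 2-d tree; each node is (entry, left, right)."""
    if not entries:
        return None
    axis = depth % 2  # alternate between latitude (0) and longitude (1)
    entries = sorted(entries, key=lambda e: e[0][axis])
    mid = len(entries) // 2  # the median entry becomes this node
    return (entries[mid],
            build_kdtree(entries[:mid], depth + 1),
            build_kdtree(entries[mid + 1:], depth + 1))

def nearest(node, target, depth=0, best=None):
    """Return (squared_distance, entry) for the entry closest to target."""
    if node is None:
        return best
    entry, left, right = node
    coord = entry[0]
    dist = (coord[0] - target[0]) ** 2 + (coord[1] - target[1]) ** 2
    if best is None or dist < best[0]:
        best = (dist, entry)
    axis = depth % 2
    diff = target[axis] - coord[axis]
    # Search the side of the splitting line that contains the target first.
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Only visit the other side if it could still hold a closer point.
    if diff ** 2 < best[0]:
        best = nearest(far, target, depth + 1, best)
    return best

tree = build_kdtree(CITIES)

def lookup(lat, lng):
    """Name of the sample city nearest to the given coordinate."""
    return nearest(tree, (lat, lng))[1][1]

print(lookup(-37.81, 144.96))  # Melbourne
print(lookup(31.76, 35.21))    # Jerusalem
```

The pruning test is what makes this faster than scanning every known location: a whole subtree is skipped whenever the splitting line is already further away than the best match found so far.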

The module is available on Bitbucket under the LGPL.
