Monday, December 3, 2007

Geo Coding with Google KML

The Google Maps API is easy to use from JavaScript, directly through curl, or from Ruby. A recent post by Assay Depot on replacing or forgoing the heavyweight Cartographer plugin in favor of a lighter approach inspired me to write a stand-alone Ruby library that grabs coordinates and address info from Google or any other service (geocoder.us comes to mind). So, here's my attempt.

The first step was to register with Google to get an access key. The second was to put the key somewhere secure, where it can be used but not seen by others. So I created a simple google.rb script to house this and other login credentials. It's stored in the home directory of whatever server I'm currently on, so it's easy to get to but out of view of the web space.
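As a rough idea of what such a credentials script could look like, here is a minimal sketch. Only the `Google.access_key` call appears in this post; the environment variable `GOOGLE_ACCESS_KEY` and the file name `.google_access_key` are assumptions made for illustration, not the contents of the actual google.rb.

```ruby
# google.rb: hypothetical sketch of a small credentials holder.
# The environment variable and file name below are assumptions.
module Google
  # Path of a key file kept in the home directory, outside the web space.
  def self.key_file
    File.join(Dir.home, '.google_access_key')
  end

  # Returns the access key from the environment if it is set,
  # otherwise reads it from the key file. File.read raises
  # Errno::ENOENT if neither source is available.
  def self.access_key
    key = ENV['GOOGLE_ACCESS_KEY'].to_s.strip
    key = File.read(key_file).strip if key.empty?
    key
  end
end
```

Keeping the key out of the source tree means the library itself can be shared without leaking it.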

The next step was to create a simple value object that holds physical address information. I tried to make the object as generic as possible by cross-referencing the Google KML (Keyhole Markup Language) schema, the commercial geocoder.us attributes, and the TIGER/Line data provided by the US Census Bureau. That yields lots of attributes, but all I need is a basic address and coordinates, so the attributes are:
  • street (or thoroughfare)
  • city
  • state code
  • postal code
  • country
  • latitude
  • longitude
  • accuracy
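A value object with the attributes listed above could be sketched with a plain `Struct`. The class name `Address` and the `coordinates` helper are hypothetical; the attribute names follow the list and are otherwise an assumption about the real class.

```ruby
# Hypothetical value object; the name Address is an assumption.
# keyword_init requires Ruby 2.5 or later.
Address = Struct.new(:street, :city, :state, :postal_code, :country,
                     :latitude, :longitude, :accuracy,
                     keyword_init: true) do
  # Convenience helper (not from the original library):
  # returns the position as a [latitude, longitude] pair.
  def coordinates
    [latitude, longitude]
  end
end

info = Address.new(street: '1211 13th St', city: 'Boulder', state: 'CO',
                   postal_code: '80302', country: 'USA',
                   latitude: 40.008704, longitude: -105.276221,
                   accuracy: 8)
```

Attributes that a geo source does not supply are simply left as `nil`.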
Next was to create a query request object. The query request launches queries against any geo source, parses the results, and returns a populated value object. A typical query would be:
include Geocode
source = GoogleMapSource.new(Google.access_key)
request = Query.new(source)

info = request.search('1211 13th St, Boulder')
The result value object (info) returned from the search would show:
street => 1211 13th St
city => Boulder
state => CO
postal_code => 80302
country => USA
latitude => 40.008704
longitude => -105.276221
accuracy => 8
The search results from Google are in XML format, so populating the address object with the correct values requires parsing them. For this I created a KmlParser class. The KML schema supports multiple addresses, county, altitude, and other attributes. Here is Google's response to the previous query.
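For a rough idea of the parsing step, here is a minimal sketch using REXML. The sample XML is hand-written and heavily simplified (no namespaces, flattened nesting); it is not Google's actual response, and the class body is an assumption rather than the real KmlParser. Results are returned as plain hashes to keep the example self-contained.

```ruby
require 'rexml/document'

# Hand-written, simplified sample. NOT Google's actual response.
SAMPLE_KML = <<~XML
  <kml>
    <Response>
      <Status><code>200</code></Status>
      <Placemark>
        <AddressDetails Accuracy="8">
          <CountryNameCode>US</CountryNameCode>
          <AdministrativeAreaName>CO</AdministrativeAreaName>
          <LocalityName>Boulder</LocalityName>
          <ThoroughfareName>1211 13th St</ThoroughfareName>
          <PostalCodeNumber>80302</PostalCodeNumber>
        </AddressDetails>
        <Point><coordinates>-105.276221,40.008704,0</coordinates></Point>
      </Placemark>
      <Placemark>
        <AddressDetails Accuracy="4">
          <CountryNameCode>US</CountryNameCode>
          <AdministrativeAreaName>CO</AdministrativeAreaName>
          <LocalityName>Boulder</LocalityName>
        </AddressDetails>
        <Point><coordinates>-105.270546,40.014986,0</coordinates></Point>
      </Placemark>
    </Response>
  </kml>
XML

# Hypothetical parser sketch; not the original KmlParser.
class KmlParser
  attr_reader :status_code, :addresses

  def initialize(xml)
    doc = REXML::Document.new(xml)
    @status_code = text_of(doc.root, 'code').to_i
    @addresses = doc.get_elements('//Placemark').map { |pm| parse_placemark(pm) }
  end

  private

  # The relative path (".//") keeps the search underneath the given
  # node, so a tag missing from one Placemark yields nil instead of
  # a value taken from a different Placemark.
  def text_of(node, tag)
    found = node.get_elements(".//#{tag}").first
    found && found.text
  end

  def parse_placemark(placemark)
    # KML stores coordinates as longitude,latitude,altitude.
    lon, lat = text_of(placemark, 'coordinates').to_s.split(',').map(&:to_f)
    details = placemark.get_elements('.//AddressDetails').first
    {
      street: text_of(placemark, 'ThoroughfareName'),
      city: text_of(placemark, 'LocalityName'),
      state: text_of(placemark, 'AdministrativeAreaName'),
      postal_code: text_of(placemark, 'PostalCodeNumber'),
      country: text_of(placemark, 'CountryNameCode'),
      latitude: lat,
      longitude: lon,
      accuracy: details && details.attributes['Accuracy'].to_i
    }
  end
end

parser = KmlParser.new(SAMPLE_KML)
```

Searching by descendant rather than by a fixed path is one way to tolerate responses whose nesting varies with the type of address returned.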

The Query object returns the first address found, but subsequent addresses are available (request.addresses). The status code for the query is also available, along with a boolean status_ok method, so a safer call would be:
request.search('1211 13th St, Boulder')
info = request.addresses.first if request.status_ok
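To show how a query object with a pluggable source and a status check might fit together, here is a minimal sketch. The `StubSource` class, its `fetch` method, and the result hash layout are hypothetical stand-ins so the example runs without a network call, and treating 200 as the success code is an assumption; only `Query`, `search`, `addresses`, and `status_ok` are names taken from this post.

```ruby
module Geocode
  # Hypothetical stand-in for a real source such as GoogleMapSource.
  # It returns canned data instead of calling a web service.
  class StubSource
    def fetch(text)
      { status: 200, addresses: [{ street: text, city: 'Boulder' }] }
    end
  end

  # Sketch of a query object; not the original implementation.
  class Query
    SUCCESS = 200 # assumed success code

    attr_reader :addresses, :status_code

    def initialize(source)
      @source = source
      @addresses = []
      @status_code = nil
    end

    # Runs the query against the source, records the status code,
    # and returns the first address found (nil if the query failed).
    def search(text)
      result = @source.fetch(text)
      @status_code = result[:status]
      @addresses = status_ok ? result[:addresses] : []
      @addresses.first
    end

    def status_ok
      @status_code == SUCCESS
    end
  end
end

request = Geocode::Query.new(Geocode::StubSource.new)
request.search('1211 13th St, Boulder')
info = request.addresses.first if request.status_ok
```

Because the query only depends on the source responding to one method, swapping Google for another service would mean writing a new source class and leaving the query untouched.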
You can download the classes, including the address info, parser, Google access, etc., here. (Note: you will have to provide your own access key to use them.)

2 comments:

Doug Daniels said...

Nice work!

One thing--there's a bug in your XPath. The issue is that node.get_elements searches the entire document when passed an XPath beginning with // -- thus, you're getting only the first Placemark for your results most of the time. Also, when a given element name isn't defined for a Placemark, it's actually finding one in the next Placemark.

Would suggest changing your get method's selector to look like:

node.get_elements(node.xpath + "//" + tag).first

That should ensure that the search happens only underneath the node you pass in, while also allowing for KML's structure variations based on address type.

darryl west said...

Thanks Doug. It's fixed now, and I modified the test and KML to test against multiple Placemarks.