Wednesday, January 23, 2013

How to parse a KML file and find the centroid of all the Placemarks in Python.

A local forum has a tag game going on. In a tag game, you go and take a picture of an interesting place, then post clues leading to that place. The next person finds the place, takes a picture of him or herself there, then chooses another place and posts pictures and clues of it. It's altogether pretty fun.

A member of the forum is keeping up with the game on a Google map, and I became curious about the centroid of all the tags.

KML file

I downloaded the KML file and opened it up in Notepad++. It turns out the Placemarks are really easy to read:
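For illustration, a Point Placemark in KML generally has the shape held in the string below. The name and coordinates are made up, not taken from the forum's map, and a real export may carry extra elements such as descriptions or styles. The part that matters here is the coordinates element, which lists longitude first, then latitude, then an optional altitude.

```python
# Hypothetical Placemark for illustration only; not from the actual map.
SAMPLE_PLACEMARK = """<Placemark>
  <name>Example tag</name>
  <Point>
    <coordinates>-84.388,33.749,0.0</coordinates>
  </Point>
</Placemark>"""

print(SAMPLE_PLACEMARK)
```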


XML Parsing

I fired up Python. Here's a quick and easy import for an XML parsing package.

    from xml.dom.minidom import parseString

Basic steps

Since this KML file is pretty simple, the steps are easy:

  • Read KML file as a string
  • Parse that string into a DOM
  • Iterate through a collection of coordinates elements from the DOM
  • Read the data out of the coordinates elements
  • Break them up into latitude and longitude
  • Find the centroid

So, let's get to it.

Read KML file as a string

    # Read KML file as a string; the with block closes the file for us
    with open(location) as kml_file:
        data = kml_file.read()

Parse that string into a DOM

    # Parse that string into a DOM
    dom = parseString(data)

Iterate through a collection of coordinates elements from the DOM

    # The lists have to exist before the loop appends to them
    longitudes = []
    latitudes = []
    for d in dom.getElementsByTagName('coordinates'):

Read data out of the coordinates elements, break them up into latitude and longitude

        # KML stores each tuple as longitude,latitude[,altitude]
        coords = d.firstChild.data.split(',')
        longitudes.append(float(coords[0]))
        latitudes.append(float(coords[1]))
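To see what the loop and the split produce, here is a small self-contained sketch run against a made-up two-Placemark document (the names and coordinates are invented). It strips the text first, since the contents of a coordinates element can be wrapped in whitespace, and it assumes each element holds a single tuple, as Point placemarks do. A LineString or Polygon with several tuples in one element would need extra splitting.

```python
from xml.dom.minidom import parseString

# Made-up sample data; not the forum's actual map.
sample = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
<Placemark><name>A</name><Point><coordinates>-84.5,33.5,0.0</coordinates></Point></Placemark>
<Placemark><name>B</name><Point><coordinates>
  -84.0,34.0,0.0
</coordinates></Point></Placemark>
</Document></kml>"""

longitudes = []
latitudes = []
for d in parseString(sample).getElementsByTagName('coordinates'):
    # KML order is longitude, latitude, then optional altitude.
    coords = d.firstChild.data.strip().split(',')
    longitudes.append(float(coords[0]))
    latitudes.append(float(coords[1]))

print(longitudes)  # [-84.5, -84.0]
print(latitudes)   # [33.5, 34.0]
```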

Find the centroid 

    centerLatitude = sum(latitudes)/len(latitudes)
    centerLongitude = sum(longitudes)/len(longitudes)
    return [centerLongitude, centerLatitude]

I will note that this is technically wrong. The Earth is roughly a sphere, and averaging latitudes and longitudes like this is only correct on a flat Cartesian plane. However, over a relatively small area, the error isn't great enough to make a difference.
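
For anyone who wants to account for the curvature, one common approach is to turn each point into a 3D unit vector, average the vectors, and convert the result back to longitude and latitude. This is a sketch of that alternative, not what the code above does; the function name spherical_centroid is my own, and it treats the Earth as a perfect sphere and ignores altitude.

```python
import math

def spherical_centroid(longitudes, latitudes):
    """Average points as 3D unit vectors, then convert back to degrees.

    Hypothetical helper: assumes a perfectly spherical Earth and a
    non-empty list of points that do not cancel out to the zero vector.
    """
    x = y = z = 0.0
    for lon, lat in zip(longitudes, latitudes):
        lon_r, lat_r = math.radians(lon), math.radians(lat)
        x += math.cos(lat_r) * math.cos(lon_r)
        y += math.cos(lat_r) * math.sin(lon_r)
        z += math.sin(lat_r)
    n = len(latitudes)
    x, y, z = x / n, y / n, z / n
    center_lon = math.degrees(math.atan2(y, x))
    center_lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    return [center_lon, center_lat]

print(spherical_centroid([-84.5, -84.0], [33.5, 34.0]))
```

For points spread over a city-sized area, the result should differ from the plain average only in the later decimal places.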

Finished product

Here's the whole thing.
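
Assembled from the steps above, the function might look like the sketch below. The name find_centroid is a placeholder of mine, and the list initialization and the strip() call are small additions to make the pieces run together; location is the path to the downloaded KML file.

```python
from xml.dom.minidom import parseString

def find_centroid(location):
    """Return [longitude, latitude] averaged over all coordinates elements."""
    # Read KML file as a string
    with open(location) as kml_file:
        data = kml_file.read()

    # Parse that string into a DOM
    dom = parseString(data)

    # Collect longitude and latitude from each coordinates element
    longitudes = []
    latitudes = []
    for d in dom.getElementsByTagName('coordinates'):
        # KML order is longitude, latitude, then optional altitude
        coords = d.firstChild.data.strip().split(',')
        longitudes.append(float(coords[0]))
        latitudes.append(float(coords[1]))

    # Find the centroid (flat-plane average)
    centerLatitude = sum(latitudes) / len(latitudes)
    centerLongitude = sum(longitudes) / len(longitudes)
    return [centerLongitude, centerLatitude]
```

Calling it with the path to the file, for example find_centroid('tags.kml') with a hypothetical filename, returns the averaged [longitude, latitude] pair.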

It turns out that the centroid is [-84.3412607740385, 33.79998225], which is unnervingly close to my house.