Why location data is a mess, and what can be done about it

SimpleGeo's Chris Hutchins on the state of location data and the future of location services.

Between identifying relevant and accurate data sources, harmonizing data from multiple sources, and finding new ways to store and manipulate that data, location technology can be messy, says SimpleGeo’s Chris Hutchins (@hutchins). But there are ways to clean it up. Hutchins explains how in the following interview.

What makes location data messy?

Chris Hutchins: The primary reasons are:

  • The ever-complicated restrictions, licenses, and use rights that come with different datasets. These can include requirements to use a company’s map tiles, to share back all derivative works, or to display sponsored listings or advertisements alongside the data.
  • Conflating records that represent the same location/business/place between multiple datasets is an incredibly arduous process.
  • With small datasets, spatial queries are quite simple. However, as datasets grow exponentially in size, indexing that data to enable fast queries becomes difficult.
  • Location is usually an opinion, not a fact. For example, there are very strong views about where neighborhoods start and end.
  • The nature of location-based information requires all technology to handle real-time requests against datasets that are always changing.
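
To make the conflation point concrete, here is a minimal sketch of how two records from different datasets might be tested for "same place". It is an illustration only, not SimpleGeo's method: the record layout (`name`, `lat`, `lon`), the 100-meter and 0.8 thresholds, and the sample records are all assumptions made up for the example.

```python
import math
from difflib import SequenceMatcher


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(min(1.0, a)))


def name_similarity(a, b):
    """Rough string similarity in [0, 1], ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def same_place(rec_a, rec_b, max_meters=100.0, min_name_score=0.8):
    """Treat two records as one place if they are both close and similarly named.

    The thresholds are illustrative assumptions, not tuned values.
    """
    close = haversine_m(rec_a["lat"], rec_a["lon"], rec_b["lat"], rec_b["lon"]) <= max_meters
    similar = name_similarity(rec_a["name"], rec_b["name"]) >= min_name_score
    return close and similar


# Hypothetical records from two datasets (names and coordinates are made up).
a = {"name": "Joe's Pizza", "lat": 37.7763, "lon": -122.4233}
b = {"name": "Joes Pizza", "lat": 37.7764, "lon": -122.4232}
c = {"name": "Main Street Hardware", "lat": 37.7565, "lon": -122.4213}

print(same_place(a, b))
print(same_place(a, c))
```

Even this toy version hints at why the process is arduous: real datasets disagree on spelling, address format, and coordinates, and comparing every record against every other record does not scale without some form of spatial blocking.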

What can be done to clean up location data?

Chris Hutchins: Part of cleaning up is understanding the situation. By being aware of the limitations of certain databases or of the restrictions that some datasets require, you can better understand your capabilities.

Specifically related to data, ensuring that your data source is providing clean and up-to-date data means you won’t be sending end users to the wrong location or giving them false information. Also, as more companies understand what their core competency is — and what it isn’t — they learn to trust other companies to handle the things that require a more niche expertise. Understanding that this technology is new and learning to embrace tools and services in their infancy will certainly give you an edge with location data.

What are the most challenging aspects of location-aware development?

Chris Hutchins: The primary challenges we hear about are a lack of fast and accurate tools for storing, manipulating and querying spatial data, and the fact that most data is expensive and comes with restrictive terms of use. Today’s geospatial infrastructure platforms are antiquated, so building the back-end infrastructure for applications takes a long time and requires some very niche skills.
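
As a rough illustration of what "storing and querying spatial data" involves, the sketch below buckets points into a fixed latitude/longitude grid so that a bounding-box query only inspects nearby cells. This is a deliberately simple stand-in for the R-tree or geohash style indexes that production systems typically use, not a description of SimpleGeo's infrastructure; the class name `GridIndex`, the 0.01-degree cell size, and the sample points are assumptions.

```python
import math
from collections import defaultdict


class GridIndex:
    """Toy spatial index: points are bucketed into fixed-size lat/lon cells."""

    def __init__(self, cell_deg=0.01):
        self.cell_deg = cell_deg        # cell size in degrees (assumed value)
        self.cells = defaultdict(list)  # (row, col) -> [(item_id, lat, lon), ...]

    def _key(self, lat, lon):
        # Map a coordinate to the integer row/column of its grid cell.
        return (math.floor(lat / self.cell_deg), math.floor(lon / self.cell_deg))

    def insert(self, item_id, lat, lon):
        self.cells[self._key(lat, lon)].append((item_id, lat, lon))

    def query_bbox(self, min_lat, min_lon, max_lat, max_lon):
        """Return ids of points inside the box, scanning only overlapping cells.

        Limitation: boxes that cross the antimeridian are not handled.
        """
        r0, c0 = self._key(min_lat, min_lon)
        r1, c1 = self._key(max_lat, max_lon)
        hits = []
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                for item_id, lat, lon in self.cells.get((r, c), ()):
                    if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
                        hits.append(item_id)
        return hits


# Hypothetical points: two close together, one far away.
index = GridIndex()
index.insert("cafe", 37.776, -122.423)
index.insert("bookstore", 37.780, -122.410)
index.insert("far_away", 40.0, -74.0)

print(sorted(index.query_bbox(37.77, -122.43, 37.79, -122.40)))
```

A fixed grid works for small, evenly spread data, but it degrades when points cluster in cities or when the dataset changes constantly, which is part of why this kind of back-end work calls for specialized skills.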

How is SimpleGeo Places being used?

Chris Hutchins: SimpleGeo Places is a free database of business listings and points of interest (POI). Applications use it to get an up-to-date view of local businesses without having to manage a large and changing spatial database in-house. Most current POI databases have restrictive terms of use and are expensive. We believe that this has impeded innovation in the development of location-aware services and applications, so SimpleGeo provides a certain amount of usage of our Places data at no cost to developers, and it will always be free of restrictive licensing.

What future developments do you see for location technology?

Chris Hutchins: The future of location is context, where apps will be better at giving you relevant information based on real-time information about where you are and what’s around you. I’m really looking forward to a world where by knowing where I’ve been in the past, the things my friends like, the weather, and more, applications will be able to pinpoint where I might be interested in going and what I might be interested in doing, as well as getting me there.

This interview was edited and condensed.


  • I totally agree with Chris that the future of location is about context. The future is TODAY. We have been working on a “secret project” called The Location Genome Platform. It is currently in private beta. http://blog.locationgenome.com/
    This is a platform that is centered around location but focused on contextual understanding of locations. It provides deep understanding of locations from the vast amount of data publicly available on the Internet. It offers very powerful query capabilities across geospatial data, contextual data and structured data.
    We plan to officially launch this platform in the next few weeks. Drop me a line if you want to join our private beta.

  • Simon McBride

    So what you’re saying is that location data is all over the place?

  • Nice one, @Simon. That should have been my headline.

  • Mac

    @Simon: Well played. Wish we’d used that!

  • The alpha and omega of a neighborhood may be a matter of opinion, but the location of a road should be a matter of fact.

    I recently got a FedEx package at my house. The map that FedEx attached to the package shows the start of my country road off by a good mile or so. Clearly, the FedEx guys have figured out the right location. Do they store such “corrections” in a database, or do the drivers just learn them through trial and error?

    About a year ago, I notified the Bing Maps data vendor of the error. This time, I noticed that Yahoo! Maps was wrong. When I went to notify navteq.com of the error, I discovered that their map was already corrected. Apparently, there’s a huge time lag between updates in the vendor database and the clients’ updating their databases.

    I also abstractly wonder why Google and Microsoft and Yahoo! can’t compare their maps. Or why the map vendors don’t contract with FedEx or Domino’s or others who could provide verifications and corrections to their databases. Having data that customers can confidently rely on should be their highest priority, especially when they are driving on country roads.