Amazon Hosts TIGER Mapping Data

Last week at Ignite Where, Eric Gundersen of Development Seed made a significant announcement for geohackers looking for easy access to open geodata: Amazon will be hosting a copy of TIGER data on EC2 as an EBS (Elastic Block Storage) volume. Eric said the idea came about during the Apps For America contest in 2008, when they needed open geodata for their entry Stumble Safely (which maps crime data against bar locations).

From the Development Seed announcement:

Amazon is now hosting all United States TIGER Census data in its cloud. We just finished moving 140 gigs of shapefiles of U.S. states, counties, districts, parcels, military areas, and more over to Amazon. This means that you can now load all of this data directly onto one of Amazon’s virtual machines, use the power of the cloud to work with these large data sets, generate output that you can then save on Amazon’s storage, and even use Amazon’s cloud to distribute what you make.

Let me explain how this works. The TIGER data is available as an EBS store. EBS, or Elastic Block Storage, is essentially a virtual hard drive. Unlike S3, there isn’t a separate API for EBS stores, and there are no special limitations. Instead, an EBS store appears just like an external hard drive when it’s mounted to an EC2 instance, which is a virtual machine at Amazon. You can hook up this public virtual disk to your virtual machine and work with the data as if it were local to your virtual machine – it’s that fast.
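
Concretely, you work from the shared public snapshot: create your own EBS volume from it, attach that volume to a running instance, and mount it inside the instance. Here is a minimal sketch of that sequence, rendered as the era's ec2-api-tools commands by a small helper so each step is explicit. Every identifier below (snapshot, volume, instance IDs, device name, mount point) is a hypothetical placeholder, not a value from the announcement:

```python
def tiger_volume_steps(snapshot_id, volume_id, instance_id,
                       zone="us-east-1a", device="/dev/sdf",
                       mountpoint="/mnt/tiger"):
    """Return, in order, the shell commands that turn the public TIGER
    snapshot into a disk that is local to your EC2 instance."""
    return [
        # 1. Create your own volume from the shared public snapshot
        #    (it must live in the same availability zone as your instance).
        f"ec2-create-volume --snapshot {snapshot_id} -z {zone}",
        # 2. Attach the new volume to your running instance as a block device.
        f"ec2-attach-volume {volume_id} -i {instance_id} -d {device}",
        # 3. Inside the instance, mount it like any external hard drive.
        f"mount {device} {mountpoint}",
    ]

# Print the sequence with placeholder IDs:
for step in tiger_volume_steps("snap-00000000", "vol-00000000", "i-00000000"):
    print(step)
```

Once step 3 is done, the shapefiles are ordinary files under the mount point, readable by any tool on the instance.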

The TIGER data is one of the first Public Data Sets to be moved off of S3 and switched to an EBS. By running as an EBS volume, users can mount the data as a drive on an EC2 instance and easily run their processes (like rendering tiles with Mapnik) against the data remotely. If you’re a geo-hacker, this makes a rich set of geodata readily available to you without consuming your own storage resources or dealing with the normally slow download process.
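
Once mounted, those 140 gigs of shapefiles are just ordinary files, so a few lines of code can read them directly. As an illustration using only the standard library (the file path in the usage comment is a hypothetical example), here is a sketch that parses the fixed 100-byte header that every ESRI shapefile starts with:

```python
import struct

def read_shp_header(header: bytes):
    """Parse the fixed 100-byte header of an ESRI shapefile (.shp)."""
    if len(header) < 100:
        raise ValueError("shapefile headers are exactly 100 bytes")
    # Bytes 0-3: file code, big-endian, always 9994 for a shapefile.
    file_code = struct.unpack(">i", header[0:4])[0]
    if file_code != 9994:
        raise ValueError("not a shapefile")
    # Bytes 24-27: total file length, big-endian, in 16-bit words.
    file_length = struct.unpack(">i", header[24:28])[0]
    # Bytes 28-35: version and shape type, little-endian.
    version, shape_type = struct.unpack("<ii", header[28:36])
    # Bytes 36-67: bounding box as four little-endian doubles.
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    return {
        "file_length_bytes": file_length * 2,
        "version": version,
        "shape_type": shape_type,  # e.g. 5 = polygon
        "bbox": (xmin, ymin, xmax, ymax),
    }

# Usage against a mounted volume (path is a hypothetical example):
# with open("/mnt/tiger/tl_2008_us_county.shp", "rb") as f:
#     print(read_shp_header(f.read(100)))
```

The header alone tells you the shape type and bounding box, which is often enough to decide whether a given file is worth feeding to a heavier tool like Mapnik.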

I love the idea of Amazon’s Public Data Sets. It’s an obvious win-win scenario. The public is able to get access to rich data stores at a relatively cheap price and Amazon is able to lure said public onto their service. Smart.

  • wytten

    This is wonderful…I could have used this back in ’91 :)

  • In a nutshell, how do I consume this data? I have no experience of working with the cloud, and wish to find localities by zip code in an ASP.NET application…

  • Great example of a “cloud” service that’s more than merely “remote hosting.” This fully embodies the “Whatever-2.0” ideal (of being more valuable as more people get involved), as well as extending a central-hosting advantage (of easy exchange of vast data) to these cooperating members.

  • I always enjoy learning what other people think about Amazon Web Services and how they use them. Check out my very own tool CloudBerry Explorer that helps to manage S3 on Windows. It is freeware.

  • Anthony,
    Since the data is stored in an EBS that will mount to any of Amazon’s virtual machines (Linux or Windows EC2s), you can boot up a Windows EC2 for about $1.20 an hour, install the .NET tools you need, mount the TIGER EBS drive to your Windows instance, and start manipulating the data, even saving your results out to S3. If you end up building a nice custom Windows environment with your .NET tools, you can save the configuration as an “AMI” so other people can boot up an instance with your same configuration (they pay the hourly cost, and the AMI is stored in a public directory).

    Also, since it seems like you’re dealing with proximity to zip codes, you might want to use one of the freely available zip code databases, which store zip code centerpoints in SQL format. The TIGER data would give you a much more accurate answer (because you could find proximity to the edge of a zip code, not just its center), but starting out with centerpoints might give you a quicker answer and involve less work from start to finish. There are many, many options in the geo/location field, and it’s good to be aware of all of them.

  • Great, but why not do it with OpenStreetMap data? Having Amazon host a PostgreSQL instance containing the OSM data set would make building all sorts of cool OSM applications so much easier. At the moment, if you want to do something public, you need a copy of all the data and then the tiles you render as well.
