Where 2.0 Preview – Tyler Bell on Yahoo's Open Location Project

Running time: 00:28:07

Location can be a vague concept to pin down. To a surveyor, location means latitude and longitude accurate to a few millimeters, while to a cab driver, a street address would be much more useful. If you’re German, I can tell you that I live in the United States. To a Californian, I live in New Hampshire. And to someone from Manchester, I live in Derry. Unfortunately, the way that location is currently stored and presented online is both non-uniform and frequently at a level of precision inappropriate for the end-user. That’s part of what Open Location is trying to fix. Tyler Bell, who took his doctorate from Oxford to Yahoo, is currently the product lead for the Yahoo Geo Technologies Group. At O’Reilly’s Where 2.0 Conference, he’ll be discussing Open Location.

James Turner: So first off, can you describe what the Geo Technologies Group does?

Tyler Bell: The Geo Technologies Group at Yahoo oversees all technologies that relate to geography and geographic information. So it’s largely self-evident. But this is what I mean by that: we own and oversee the maps and mapping technologies, so the visualizations and placements of geographically informed data. We also own user location technologies. So here, we’re dealing with different methods of detecting user location, managing user location, and ensuring that users receive geo-relevant results whenever they log onto Yahoo or use a Yahoo service. And then lastly, we have something which is slightly more esoteric. It’s called the Geoinformatics Group. And that’s the organization which uses geography to inform data. And we do this without ever showing a map. So it’s really how we add value and power to information wholly based upon where things are and where our users are.

JT: That’s like returning search information relevant to what you know about the user’s location.

TB: That’s correct. That’s the end product of search groups consuming the geo technologies services on the back-end. But what we also need to do is actually organize the geographic information. So the search teams are the specialists at Yahoo in matching user intent to the results that are returned; it’s our job in the Geoinformatics Group, for example, to say that when a user queries against Springfield or they’re searching for Springfield, which of the countless Springfields in the United States, or in the world, do you mean? So we need to be able to recognize that this is a place. We need to identify all of the places with a particular place name. And then we need to be able to do a so-called geo-geo disambiguation to ensure that when you mean Springfield, when you mean Campbell, when you give us a city name, which is otherwise nonspecific, we are very likely to return the most direct and accurate results.
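To make the disambiguation step concrete, here is a minimal Python sketch. It is not Yahoo's method: the candidate list, the approximate coordinates, and the "prefer the candidate nearest the user" heuristic are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

# Approximate coordinates for three of the many US Springfields (illustrative data only).
CANDIDATES = {
    "Springfield": [
        ("Springfield, Illinois", 39.78, -89.65),
        ("Springfield, Massachusetts", 42.10, -72.59),
        ("Springfield, Missouri", 37.21, -93.29),
    ],
}

def rough_distance(lat1, lon1, lat2, lon2):
    """Equirectangular approximation in degrees; good enough to rank, not to measure."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = (lon2 - lon1) * math.cos(mean_lat)
    dy = lat2 - lat1
    return math.hypot(dx, dy)

def disambiguate(name, user_lat, user_lon):
    """Return every known place called `name`, nearest to the user first."""
    places = CANDIDATES.get(name, [])
    return sorted(places, key=lambda p: rough_distance(user_lat, user_lon, p[1], p[2]))

# A user near Boston (roughly 42.36, -71.06) asking for "Springfield".
best_guess = disambiguate("Springfield", 42.36, -71.06)[0][0]
print(best_guess)
```

A production system would presumably weigh far more signals than distance (popularity, query context, language), so treat this only as the shape of the problem: one name, several candidate places, one ranking.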

JT: How do you garner that kind of relevant location technology? I know a lot is done now with IP addresses and some things are starting to happen with wifi and cell phone towers. How does that all kind of come together?

TB: Well, so that’s on the other side. I mean if you think of our group as a three-legged stool, the technologies that you just referenced focus on the user location side. And so obviously, there’s geo-based IP; wifi is becoming increasingly popular for very good reasons. And that’s how we basically can detect, or we prefer to have a user tell us, where they are. The other bit is really how we baseline the information. So what I mean by that is that we have a massive index that’s called WhereHaus. And it’s a resource of, coming up on, seven or eight million named places around the world and the different variants, how they’re called by different people. So, for example, within your introduction, you refer to the various interpretations of location. And we acknowledge that location, certainly on the Internet, but in the world more generally, is very personal. It’s also very subjective. So we developed technologies on the application layer that live on top of the data to help us try to anticipate what you mean by how you call a place. And that ensures that we deliver the most accurate results that you’re looking for, but it also ensures that we can geotag most accurately the places that you might reference when adding or uploading content into Yahoo as well.

JT: One of the tools that the group offers is the GeoPlanet API. Can you talk about the API and how people are using it?

TB: Yeah. The GeoPlanet API, that’s basically the interface onto this WhereHaus data. And it’s a very interesting way that we — I’ll step back one step by first saying that usually, certainly on the internet but more generally, geography is handled as a purely spatial problem. What I mean by that is that things are handled in longitude and latitudes. And traditionally, if you have a place such as a city or town which is polygonal on the map, it’s usually boiled down to a centroid, which again is a coordinate pair. And then all of the questions relate to the coordinate pair.
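For readers unfamiliar with the purely spatial approach described here, the following Python sketch shows the two steps: collapse a polygon to a centroid, then answer questions with a point radius search. It is a generic illustration, not any particular service's code. The vertex average is a crude stand-in for a true area-weighted centroid, and the haversine formula with a mean Earth radius is an approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

def centroid(vertices):
    """Average of (lat, lon) vertices; a crude stand-in for a true polygon centroid."""
    lats = [lat for lat, _ in vertices]
    lons = [lon for _, lon in vertices]
    return (sum(lats) / len(lats), sum(lons) / len(lons))

def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def within_radius(center, points, radius_km):
    """Names of all points whose coordinate pair lies within radius_km of center."""
    return [name for name, coord in points.items() if haversine_km(center, coord) <= radius_km]

# A square "town" boiled down to one coordinate pair, then searched around.
town_center = centroid([(0.0, 0.0), (0.0, 2.0), (2.0, 2.0), (2.0, 0.0)])
nearby = within_radius(town_center, {"near": (1.0, 1.5), "far": (1.0, 9.0)}, 100)
```

Everything the search knows about the town is that single pair of numbers, which is exactly the limitation being described.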

So everything about location on the internet boils down to two numbers which don’t really mean anything to an end-user but would mean a lot to a geographer or to a computer that can do a point radius search. Yahoo deals with location very, very differently. So instead of taking a spatially-based approach to location, we take a place-based approach. And so what we do is we have this idea of named places. So this could be a monument. It could be a park. It could be a region like the Pacific Northwest. It could be a continent and even the earth is a named place. So what we do is we take all of these different named places at all of these different granularities and we give them unique identifiers called Where On Earth ids or WOE ids for short. And then we relate these places together. And in doing so, we’ve created a geographic ontology of named places around the world. But this is more than just an index. It’s more than a guess [that] it’s here. We actually know how places relate to one another.

And this is what the GeoPlanet API allows us to explore. So, for example, we can use the GeoPlanet API to determine what cities are in what county, what postcodes are in what city. We can find out what states or provinces or districts are in a specific country, what countries are on what continent. So there’s that sort of vertical hierarchy, which is very, very powerful. But then we also have this idea of a horizontal hierarchy as well. So if I’m searching or if I’m browsing against a specific postcode, our system can tell you all of the surrounding postcodes. If you want to know something about a county, we can also tell you what counties surround that county. So the power of GeoPlanet is not just in this very rich ontology of named places, which allows us to, of course, look up places and where they are. But the real power is in the relationships between places which allows applications that consume GeoPlanet to have a really solid sense of geographic intelligence. They can browse the horizontal and the vertical hierarchies with ease to really discover geographic detail that no other point radius-based search would allow us to do.
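The vertical and horizontal hierarchies can be pictured with a small in-memory model. The Python sketch below is an assumption-laden illustration, not the GeoPlanet API: the `Place` class, the helper functions, and the integer ids are all invented here, and only the place relationships (Derry in Rockingham County, Manchester in Hillsborough County, both counties adjacent in New Hampshire) are real.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

@dataclass
class Place:
    place_id: int                      # stand-in for a WOE id; these numbers are invented
    name: str
    kind: str
    parent: Optional[int] = None       # vertical link: the containing place
    neighbors: Set[int] = field(default_factory=set)  # horizontal links: adjacent places

PLACES: Dict[int, Place] = {p.place_id: p for p in [
    Place(1, "United States", "country"),
    Place(2, "New Hampshire", "state", parent=1),
    Place(3, "Rockingham County", "county", parent=2, neighbors={4}),
    Place(4, "Hillsborough County", "county", parent=2, neighbors={3}),
    Place(5, "Derry", "town", parent=3),
    Place(6, "Manchester", "city", parent=4),
]}

def ancestors(place_id: int) -> List[str]:
    """Walk up the vertical hierarchy: town -> county -> state -> country."""
    names = []
    parent = PLACES[place_id].parent
    while parent is not None:
        names.append(PLACES[parent].name)
        parent = PLACES[parent].parent
    return names

def children(place_id: int) -> List[str]:
    """Walk one step down: every place directly contained in this one."""
    return sorted(p.name for p in PLACES.values() if p.parent == place_id)

def neighbors(place_id: int) -> List[str]:
    """Walk sideways: places of the same kind that border this one."""
    return sorted(PLACES[n].name for n in PLACES[place_id].neighbors)

print(ancestors(5))
print(children(2))
print(neighbors(3))
```

None of these lookups touches a coordinate, which is the point being made: the answers come from the relationships between identifiers rather than from geometry.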

JT: I did some dealer locator software way back when, when the only thing that was really available publicly was the census report TIGER database. And I know that a lot of that data, especially things like zip code to latitude and longitude data, is fairly closely held by whatever the government’s postal organization is, because it’s a revenue source for them to sell that. Have you found any issues regarding that or do you feel like you’re kind of freeing that data up?

TB: Well, we free the data up wherever we can. And the information that we expose through our API, we have permission to expose through our API. But most of the data, we try to source from open and available sources. So one of the most important things is not just having this data but actually allowing us access to it and allowing us to share it out. So coming back to the point about really open location, one of the goals that we want here is that we want to be able to ensure that we can all refer to the same — when I say we, I mean developers and users around the world — can refer unambiguously to the same place no matter how it’s called. So the United States is the United States or it’s USA. Or it’s Les Etats Unis.

All of the different labels are assigned the same Where On Earth identifier. And exposing that identifier is, we think, the prime benefit. We give you as much information about that place as we can. But we don’t aim to be the exhaustive resource for this information. So we won’t give you polygonal representations of these places. We won’t tell you everything about them or their census statistics or the population. It’s really, “Here’s the identifier. This is where it can be found. And this is how this identifier relates to other identifiers.”
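The idea that many labels resolve to one identifier can be sketched as a simple alias table. This Python example is only an illustration: the id `1000`, the `register` and `resolve` helpers, and the normalization rules are assumptions, not real WOE ids or Yahoo's matching logic.

```python
import unicodedata
from typing import Dict, Optional

ALIASES: Dict[str, int] = {}

def normalize(label: str) -> str:
    """Lower-case, strip accents and collapse whitespace so label variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.casefold().split())

def register(place_id: int, *labels: str) -> None:
    """Attach any number of labels to one identifier."""
    for label in labels:
        ALIASES[normalize(label)] = place_id

def resolve(label: str) -> Optional[int]:
    """Return the identifier for a label, or None if the label is unknown."""
    return ALIASES.get(normalize(label))

UNITED_STATES = 1000  # invented identifier, not a real WOE id
register(UNITED_STATES, "United States", "USA", "Les Etats Unis")

print(resolve("usa"), resolve("Les États Unis"), resolve("  united   STATES "))
```

A one-to-one table like this breaks down as soon as a label is ambiguous (Springfield again), so a fuller version would map each label to a list of candidate identifiers.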

JT: Before we move on to Open Location, I have a couple more questions about location at Yahoo. Flickr is probably the most visibly geo-enabled Yahoo product. What interesting things are you and other people doing with all of those georeferenced photos? And what do you see as kind of the next step in that space?

TB: Yeah, I’ve got to give you the sort of usual caveats of I can only talk about that to a certain extent. Flickr does a fantastic job of allowing users to georeference their images geospatially. I’ll tell you what Flickr tells us and what it tells the world: with the 100 plus million geotagged images, it really tells us and everybody that location is hugely important to our users. They’re not getting paid to do this. But 100 million plus photos are being explicitly georeferenced around the globe. So location is important. And people are keen to share and talk about location and reference data. So I’ll come back to answer your question a bit more specifically. What Flickr gives us and what, more specifically, the users of Flickr give us is a fantastic resource for visually orienting ourselves any place on the planet. So these images annotate place and they talk about place. And they talk about location. But also, the actual tags that are associated with these images can, in aggregate, tell us a great deal about that location as well. So Flickr’s a wonderful medium for tying together concepts with visual representations and place and location specifically.

You also mentioned that Flickr’s very visible. It is indeed. A lot more of the geo-relevant and geoinformatic work that we do under the covers here at Yahoo is effected much more subtly. So we tend to focus, certainly in the Geo Technologies Group, less on getting things on the map and much more on ensuring that a mobile user, a desktop user, any end-user, wherever they are on the Yahoo network, has the most geo-relevant experience possible.

JT: What are some of the other cool things that are happening inside of Yahoo location-wise, if you can talk about any?

TB: Well, we’ve got Fire Eagle, of course, which is coming up on a year plus old now. Fire Eagle is the API that allows users to share their locations with any other applications or consuming devices. So one of the big issues about location, not at Yahoo but fundamentally within the sector, is that it’s highly siloed. I mean carriers are a case in point. Anyone who has location knows its value, and they tend to keep it to themselves. Users don’t want this behavior. Application developers generally do, which is why it’s difficult to get your location off of your GPS device or off your phone or off your web application.

What users don’t want to do is have to update their location manually or even automatically depending on which device or which application is used. So Fire Eagle really allows you to get your location off of one device onto another or off of one platform and distribute it across multiple platforms. So you can think of it as a user location sharing switchboard. And the value that we add on top at Yahoo is a granularity control layer, sort of a geographic ACL, you can think of it that way. One of the fundamental things about Fire Eagle is that we allow the user to control, in the finest detail, how their location is consumed. What you as a user can do is say, “Well, I want Application A to know my location to the city level. Application B can know it to the postcode level. And Application C can know it down to the street or the curbside level.” And that’s part of what the underlying WhereHaus in the geoinformatics engine allows us to do: apply those degrees of granularity on top of the security layers for Fire Eagle. So that’s one of the more exciting things that we have on offer right now.
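The "geographic ACL" idea can be sketched in a few lines of Python. This is a hypothetical model, not Fire Eagle's API: the level names, the `GRANTS` table, the application names, and the street address are all invented for illustration.

```python
from typing import Dict

# Granularity levels from coarsest to finest (an assumed ordering for this sketch).
LEVELS = ["country", "state", "city", "postcode", "street"]

# The user's full location as the service knows it (street address is made up).
USER_LOCATION: Dict[str, str] = {
    "country": "United States",
    "state": "New Hampshire",
    "city": "Derry",
    "postcode": "03038",
    "street": "1 Example Street",
}

# Per-application grants chosen by the user; apps not listed see nothing.
GRANTS: Dict[str, str] = {
    "application_a": "city",
    "application_b": "postcode",
    "application_c": "street",
}

def visible_location(app: str) -> Dict[str, str]:
    """Return the user's location truncated to the finest level the app was granted."""
    granted = GRANTS.get(app)
    if granted is None:
        return {}
    cutoff = LEVELS.index(granted)
    return {level: USER_LOCATION[level] for level in LEVELS[: cutoff + 1]}

def clear_location() -> None:
    """Let the user wipe their stored location entirely."""
    USER_LOCATION.clear()

print(visible_location("application_a"))
print(visible_location("application_b"))
```

Truncation works here because the levels nest, which is where a place hierarchy like the one behind WhereHaus earns its keep: knowing that a street sits inside a postcode inside a city is what lets one stored location be served at several precisions.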

JT: If I’m a random developer, application developer and I want to leverage this technology, is it out there to use? Or do I have to license from you? Or what’s the arrangement?

TB: Yeah, it’s out there to use. You don’t need a BD deal or a license deal. You can go to developer.yahoo.com, and you can find APIs for Fire Eagle. You can find APIs for GeoPlanet and on maps and all of the different geo technologies products there. And it’s really just available for you to get an application, apply for an application ID and then start using the services immediately.

JT: Can you describe in nonmarket speak exactly what Open Location tries to do and what it means for end-users and developers?

TB: So this is something that we have internally here at Yahoo. It’s not an endorsement of any kind of initiative of the same name. Really what Yahoo is aiming to do is make location accessible and ubiquitous. Yahoo and our developers want to see a location-enabled internet. And Open Location is the name of our initiative here at Yahoo to expose new places and named places around the world. Make those accessible off the Yahoo network. Accept new places from our users. Obviously, a small development team cannot capture the world’s geography as it is called by the world’s people in its entirety. So we are looking to our users to contribute to this. But also, it’s fundamentally focused on the other side of that same coin which is user location. How can we provide technologies to enable users to share their location on and off network and other technologies that will help Yahoo and other organizations and developers detect and manage user location when the user gives them permission to do so?

JT: There is kind of a question here which is it starts to sound like we’re replacing a bunch of little silos with a few big silos because you’re saying we can offer you location, services and information. Obviously, Google is trying to get into that space. I’m sure Microsoft is getting into that space. So, again, it kind of feels like you’re going to pick the guy you want to hitch your fate to. And then, again, it’s still — even if it’s an open API, it’s still proprietary underneath.

TB: Yeah. I mean your point’s very valid. What I’d say in response is that locations and location beacon information, anything that deals with either user content location that comes into Yahoo can go right back out again. So Fire Eagle doesn’t accept user locations from third party applications and then keeps them on network. We pass it back out. So our goal is to become a conduit for location, both about the user, about places, about content. And then make sure that that is accessible right back out again in an open format. So it’s not a one-way channel.

What we want to do is we want to add value on location entities, if you could think of that as a very generic term. We want to add value as they move through the internet. And we see the only way that this can be done is by ensuring an open nonproprietary solution. So you mentioned Google. I mean if you look at their maps products, entities that are creating their maps product are not accessible off Google. If you look at Latitude, the user location that’s created within Latitude is not accessible off Latitude or indeed, off the Google network elsewhere. So we’re taking a fundamentally different approach because we think it’s not only the only way that LBS will actually attain the potential that people have talked about for 15 years, but fundamentally, we also think it’s the only way that you can move the industry forward. We really need to get the basic information out there and make it accessible and watch Yahoo and others innovate on top of this openly accessible information rather than trying to control how it’s distributed, how it’s captured and how it’s shared.

JT: Is there any particularly strong commitment being put out that it’s going to remain open and freely accessible?

TB: Well, the idea here is that we think it’s in our best interest to ensure that it’s open and accessible. So what I mean by that is that you — I mean say you contribute some information to Yahoo and then we push it back out again. The nice thing about location is that we aren’t necessarily going to be popping out sort of polygonal entities, right? They can be sort of copyrighted. These are place names. These are integers. And we’re sharing this information out. And we’re exposing the API.

So we boiled it down to really just essential elements that it’s going to be difficult for us, even if we wanted to bring back say Where On Earth ids and we wanted to bring that back into our environment, calling back a number of 32-bit integers and how they relate to each other is going to be extremely problematic to do so. So we don’t want to get involved in sort of cross licensing deals. We don’t want to get involved in massive geo patents that patent the obvious. It’s in our best interest, and we would also think the user’s best interest, to get the information out; make it ubiquitous. Make it accessible. Make it difficult to even get back into proprietary hands if we wanted to do that. And really see what happens to the market from there.

JT: What are the biggest kind of day-to-day headaches that you have just keeping this all organized and trying to provide the data to users and applications?

TB: Well, I guess the two big ones are things that just come with the sector. So the first would be geography is so incredibly subjective and non-consistent. One thing that GeoPlanet tries to do, and we do so very successfully, is to provide a uniform access layer across the globe. So if you develop a geographic solution that works in the US and it sort of looks at counties and cities and then goes out to surrounding cities that you can just take this same bit of code and apply it to Germany and to France and Britain, even though the geographies there are fundamentally different.

One of the difficult bits there is ensuring that the abstraction that we do across these highly subjective geographies is in fact consistent and behaves in a manner that the developers and the users expect. And I guess the other thing is it’s all about user location and the expectations of the user. I mean this is going to be undoubtedly a very hot topic of discussion at Where 2.0 because of Street View and Latitude. It’s something that Fire Eagle’s tried to address, and I think it’s done so very successfully, which is to put the control of the user’s location in the hands of the user and let them expose it as they feel comfortable doing so. And I think what you’re going to see is that, just like sharing credit card information online, just like getting, for example, a scanned photograph of you online, people are going to become increasingly comfortable with the idea of sharing their location, not least because, unlike a picture, your location is shared by many people around you. It’s not simply you. It doesn’t uniquely, permanently and temporally identify you. So location is powerful when it’s shared, and we want to ensure that users have control of that. I think that, gradually, people will become more and more relaxed about how they work with their location, how they share it and how they expose it. But the trick, of course, is letting them become comfortable with this in their own time. And that’s a bit of a dichotomy that we wrestle with. And really the answer is the way that Fire Eagle handles permissioning and security and privacy.

JT: That brings up an interesting question which is we’ve already seen with some georeference temporal information, namely toll records, that what you may not realize at the time was something incriminating you can come back to bite you later. To what extent is the potential for the government or other entities to come in and say for reasons of national security or whatever reasons, “We need access to all of this information about where people were and when”?

TB: Yeah. I mean answering those kinds of legal questions is always problematic, so I’ll keep my answer kind of brief, which is that Yahoo is not in a position to insert itself illegally between the government of a specific country and the user. Obviously we care greatly about our user rights and our user privacy. But there are significant issues surrounding a company’s ability to illegally protect its users. So I tend not to address that aspect of it.

What I do tend to say is that we try to sort of get around — not so much get around the issue, but coming back to the idea of user control, we want to ensure that users are — unlike your toll bridge information — that users are aware that your location’s being recorded because they’ve authorized access. But with Fire Eagle, for example, they can always come on to the system, and they can clear their location. So even before the government gets involved, before you get all of these heavy hitters, before you get national security involved, there’s always the option there that users can clear their location from the system.

JT: I’m going to ask one more politically sensitive question which is, the issue of what you call certain places can get to be an extreme hot button. There was a whole scandal about, I think, Google Maps and what they called a section in the Middle East a month or so ago. Cypress is another example that comes to mind.

TB: Yeah. I’m not even going to tell you about the problems we had when we accidentally called Constantinople Byzantium, just slipping back about 800 years there accidentally. That’s a very sensitive issue. Any company dealing with geography is going to have to address it somehow. So I’ll be very candid in how Yahoo addresses this. I mean first, our stated goal is to capture the world’s geography as it is used by the world’s people. We don’t see ourselves as the definitive authority on how a place should be called. So to give you an example, I mean if you look at the way that the Oxford English Dictionary works: when they’re criticized for capturing a new buzz word or an insensitive term, the OED very rightly says, “It is our goal to capture the English language as it is used at this time”, and it’s not our [Yahoo!’s] goal, unlike, say, Webster’s Dictionary, to impose how things are called.

And we take a very similar stance. And we take that stance not to sort of chicken out of the issue; we take that stance because we want to make information discoverable and accessible geographically. And we can only do that if we look at how places are called and we accommodate both common names. Now obviously, very often we’ll have to default to one or the other. We tend to follow the UN guidelines when it comes down to officially how something might be called in the first instance. But on the whole, we want to know all of the informal. We want to know the ethnic. We want to know the colloquial terms for all of the different areas around the world. And we are much less concerned with actually imposing formal geography than we are with consuming how it’s conceived today.

JT: To finish up, you are going to be speaking at Where 2.0. Can you talk a little bit about what people can expect to hear there?

TB: Yeah. I’ll have to be slightly candid because, well, I’ll answer the question and then we can talk further. I’m going to talk, in greater detail and perhaps even with greater coherence, about how Yahoo handles and conceives of Open Location. We want to provide tools that allow organizations, developers and other companies to innovate on location as a readily accessible ingredient to any successful business.

We don’t want to make the idea of the management and the capture of location the business in itself. So that’s what we mean by opening up. And we deal primarily with the locations of users. But also the locations as they are called around the world. And then I’ll be announcing a platform that we have coming online which isn’t openly accessible or openly available right now, which I would hope makes the world’s information that much more geographically identifiable and, of course, relevant to the users who employ Yahoo’s platforms.

JT: Well, thank you very much for speaking to us today. I’ve been talking to Tyler Bell, who is the product lead for the Yahoo Geo Technologies Group. He will be speaking at the Where 2.0 Conference in the middle of May. Thank you for taking the time to talk to us.

TB: Thank you, James.

  • eugene

    Great interview, thank you both. James, I guess you meant Cyprus when you wrote Cypress though.

  • Anonymous

    and istanbul when you said constantinople (or was that a geographer’s joke?)