When it Comes to Tweets, the Key is Location, Location, Location!

Raffi Krikorian works to make geotagging tweets fast and efficient

When you only have 140 characters to get your message across, you have to depend a lot on context. For Twitter, a big part of that context has become location. Knowing where someone is tweeting from can add a lot of value to the experience, and it’s Raffi Krikorian’s job to integrate location into Twitter. Raffi will be talking about this and other location-related topics at the upcoming Where 2.0 conference. We began by asking him how Twitter determines location, and whether it will always be an opt-in option.

Raffi Krikorian: I think part of it is based around the philosophy of Twitter itself. We only publish information that you’ve explicitly given to us on a tweet-by-tweet basis. So for location on your tweets, it’s all opt-in. You have to give us that location information, and we’ll put it out. There are other things we do behind the scenes, like our local trends information, that doesn’t actually tie to an individual person. We might do some IP look-ups. We look at your user location field. But for anything that’s tied to an individual, it’s all opt-in.

James Turner: 140 characters is a restriction that Twitter’s famous for. Location is fairly high bandwidth information. Have you considered carrying location data out of band from the 140 characters?

Raffi Krikorian: We do that right now. Originally, when people used to tweet location, they put a URL in their text field which linked to a map or linked to a service which might show where they are. But ever since we launched our geotagging API in November, we store the latitude and longitude for your tweet out of band. It’s completely metadata on top of the tweet. A bunch of clients implement it, such as Tweety and Seesmic Web, they can read that metadata, and will show you either a map or attempt a reverse geocode and give you an actual name.
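To make the out-of-band idea concrete, here is a minimal sketch of a tweet record whose location travels as separate metadata instead of inside the 140 characters. The `Tweet` class, its field names, and `describe_location` are hypothetical illustrations, not Twitter's actual data model or API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_TEXT_LENGTH = 140  # the character limit discussed in the interview


@dataclass(frozen=True)
class Tweet:
    """Hypothetical tweet record: text plus optional location metadata."""

    text: str
    # (latitude, longitude) in decimal degrees; None when the user has not opted in.
    coordinates: Optional[Tuple[float, float]] = None

    def __post_init__(self) -> None:
        # Only the text counts against the limit; the coordinates do not.
        if len(self.text) > MAX_TEXT_LENGTH:
            raise ValueError("text exceeds 140 characters")
        if self.coordinates is not None:
            lat, lon = self.coordinates
            if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
                raise ValueError("coordinates out of range")


def describe_location(tweet: Tweet) -> str:
    """Return a display string, as a client might before drawing a map."""
    if tweet.coordinates is None:
        return "no location shared"
    lat, lon = tweet.coordinates
    return f"{lat:.4f}, {lon:.4f}"
```

Because the coordinates sit outside the text, a full 140-character message can still carry a location, and a client that ignores the metadata loses nothing.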

James Turner: What value do you see location bringing to social networking? Usually, if someone is talking about a location, it’s explicit in the message or in the blog, “I’m at so-and-so and the show is really nice tonight.” If you imagine that people are pervasively providing their geolocation, how does that aid social networking?

Raffi Krikorian: I think that one, it helps people like us at Twitter to be able to give more relevant context information to other people. Especially in our 140 character constrained lifestyle, you can’t necessarily put fully structured information of where you might be or what you might be talking about. But since we’re now trying to expand the dimensionality of our platform to include place, we can now store that structured data, and, therefore, we can analyze it better. We can deliver it to the right people better, and we can do more interesting high-level analytics. Therefore, we can deliver relevant search or relevant information to people who are wanting it.

I think one of the dreams would be, not necessarily for Twitter but for someone out there, to be able to look at status update streams with geotagging on top of it and try to figure out what are the hot bars out there tonight, or be able to see cross-referencing with my foursquare check-in, for example. I want to be able to ask the service, “What bar should I go to right now that my friends have liked that I think I’ll probably like and have no line?” And you’d only do something like that kind of high-level query if you actually have some really good way, either to analyze data or to get structured data out of the system. Analysis of that is going to be hard, especially in a world where you only have 140 characters to express yourself. By providing these metadata, or meta ways to include structured information, it becomes a UI problem to get that information into the system, and it should become a lot easier for other people to build applications on top of it. So I think that’s where geo-type stuff would go for networking, with better recommendations or better information delivery, better stuff within social networks.

James Turner: It sounds, in some ways, like you could mine the data the same way that, for example, GPS data in people who have phones in cars or cell phone data can be used to infer traffic patterns.

Raffi Krikorian: That’s actually an excellent way to think about it. It’s the same way that you can watch how people are moving on freeways and try to figure out what’s going on. But that’s a very passive interaction, because people are looking at something and data’s being sent out of band, whereas with something like Twitter, you’re trying to express emotion or you’re trying to express sentiment. And that sentiment with latitude and longitude attached to it inherently talks about not just a feeling, but a feeling associated with a place. If we could start tying those places not just to latitude and longitude, but to the contextual, then you could start really understanding what’s going on in the world and be able to deliver it to a lot of people.

It’s the foursquare world, right? It’s really important for me to find out that my friends really like that bar down the street. It’s also really important to me to know that my friends are down the street right now and if I’m not busy, I can go there. So I think that’s the direction we’re trying to take things.

James Turner: One of the things we saw, especially in Iran this year, was Twitter as breaking news source simply because it is something that along with taking photos with your cell phone, it’s just a very fast way of saying, “This thing is happening right now.” When you add geolocation in on top of that, do you see the ability to kind of infer news just based on what people are tweeting and where they’re tweeting it?

Raffi Krikorian: I think the question of what defines news is always going to be up for debate, but most certainly. We saw this a bunch of times. Right now, there’s a Twitter account that watches a USGS website and tweets with geotagging exactly where an earthquake happens within a few seconds of it hitting the USGS website. We’ve seen examples of people uploading photos via Flickr of traffic accidents on the web. And Flickr has implemented our geotagging API, so if you upload a photo to them with geotagging, they’ll pass it through to Twitter. And then on Twitter’s side, we allow you to ask either for tweets within a certain location or connect to our geohose and get a stream of tweets subscribed to location. So I could see all of the events that are occurring that are geotagged within the San Francisco area, for example, in real-time or in the Bay area. And then in the Bay area, I’m going to start to see the breaking news events, like I mentioned before, traffic accidents. I’ll see people just talking about random stuff. And I’ll see the earthquakes pop up here and there as the USGS stuff comes out.
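The two query styles mentioned here, tweets near a point and tweets inside a region, can be sketched with plain geometry. This is an illustration only: the function names are hypothetical, the code filters an in-memory list and does not call any Twitter endpoint, and the bounding-box check assumes the box does not cross the 180° meridian.

```python
import math
from typing import Iterable, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_radius(points: Iterable[Point], center: Point,
                  radius_km: float) -> List[Point]:
    """Keep the points that lie within radius_km of center."""
    return [p for p in points if haversine_km(p, center) <= radius_km]


def in_bounding_box(p: Point, south_west: Point, north_east: Point) -> bool:
    """True if p lies inside the box given by its south-west and north-east corners."""
    return (south_west[0] <= p[0] <= north_east[0]
            and south_west[1] <= p[1] <= north_east[1])
```

A radius search suits "what is happening near me", while a bounding box is the more natural shape for a long-lived subscription to a region such as the Bay Area.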

James Turner: Does that alleviate some of the need for some of the function-based or event-based hashtagging, where if I’m in the San Jose Convention Center, then I’m probably attending the event that’s there, so I really don’t need to tag it?

Raffi Krikorian: I think yes and no. I think that the hashtag system allows for really good context, so people can at a later date understand what went on at the time without necessarily having to figure out how to cross-reference all of the geo-tweets and then cluster them with other geo-tweets in the area trying to infer high-level stuff. I think tagging, explicit tagging, still provides a nice human-readable way to understand that information. And then the geotagging provides a really good machine-readable way to dissect and also provide that out of band type information. I think it’s two different use cases. I, for one, still apply hashtags whenever I talk about stuff to imply other things or imply a different type of context than just the location alone might provide.

James Turner: We’ve seen a couple of interesting uses of social media that are apart from their obvious use. Someone was able to prove they weren’t able to commit a crime because they were updating their Facebook page when the crime was committed. Similarly, we just saw a news item on a criminal who was caught because he was updating his Facebook page and the cops were able to figure out where he was. Certainly when you’ve got something like tweets that are geolocated, you can see that type of thing. Do you think that this is going to become more of a privacy issue? Do you think this kind of openness right now is a fad? Or are we seeing a real paradigm change about what people feel comfortable letting other people know about them?

Raffi Krikorian: Well, I think you have two points there, and I just want to hit the first one first, which is being able to find out where people are; being able to know that someone updated a Facebook item from a certain location. What we’re doing at Twitter is not necessarily authenticated location; a lot of it’s still implied. There’s a lot of trust on either the application that updated Twitter or the person that’s updating Twitter. We provide no guarantees that the location that’s being reported by us is factual, except by the fact that someone posted that into a system. So you could totally lie about what you’re doing on Twitter: I could tweet right now and say I’m sleeping, but I’m on the phone with you. In the same way, I could send a tweet right now that says I’m in New York when in reality, I’m in my home right now in Oakland, California. A statement like that, of just revealing where my home is, sort of touches upon your second point. I think privacy in this type of world is a very tricky thing that needs to be maneuvered very carefully. Something like foursquare has privacy down because you need to have a bidirectional relationship with other people in foursquare to get a notification of where I am. I need to request to be friends with you, and you need to approve that request in order for me to get notification of where you are. I think that has privacy, or at least it has a privacy model which implies certain levels of control. For Twitter, since we have our asymmetric following method, I can follow you and you don’t necessarily have to approve me. The onus is on us to make sure that the user’s privacy is under control. It’s definitely something that services like us need to take into account. Whether it means that we’re fuzzing the data, or whether it means we’re going to be storing data with a different level of precision than you’re giving it to us, those are all questions up for debate.

If we don’t do that, then location-based services won’t take off at all, actually. A lot of people will be really concerned about their privacy, no matter how much of a fad there is, or how much uptake there is in the alpha nerd population. But I think if you can provide good methods of privacy control, that can be explained to everyone, so everyone understands what they’re doing in a very user-friendly way, then I think there’ll be a huge upsurge, because of the value of the data that can come back to people.
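One simple reading of "storing data with a different level of precision" is to round coordinates before they are stored or published. The sketch below shows only that idea; the `coarsen` function is hypothetical, and nothing in the interview says this is the approach Twitter chose.

```python
from typing import Tuple


def coarsen(lat: float, lon: float, decimals: int = 2) -> Tuple[float, float]:
    """Round a coordinate pair to fewer decimal places.

    Two decimal places of latitude correspond to roughly 1.1 km, so the
    stored point identifies a neighbourhood instead of a building.
    """
    return (round(lat, decimals), round(lon, decimals))
```

For example, `coarsen(37.80437, -122.27115)` keeps the point in Oakland but no longer pins it to a specific address. Rounding snaps every point to a fixed grid; it does not add random noise, which would be a different design with different trade-offs.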

James Turner: We’re seeing news that the FBI wants ISPs to retain two years of email and two years of surfing records. I don’t know how much you could talk about it because of the wonderful government restrictions on this stuff, but is that the kind of thing you guys worry about at all, that you’re going to become a source for that kind of intelligence?

Raffi Krikorian: I’m not sure how much I can talk about it. What I will say is that we default to only displaying whatever data you gave us. So we don’t hide any data that you give us, unless you’re a protected user. Whatever data you give us, we publish back out again. So I guess the answer is yes and no on that point.

James Turner: What do you see as the technical side of geolocation, in terms of what’s going to be the new interesting technologies coming along, and how they’re going to be used?

Raffi Krikorian: From Twitter’s standpoint, it’s how do you accept all of this real-time data, index and analyze it and spread it throughout our system in almost real-time. People have traditionally built a bunch of GIS-like systems on top of PostgreSQL or on top of MySQL, and that’s fine, but it doesn’t scale after a while. After you throw a couple million or a couple hundred million entries at it, look at the amount of time it takes for one of those databases to process that, to insert it. All you have to do is select against it, and you can understand it’s untenable for real-time operation. And by real-time, I mean sub-second operation.

So the stuff that we’re doing is more geared towards how can you accept tweets that are coming in at what you can imagine to be an incredibly fast rate. Tweets are coming in, figure out their location, attach appropriate metadata to it. Store it in our database. Span it out to anyone who wants to look at it. Run research and analytics on it and index it in their search index, and do this all within a couple of seconds on the way through the system. I think there’s a lot of interesting stuff being done out there on how things are being stored, how things are being indexed. But I think our personal contribution will be how do you do it at that kind of speed?
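As a toy illustration of why location indexing can be made cheap per tweet, the sketch below buckets points into fixed-size grid cells held in memory, so an insert is a dictionary append and a lookup touches at most nine cells. The `GridIndex` class and its cell size are assumptions made for this example; the interview does not describe how Twitter's index is actually built, and the sketch ignores persistence, distribution, and the 180° meridian.

```python
import math
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Cell = Tuple[int, int]


class GridIndex:
    """Hypothetical in-memory index that buckets tweet IDs by grid cell."""

    def __init__(self, cell_size_deg: float = 0.5) -> None:
        self.cell_size_deg = cell_size_deg
        self._cells: DefaultDict[Cell, List[str]] = defaultdict(list)

    def _cell(self, lat: float, lon: float) -> Cell:
        # Integer row/column of the cell that contains the point.
        return (math.floor(lat / self.cell_size_deg),
                math.floor(lon / self.cell_size_deg))

    def insert(self, tweet_id: str, lat: float, lon: float) -> None:
        """Constant-time insert: append the ID to its cell's bucket."""
        self._cells[self._cell(lat, lon)].append(tweet_id)

    def nearby(self, lat: float, lon: float) -> List[str]:
        """Return IDs in the point's cell and its eight neighbours."""
        row, col = self._cell(lat, lon)
        found: List[str] = []
        for d_row in (-1, 0, 1):
            for d_col in (-1, 0, 1):
                # .get avoids creating empty buckets for cells never written to.
                found.extend(self._cells.get((row + d_row, col + d_col), []))
        return found
```

The cost of a query depends on how many tweets fall in the nearby cells, not on the total number stored, which is the property a table scan over millions of rows lacks.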
