Much of the information we have about how cities work (or don’t) comes through direct, intentional observation and study. But could we learn as much or more by mining the data that citizens generate in their day-to-day lives, through cell phone traffic and internet usage? That’s one of the questions that Andrea Vaccari, a research associate at the MIT SENSEable City Lab, is trying to answer. Andrea will be speaking about the research that the SENSEable City Project is doing at the O’Reilly Where 2.0 Conference in May.
James Turner: So why don’t you start a little bit by talking about what the charter of the SENSEable City Lab is?
Andrea Vaccari: Sure. The SENSEable City Lab is a new initiative of the Massachusetts Institute of Technology that focuses on studying how digital technologies are revolutionizing the way we live in cities, and therefore how we can leverage these technologies. By understanding how cities are using them, we can design better cities: cities that are more sustainable, more livable and, automatically, more efficient.
JT: A lot of the data that governments gather about cities (the example I think of is the little sensors they put across roads to count the traffic going over them) is almost just point-source data. Can you compare that to the kind of data that you’re able to extract from the records you can get access to?
AV: Sure. The problem with past data in all areas of urban planning and social studies is that the data is usually point-based: it refers to very specific points in space and also in time. That’s because the methods used to gather this information were very expensive. They required either deploying infrastructure or employing people to manually count cars, pedestrians and vehicles. Therefore, it was impossible to have a real-time flow of information. What we are trying to do is to leverage the pervasive systems that enhance our cities today. I’m referring to telecommunication networks, wireless networks, transportation systems or any other sort of digital system that interacts with citizens on a daily, real-time basis. With these systems, interactions between the user and the system create logs of their activity. These logs can be used to understand urban dynamics: how people move and live in cities and how cities themselves evolve over time.
JT: Now, you showed me some of the examples of the datasets that you’ve been playing with, and it seems like largely it’s cell phone data and wifi data and then secondarily, things that are more voluntary like Flickr uploads.
JT: With wifi data you can pretty much pin a user down to a hotspot. And as Google has demonstrated with cell phone data, you can get fairly good positioning. But what kind of resolution do you get out of, say, cell phone data?
AV: Sure. The cell phone data we get is aggregated at the antenna level. We don’t get information about individuals, because we strongly respect privacy. What we basically know is how many calls, how many text messages and how much traffic each antenna in a city serves. Of course, we know the position of each antenna and we can estimate its coverage. So we can understand fairly well the dynamics going on in each antenna’s coverage area. But, again, we don’t get information about individuals.
JT: Right. And in cities that works out fairly well, because cells tend to be fairly small there. It would be a lot more problematic in the countryside, for example.
AV: Exactly. Exactly. That’s also why, at the city level, Google works well at giving you a location based on your cell phone connections. So we can understand pretty well what’s going on in a block or in a district.
JT: Now, some of the data you showed me did show flows. And a flow would tend to imply that you knew the same cell phone had gone from one place to another. How do you get that data?
AV: Sure. It’s basically the same approach. We get aggregated information on the number of handovers served by each antenna. A handover is the procedure enacted when a person is on an active call and moving, so the call needs to be transferred from one antenna to another to keep it from dropping. It’s a very complex procedure, and these handovers are logged. So what we get, at the antenna level, is how many handovers each antenna passed to each of its neighboring antennas. So, again, it’s aggregated information.
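As a rough illustration of what antenna-level aggregation means in practice, the sketch below models one aggregate record per antenna and estimates each antenna’s coverage by assigning grid points to the nearest antenna, which approximates a Voronoi tessellation. This is a minimal sketch under stated assumptions: the record fields, the planar coordinates in metres and the nearest-antenna coverage model are illustrative choices, not the Lab’s documented method; real coverage also depends on terrain, antenna power and load.

```python
import math
from dataclasses import dataclass


@dataclass
class AntennaLoad:
    """Hypothetical aggregate record: totals served by one antenna in one time slot."""
    antenna_id: str
    x: float  # antenna position on a local planar grid, in metres (assumption)
    y: float
    calls: int
    sms: int
    traffic_mb: float


def serving_antenna(px, py, antennas):
    """Assign a point to its nearest antenna (a Voronoi-cell approximation of coverage)."""
    return min(antennas, key=lambda a: math.hypot(a.x - px, a.y - py))


def coverage_areas(antennas, xmin, xmax, ymin, ymax, step):
    """Estimate each antenna's coverage area in square metres by sampling grid-cell centres."""
    counts = {a.antenna_id: 0 for a in antennas}
    nx = round((xmax - xmin) / step)
    ny = round((ymax - ymin) / step)
    for i in range(nx):
        for j in range(ny):
            px = xmin + (i + 0.5) * step
            py = ymin + (j + 0.5) * step
            counts[serving_antenna(px, py, antennas).antenna_id] += 1
    return {aid: n * step * step for aid, n in counts.items()}


def calls_per_km2(antennas, areas_m2):
    """Call density per antenna: aggregate calls divided by estimated coverage area."""
    return {
        a.antenna_id: a.calls / (areas_m2[a.antenna_id] / 1e6)
        for a in antennas
        if areas_m2[a.antenna_id] > 0
    }


antennas = [
    AntennaLoad("A", 0, 0, calls=120, sms=40, traffic_mb=3.5),
    AntennaLoad("B", 1000, 0, calls=300, sms=90, traffic_mb=8.0),
]
areas = coverage_areas(antennas, -500, 1500, -500, 500, step=100)
print(areas)
print(calls_per_km2(antennas, areas))
```

Normalizing by estimated area is what lets counts from small urban cells and larger suburban cells be compared on one map.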
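The handover counts described here can be thought of as an origin-destination matrix between neighboring antennas. The hypothetical sketch below (the function names and the input format are assumptions, not the Lab’s pipeline) nets out opposing handovers to find the dominant direction of movement between each pair of antennas, and computes each antenna’s inflow minus outflow. No individual trajectory is reconstructed; only counts are used.

```python
from collections import defaultdict


def net_flows(handovers):
    """Net handovers per antenna pair, keyed in the dominant direction.

    handovers maps (from_antenna, to_antenna) -> count in a time window
    (hypothetical format). Pairs whose two directions cancel out are omitted.
    """
    net = {}
    seen = set()
    for (a, b), n in handovers.items():
        if (a, b) in seen or (b, a) in seen:
            continue  # each unordered pair is handled once
        seen.add((a, b))
        diff = n - handovers.get((b, a), 0)
        if diff > 0:
            net[(a, b)] = diff
        elif diff < 0:
            net[(b, a)] = -diff
    return net


def antenna_balance(handovers):
    """Handovers received minus handovers passed on, per antenna.

    A positive value suggests a net movement of callers toward that antenna;
    it only reflects people who were on an active call while moving.
    """
    balance = defaultdict(int)
    for (a, b), n in handovers.items():
        balance[a] -= n
        balance[b] += n
    return dict(balance)


handovers = {("A", "B"): 50, ("B", "A"): 20, ("B", "C"): 10, ("C", "B"): 10, ("C", "A"): 5}
print(net_flows(handovers))
print(antenna_balance(handovers))
```

This is why, as discussed next, flows inferred this way describe major traffic directions rather than where any particular person went.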
JT: Right. And you kind of have to infer out of that because once you get to the next antenna, you might be going in five different directions.
AV: Sure. So —
JT: You’d have to do some guesswork, I would think.
AV: Exactly. So you cannot really track individuals, but you can still get a fairly good picture of the major traffic flows.
JT: Why don’t you talk a little bit about some of the specific projects that you’ve done with this? You showed me several projects you did in Rome.
AV: Sure. So let me first say that we approach the study in two different ways. On one side, we use demos and urban interventions to give a hint of what we can expect from our cities in the future: what kind of buildings, what kind of interfaces, and what kind of real-time information will be available to help us make better use of our cities.
JT: So that’s kind of like a world of tomorrow type of thing?
AV: Exactly. It’s based on data visualizations or on prototype interfaces that work, even though they are not meant to be deployed immediately on a large scale. They are meant to inspire the citizens themselves, but also possible stakeholders in this revolution, from telecom operators to makers of outdoor electronics to the cities themselves. On the other side, we do actual data analysis: the nitty-gritty work that needs to be done to actually make this possible. We analyze the data and infer useful information for both researchers and practitioners like urban planners and architects. In the case of Rome, we created a first set of visualizations in 2006 for the Venice Biennale, which presented a map of cell phone activity in Rome during special events, for example, during the final game of the soccer World Cup.
From these maps, you can really see how the city is behaving, how it is pulsating: where people are gathered and what sort of emotional state they are in, whether they are excited and making a lot of calls or not. In a second iteration, we created an interface for the 2007 Notte Bianca, a night when all the shops and public spaces are open, the city organizes public events, and people stay out until 8:00 in the morning. The interface showed real-time cell phone activity, so you could see where the social dynamics were happening, along with bus and taxi positions, news uploaded by journalists from La Repubblica (one of the major Italian newspapers), and information on the events themselves.
JT: In a future world where this is more pervasive and available rather than being a one-shot, how would you see urban planners and governments using this data?
AV: Well, for urban planners, there is a big, big revolution going on. Today, policies and plans are designed on the basis of assumptions, and their effects and impacts can be evaluated only long after they are implemented because, as we said before, gathering this information is expensive, costly and cumbersome. So it’s really impossible to get this information in real time. What is going to happen is that instead of planning the city, urban planners will actually have to program the city, to configure [it] in real time, because information will flow in real time. If you change the direction of a one-way road, you will see almost immediately what the effect on traffic is. If you close an area to cars, you can see immediately what happens to mobility in general. And if you create public spaces in one place rather than another, you will see immediately how people react.
JT: Right. So one crude example: in Boston, they shift the lanes on highways in the morning and the evening to give more capacity to traffic in one direction. But that’s done on a very coarse, time-based schedule. You’re talking about being able to say, responsively, “Move a lane over because…”
AV: Exactly. This can happen in real time. You can have street signage reacting in real time to traffic conditions so that if some areas are jammed, you can redirect traffic onto another lane. This will also be supported by new GPS devices that can inform you. This is already happening today, but it needs to be done at a more integrated level so that everyone is aware of what other people are doing. And this can also be very helpful in emergencies, from car accidents to fires to the evacuation of specific areas, helping first responders reach the scene in the shortest time possible.
JT: One of the questions I’ve always had about GPS rerouting is this: now that more and more people have GPS in their cars, and it tries to route them around traffic jams, isn’t there almost a Heisenbergian effect, where just knowing that there’s a traffic jam and routing around it simply moves the traffic?
AV: That’s exactly why I was saying that we need a more integrated and somewhat smarter way of managing this information flow. Right now, it’s basically the individual who accesses this information, but the information is raw. The individual doesn’t want to know where the traffic jam is; they want to know where to go in order to avoid it.
And that’s what we are trying to build: systems where information flows in real time, and where citizens, businesses and public authorities can access the information they need when they need it.
JT: One of the questions I have: a lot of the work you do here is very concrete, with tangible, quantifiable, viewable scientific results, and then there’s this other side where you do almost art installations, frankly the kind of thing the Media Lab is well known for. It’s almost fashion-show stuff. I understand that’s an attempt to make this data accessible to the public, but in some ways does it also trivialize it?
AV: That’s a very good point, also because I recently saw some comments about our work asking what the function of these visualizations is. And I have to say they are very useful, and extremely important in two different ways. On one side, yes, they help inform citizens, to educate the public in a sense, so that people understand this kind of information and realize that their actions build up into an overarching dynamic system, the city, which is really built of individual choices. These individual choices emerge as one unique entity, which again is the city. Just as we try to explain how financial markets work by showing graphs or charts at the end of the TV news or in newspapers, I think we will have to do the same to inform the public about these issues and let them understand what they mean. On the other side, these visualizations have been extremely helpful and, I have to say, successful in getting the stakeholders in this revolution I mentioned before, including telecom operators and municipalities, interested in this analysis and in understanding its potential. By seeing this data visualized, decision-makers can really grasp it. And these visualizations helped us collect some of the data that we then used for our quantitative analysis.
JT: You seem to get a lot of cooperation out of people like cell phone carriers, [but] they’re pretty notorious for keeping their data very close to the vest. How are you able to get that kind of cooperation out of them?
AV: Well, I have to say most telecom operators really want to open up to their customers in a very friendly way, not with the intention of using this information to bill their customers more but really to understand their needs. The problem is that this industry has been known for taking advantage of this information in the past, also through agreements with governments. But really, this is not the case any more. A lot of telecom operators are willing and very interested in this, but they still have to face these kinds of issues.
For example, our collaboration with AT&T is creating a lot of value for the public: a lot of successful work that has been published in papers as well as exhibited, for example, at MoMA. What helps us is MIT’s authority: because it’s an institution, it’s easier to understand that we are really only doing scientific work and not taking advantage of this data. Let me also add that the data provided by all of our partners is provided according to their privacy policies. There is also a 2002 European Parliament directive on privacy, and we respect that as well.
JT: Even though the data is anonymized, as Yahoo demonstrated at one point, even anonymized data can be mined for what turns out to be some fairly sensitive information. Let’s take this beyond the, as it were, innocent ways you’re using it now: if I were a totalitarian government and I could suddenly see where people were gathering, that could be a very useful piece of data. If I hadn’t authorized 200 people to be there, it tells me something. Where is the balance going to lie here?
AV: Well, first note that the data is not really anonymized; it’s aggregated.
So it’s only at a level where the individual doesn’t even play a role in the dataset. As for the uses, well, of course, all technologies can be used for good or for bad. I really cannot answer this question by saying we can implement mechanisms that will allow you to only do good, or otherwise not do anything at all. The truth is that we see a lot of benefits, and they are much easier to reach using our technology than the bad things you can do with it. And you can do bad things with all technologies. So while we are aware of these issues and have been discussing them with people from the Electronic Frontier Foundation as well, we believe that the benefits of this technology can’t be ignored, and we should pursue it anyway.
JT: Going back to more fun stuff, you’ve also done some interesting work with Flickr data. Could you talk a little bit about that work?
AV: Sure. Well, basically, our major datasets are cell phone activity on one side and Flickr pictures on the other. They are not the only ones, of course, but they are the two best examples of what we call digital shadows and digital footprints. Digital shadows are all the data gathered through the user’s unconscious interactions with pervasive systems. Digital footprints, on the other hand, are explicitly released information about the behavior of citizens in the city. Flickr pictures are publicly available online.
A good deal of them are also geotagged, so you can download them and see where they were taken. Since you also know the user who took them and, for example, his or her nationality, you can really see how people flow from the level of the nation to the level of a city, back down to a specific area within the city. You can see where most of the pictures are taken, what the hotspots are, and what temporal signature a specific place has, for example whether tourists visit more in the morning or in the afternoon.
And more interestingly, you can see flows. You can see where people go to take a picture first and where they go next. And all of these places are interconnected with each other.
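The photo-based flows and temporal signatures described above can be sketched as follows, assuming each geotagged photo has already been reduced to a (user, place, timestamp) record and snapped to a named hotspot. The record layout, the place labels and the time-of-day buckets are hypothetical; the Lab’s actual processing is not described in the interview.

```python
from collections import Counter, defaultdict
from datetime import datetime


def photo_flows(photos):
    """Count moves between hotspots from each user's time-ordered photos.

    photos: iterable of (user_id, place, taken_at) tuples, where 'place' is a
    hotspot label already assigned to the photo's geotag (assumption).
    """
    by_user = defaultdict(list)
    for user, place, taken_at in photos:
        by_user[user].append((taken_at, place))
    flows = Counter()
    for visits in by_user.values():
        visits.sort()  # chronological order per user
        for (_, here), (_, there) in zip(visits, visits[1:]):
            if here != there:  # ignore consecutive photos at the same place
                flows[(here, there)] += 1
    return flows


def time_of_day(hour):
    """Coarse, hypothetical buckets for a place's temporal signature."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    return "evening/night"


def temporal_signature(photos):
    """Number of photos per place, bucketed by time of day."""
    signature = defaultdict(Counter)
    for _, place, taken_at in photos:
        signature[place][time_of_day(taken_at.hour)] += 1
    return signature


photos = [
    ("u1", "Colosseum", datetime(2008, 5, 1, 9, 0)),
    ("u1", "Forum", datetime(2008, 5, 1, 10, 30)),
    ("u1", "Pantheon", datetime(2008, 5, 1, 15, 0)),
    ("u2", "Colosseum", datetime(2008, 5, 2, 14, 0)),
    ("u2", "Forum", datetime(2008, 5, 2, 14, 45)),
    ("u2", "Forum", datetime(2008, 5, 2, 15, 10)),
]
print(photo_flows(photos).most_common())
print(dict(temporal_signature(photos)["Forum"]))
```

Because the user ID is only used to order photos, the output is a set of place-to-place counts rather than anyone’s itinerary.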
JT: I’ve been talking to some other folks who are looking at things like geotagging social networking: taking tweets or other data sources like that which are starting to become geo-enabled. It strikes me that those would be another great data mining source. You could look for keywords and things like that in geotagged tweets. And if you see a lot of people using ‘party’ in their tweets, you know, “Hmm. There’s probably a party going on over there.” Is that something you guys are looking at at all?
AV: Sure. We are actually starting to get into Twitter. As for social networks more generally, bear in mind that they are social, but they are virtual social networks, so the relationships in them are a lot weaker than those in real life. What we try to do is analyze real social dynamics that happen in urban space, in real life. I’m not saying that those networks have no importance or no meaning; I’m just saying that the information you can get from sources like cell phone activity and Flickr pictures belongs to the real world. It’s actually a way to connect the digital to the physical. But still, take Twitter, for example, which is kind of in between because it tells you about people’s real activities…
JT: Right. Most people are tweeting about what they’re doing now.
AV: Exactly. So in that case, yes, it’s another kind of, I would call it, digital footprint, like Flickr pictures, and we are getting into it.
JT: Right. As opposed to, for instance, Facebook where it’s —
AV: Exactly. It would be a lot more difficult to use information from Facebook, even if it were available, to actually do some analysis.
JT: You’re going to be speaking at the Where 2.0 Conference coming up in May. Can you give us a little bit of a feel for what you’re going to be talking about?
AV: Sure. I really want to focus on the latest projects, and there are two that I’m really interested in presenting. One focuses, again, on the analysis of cell phone activity, but I will describe what we do in more detail and show some of the graphs: the visual analytics we can produce and the kind of information we can provide to public authorities as well as to enterprises and citizens.
On the other side, we have started working on a new project called MIT Enernet, which is intended to create a new platform for building efficiency. The objective is to use a new kind of data, basically human occupancy estimated from wifi connections, to understand not whether the building automation system is functioning correctly, but whether it’s actually configured correctly: whether it’s responding to the real needs of the people who inhabit the space. So these are the two projects I will focus on, and I will really try to analyze the latest results we are getting in these areas.
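To make the Enernet idea concrete, here is a minimal sketch assuming that the number of distinct devices associated with a wifi access point in a given hour is a usable occupancy proxy, and that the building automation schedule is known per zone. The data formats, the access-point-to-zone mapping and the mismatch rule are hypothetical illustrations, not the project’s actual design.

```python
from collections import defaultdict


def occupancy_by_hour(associations):
    """Distinct devices per (access_point, hour), used as an occupancy proxy.

    associations: iterable of (device_id, access_point, hour) tuples (hypothetical format).
    """
    devices = defaultdict(set)
    for device, ap, hour in associations:
        devices[(ap, hour)].add(device)
    return {key: len(seen) for key, seen in devices.items()}


def schedule_mismatches(occupancy, hvac_on, zone_of_ap, threshold=1):
    """Hours where the automation schedule and observed occupancy disagree.

    hvac_on: dict mapping zone -> set of hours the schedule conditions that zone.
    Returns (zone, hour, kind) tuples sorted by zone, then hour within each kind.
    """
    occupied = defaultdict(set)
    for (ap, hour), count in occupancy.items():
        if count >= threshold:
            occupied[zone_of_ap[ap]].add(hour)
    mismatches = []
    for zone in sorted(set(hvac_on) | set(occupied)):
        on = hvac_on.get(zone, set())
        occ = occupied.get(zone, set())
        for hour in sorted(on - occ):
            mismatches.append((zone, hour, "conditioned_but_empty"))
        for hour in sorted(occ - on):
            mismatches.append((zone, hour, "occupied_but_off"))
    return mismatches


associations = [("d1", "ap1", 9), ("d2", "ap1", 9), ("d1", "ap1", 10), ("d3", "ap2", 19)]
occupancy = occupancy_by_hour(associations)
zone_of_ap = {"ap1": "east", "ap2": "west"}
hvac_on = {"east": {8, 9, 10}, "west": {8, 9}}
for row in schedule_mismatches(occupancy, hvac_on, zone_of_ap):
    print(row)
```

The point of the comparison is the one made in the interview: a system can be running exactly as programmed and still be configured for a schedule that does not match how people actually use the space.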
JT: Thank you. We’ve been speaking to Andrea Vaccari who is a research associate at the SENSEable City Lab at MIT. He’ll be speaking at the Where 2.0 Conference in May. Thank you so much for taking the time.
AV: Thank you very much, James. I look forward to seeing you there.