Trying to Track Swine Flu Across Cities in Realtime

John Geraci is a guest blogger and heads up the DIY City movement. He will be speaking about DIY City at Where 2.0 in San Jose on 5/20.

Since early last Friday, when I got a tip about swine flu in Mexico City from a health researcher, the team that does SickCity has been working to make the system something that can (or could) detect swine flu outbreaks in cities around the world.

It hasn’t been easy.

SickCity is “realtime disease detection for your city”, created by people at DIYcity. The service, launched last month, works by monitoring Twitter for local mentions of various terms that mean “I’m getting sick” and mapping them by location. Up until Friday, SickCity seemed to work reasonably well for the very rough beta tool that it is. It showed incidences of people reporting they had flu, or chicken pox, or other illnesses, broken down by city. You could look at a graph of the past 30 days for your city and see days when mentions of certain diseases and symptoms were higher or lower. You could sometimes see trends. No one claimed that SickCity was ready for prime time, but those working on it felt that there was a very worthwhile idea in it that, with a bit of refinement, would be of huge value to communities.

On Friday, all of that got turned upside down.

Going to SickCity’s Mexico City page early in the day, I saw a sudden, several-hundred percent increase in mentions of flu. The problem was, not a single one of them was about actually having the flu – all were about the gigantic swine flu media event that was just beginning. Our disease detection tool had turned into a media event detection tool overnight.

Since then, we’ve been in a constant struggle to filter out the media effect from the data. The problem is, as the story grows and changes, the terms we have to filter for keep growing and changing. On Saturday we made a series of changes to the filters and search terms, and thought we were fine. By Sunday, those had become totally insufficient in the face of the growing Twitter storm surrounding swine flu (70 more results in the time it took me to write that sentence). We made more changes Sunday. Today, those additional filters seem puny and insufficient. People are now calling swine flu “piggy flu”, “pork flu”, “bacon flu”, “wine flu”. They’re talking about Obama having flu. They’re talking about bird flu. The list of tweeting topics grows at an exponential rate. The topic of swine flu is incredibly viral.

So how do you get down below this huge cloud of noise, to the relatively tiny (but very important) signal down beneath? There are probably several thousand tweets happening right now about the idea of flu for every one that is about actually having the flu. The number of people actually coming down with flu right now in fact seems very low (let’s hope it stays that way).

Tracking other terms related to flu seems more promising – the term “fever” seems like a good one to look for, and once you get rid of the tweets mentioning spring fever, cabin fever and Doctor Johnny Fever, you’ve got a pretty good data set to use. But how representative of the flu population is that term?
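To make the “fever” idea concrete, here is a minimal sketch in Python of that kind of filter. It is not SickCity’s actual code: the function names and the exclusion list are assumptions for illustration, seeded only with the three phrases mentioned above, and a real list would have to keep growing as new figurative uses turn up.

```python
import re

# Figurative uses of "fever" named in the post. This list is illustrative,
# not SickCity's real filter set.
EXCLUDED_PHRASES = ["spring fever", "cabin fever", "johnny fever"]

FEVER_RE = re.compile(r"\bfever\b", re.IGNORECASE)
EXCLUDED_RE = re.compile(
    "|".join(re.escape(phrase) for phrase in EXCLUDED_PHRASES),
    re.IGNORECASE,
)


def is_fever_report(text: str) -> bool:
    """Return True if 'fever' appears somewhere outside the excluded phrases."""
    # Blank out the figurative phrases first, then look for any "fever" left over.
    remaining = EXCLUDED_RE.sub(" ", text)
    return FEVER_RE.search(remaining) is not None


def filter_fever_tweets(tweets):
    """Keep only the tweets that pass is_fever_report."""
    return [tweet for tweet in tweets if is_fever_report(tweet)]
```

Blanking out the excluded phrases before searching means a tweet like “cabin fever, and now an actual fever” still counts, because one literal mention survives the filter.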

Maybe tracking actual flu tweets in this situation isn’t really possible?

Still, the payoff in terms of value to communities and health organizations is huge if the developers can get something that can be demonstrated to work. As a public health researcher following SickCity told me, realtime outbreak detection is currently terrible at best. To improve on what’s there, you just have to give people a reliable signal that *something* is happening in a city. You don’t need to have exact numbers. You don’t even need to know whether what’s happening is actually flu, or food poisoning, or plague, really – the health officials can figure that out for themselves pretty quickly with all of the other tools at their disposal, once they know to be on the lookout. You just need to be able to reliably say “there is a sickness event happening right now in this city”, and that’s enough. You just need a canary in the coal mine.
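One simple way to picture that canary, sketched in Python: compare today’s count of filtered sickness mentions for a city against the recent baseline, and raise a flag when it is far above normal. This is an assumed approach for illustration only, not how SickCity works; the function name, the threshold of 3 standard deviations and the 7-day minimum history are all hypothetical choices.

```python
from statistics import mean, pstdev


def is_sickness_event(daily_counts, threshold=3.0, min_history=7):
    """Flag the last day in daily_counts if it is far above the earlier days.

    daily_counts: per-day counts of filtered sickness mentions for one city,
    oldest first, with today's count last. All parameters are illustrative.
    """
    if len(daily_counts) < min_history + 1:
        return False  # not enough history to define "normal"

    *history, today = daily_counts
    baseline = mean(history)
    # Floor the spread at 1.0 so a very flat history does not make
    # a one-tweet change look like an outbreak.
    spread = max(pstdev(history), 1.0)
    return (today - baseline) / spread > threshold
```

A check like this only says “something unusual is happening”. It would fire just as readily on a media storm as on an outbreak, which is why the counts fed into it need to be filtered first.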

So the developers behind SickCity, volunteers from DIYcity (mainly Paul Watson and Dan Greenblatt at this time, plus a few others), keep working on making it that. And right now they’re working round the clock. (It’s a public project – if you want to pitch in, by all means do so – you can get more info here.)

Even if SickCity fails to detect swine flu in cities around the world, it will have become a much more robust tool in the process of failing. If it doesn’t succeed in catching this pandemic, maybe it will be better prepared to catch the next one?
