3 big ideas for big data in the public sector

Predictive analytics, code sharing and distributed intelligence could improve criminal justice, cities and response to pandemics.

If you’re going to try to apply the lessons of “Moneyball” to New York City, you’ll need to get good data, earn the support of political leaders and build a team of data scientists. That’s precisely what Mike Flowers has done in the Big Apple, and his team has helped to save lives and taxpayer dollars. At the Strata + Hadoop World conference held in New York in October, Flowers, the director of analytics for the Office of Policy and Strategic Planning in the Office of the Mayor of New York City, gave a keynote talk about how predictive data analytics have made city government more efficient and productive.

While the story that Flowers told is a compelling one, the role of big data in the public sector was in evidence in several other sessions at the conference. Here are three more ways that big data is relevant to the public sector that stood out from my trip to New York City.

There’s justice in the data

“Data and technology will change how criminal justice works in America,” said Anne Milgram. Milgram, the former attorney general of New Jersey, is now the vice president of criminal justice at the Arnold Foundation, where she’s working on building better risk assessment tools for courts, including a pilot in Manhattan.

“You can’t fix a problem you don’t know you have,” said Milgram. “We need to understand when and where violent crime is happening.” Most institutional actors, she asserted, are not using tech effectively. They need access to information other actors have, and they face cultural hurdles to data sharing.

When she asked a judge which criminals his court was actually putting in jails, he said it was the most dangerous and violent offenders. The data shows that the majority of people being incarcerated are low-level, non-violent offenders, said Milgram. The judges she talked with, however, couldn’t believe that outcome, given that their intention in sentencing was otherwise. What she showed them also highlighted an important system error: highly violent people were ending up out on the streets, not incarcerated.

Applying a data-driven lens to improve the efficiency and effectiveness of the criminal justice system has a potentially huge incentive: improving the bottom lines of state budgets around the country. According to Milgram, corrections ranks among the highest expenditures in state budgets. (As of 2012, corrections was No. 4 in California, for instance.) It costs $45,000 per year in New Jersey to incarcerate someone, she said, and the cost of local systems nationally is estimated by the Department of Justice at more than $130 billion a year. The cost of incarceration just prior to trial alone is $9 billion.

Milgram spoke about “moneyballing” criminal justice earlier this year at the Code for America Summit.

The returns for better applying technology in criminal justice extend far beyond reducing crime or costs to something that government officials are sworn to uphold: justice.

“It’s too expensive” is no longer an excuse

While Milgram is focused on getting a pilot up and running in New York City, Chicago’s data-driven approach to open government has been underway since Mayor Rahm Emanuel was elected in February 2011. At Strata + Hadoop World, Q Ethan McCallum, the author of the newly published Bad Data Handbook, joined Chicago chief information officer and chief data officer Brett Goldstein to talk about text mining and civic engagement.

There’s much that can and should be said about what Chicago has accomplished with data in recent years, from the launch of Open 311 in Chicago, which creates public data infrastructure for civic apps like a flu shot finder, to the city’s embrace of its civic hacking community, which creates useful apps like “Was my car towed?”

What jumped out at me at Strata, however, wasn’t the quality of Chicago’s data, how the city has consumed it internally or the ecosystem of nonprofits, developers and startups that the city has worked to create around it. Instead, it was the importance of political leadership, results and the pursuit of internal capacity, not just the number of open datasets published online. “Mayor Emanuel wanted Chicago to be the standard for open data, analytics and prediction,” said Goldstein.

Chicago is a political place, he allowed with a laugh at the Strata press conference, but the mandate from the mayor is to do it right. “We’re not creating pretty pictures,” he said. “We’re building a solid foundation.”

Goldstein, who added “CIO” to his title earlier this year, has spent a lot of time working on architecture since. The question has been how to bring the data together. Chicago has taken an open source approach to doing so, making the data easier to use and standardizing it across the enterprise. He’s also been working with the community: Chicago is not only making municipal data available, but it’s also sponsoring R classes to help people understand how to put it to good use.

Goldstein’s team is dealing with short-term deliverables. Traditionally in IT projects, cities send out a request for proposals and then spend money on a big box solution — and that can take years. “My mandate is to give our residents every value for their taxes,” he said at the press conference. “By having an agile team, we could stand up in weeks what would take months or years. By having an agile mentality, you can get a rapid return. There are classic IT things that should go to RFP — like an ERP system — but why not build other things?”

For Goldstein, “showing that you can use R in a municipal government is a game changer.” As a result of his team’s work in Chicago, “it’s too expensive” to use big data in the public sector is no longer an excuse not to do so.

To help other cities use Hadoop, MongoDB and R, Goldstein is collaborating with Michael Flowers on G-Analytics, a group focused on building capacity in this nascent field of urban data analytics around the United States and beyond.

“I have a close relationship with Flowers,” said Goldstein, at the press conference. “We trade code. If I write something, I want someone to be able to download and use it.”

Balance public good with human rights protection

In August, my colleague Alistair Croll provocatively wrote that big data is our generation’s civil rights issue.

Robert Kirkpatrick, director of U.N. Global Pulse, broadened that frame when he delivered remarks at Strata: “Big data is a human rights issue,” he said. “We must never analyze personally identifiable information, never analyze confidential data, and never seek to re-identify data.”

He described three big opportunities in big data for the United Nations and governments in general:

  1. better early warning, to enable faster response
  2. real-time awareness, to know what’s happening on the ground now
  3. real-time feedback, perhaps “most important,” to see what’s not happening versus what was intended

You can view Kirkpatrick’s presentation at Slideshare. In his talk on big data and development, Kirkpatrick appealed to a packed room for help on the challenges and big questions that U.N. Global Pulse faces, a need he articulated in an op-ed in the first issue of “Big Data”, a peer-reviewed journal that launched at Strata:

How does the United Nations gain access to the data it needs in order to do the research necessary to answer those other questions? We believe the answer to the latter, crucial question is what we call “data philanthropy,” where data-rich companies donate data to research projects. For example, I have been spending a lot of time lately talking to private sector companies about how they can safely and anonymously share with Global Pulse some of what they know about customers, to help give us a badly needed leg-up in our quest to better protect the vulnerable. The companies that are most open to the message are the ones that recognize that data philanthropy is not charity. These companies know that population well-being is key to the growth and continuity of business. No company wants to invest in a promising emerging market only to find out it is being threatened by a food crisis that could leave customers unable to afford products and services. And it would be sadly ironic if it turned out that expert analysis of patterns in a company’s own data could have revealed that people were headed for trouble while there was still time to act.

Kirkpatrick hopes that the data science community and corporations will donate their time, expertise and data for the public good, enabling better crisis tracking using crowdsourced information.

He’s particularly interested in the potential of social and mobile data for their predictive value. For instance, said Kirkpatrick, Twitter data accurately predicted the cholera outbreak in Haiti two weeks earlier than official records. Mobile networks can also act as drought sensors in the Sahel region of Africa.

The tools keep improving, too: the U.S. Geological Survey’s (USGS) Twitter earthquake detector (@USGSted) has a less than 10% false positive rate.

Such systems complement existing systems rather than replace them, said Kirkpatrick. After an alert, the USGS can wake up seismologists in that part of the world to go check their data centers. By doing so, they can reduce the time to detect the epicenter of a quake from 20 minutes to four minutes.
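The mechanics behind this kind of social media alerting can be sketched simply: compare the recent rate of keyword mentions against a longer baseline and fire an alert when the ratio spikes. The Python below is an illustrative sketch only; the `detect_burst` function, its window sizes and its threshold are my own assumptions, not the USGS's actual @USGSted algorithm.

```python
def detect_burst(counts, baseline=60, recent=2, factor=5.0):
    """Flag a burst in a per-minute series of keyword mention counts.

    Returns True when the mean rate over the last `recent` minutes
    exceeds `factor` times the mean over the preceding `baseline`
    minutes. All parameters are illustrative defaults.
    """
    if len(counts) < baseline + recent:
        return False  # not enough history to establish a baseline
    base = counts[-(baseline + recent):-recent]
    tail = counts[-recent:]
    base_rate = sum(base) / len(base) or 1e-9  # guard against a zero baseline
    return (sum(tail) / len(tail)) / base_rate >= factor

# Steady chatter: no alert. A sudden jump in mentions: alert.
quiet = [2] * 62
shaking = [2] * 60 + [30, 40]
```

A real system would layer filters on top of this (language detection, geotag clustering, deduplication) to get the false positive rate down, which is presumably part of why @USGSted performs as well as it does.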

This kind of data analysis has considerable potential for more than natural disasters or epidemics. In Jakarta, the “tweetingest city on Earth,” with more than 9 million tweets sent daily, UN Global Pulse analyzed 14 million tweets during a period of inquiry and found that people talk differently about food when they’re not getting it, giving the data potential predictive value for food security. Social media can be used to predict food price inflation, though Kirkpatrick warned that signals in the data are culturally contextual. For instance, for every 5,000 additional tweets from Indonesians about eggs, UN Global Pulse found a 2-3% decrease in the food consumer price index.
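The simplest way to quantify the kind of relationship Kirkpatrick describes is to correlate a tweet-volume series with a price index over the same periods. This is a minimal sketch with invented numbers, not UN Global Pulse's methodology; the two weekly series below are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical weekly counts of food-related tweets vs. a food price index.
tweets = [1200, 1350, 1500, 1800, 2100, 2600]
price_index = [100, 102, 105, 109, 114, 121]
r = pearson(tweets, price_index)  # close to 1.0: the series move together
```

Correlation alone says nothing about direction or causation, which is why Kirkpatrick's caveat about culturally contextual signals matters: the same tweet-volume spike can mean different things in different places.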

UN Global Pulse is now working on building its networks around the world, partnering with governments, the private sector, agencies and academia on doing research. When they find something that works, they turn to the open source paradigm to help local populations tap into distributed intelligence.

We expect to see more map layers of proxy indicators, said Kirkpatrick. “The signals are getting stronger, with an increase in the volume of relevant conversations,” he said. “The social media food index and food price index started matching up in October 2011. We believe more people using social media to talk about basic needs is leading to a correlation with official statistics. We think there’s a huge opportunity to have socioeconomic weather stations that show trends in poverty and food in every community in the world.”
