The growing importance of data journalism

Parsing the progress of open government data requires new tools and reliable information sources.

One of the themes from News Foo that continues to resonate with me is the importance of data journalism. That skillset has received renewed attention this winter after Tim Berners-Lee called analyzing data the future of journalism.

When you look at data journalism and the big picture, as USA Today’s Anthony DeBarros did on his blog in November, it’s clear the recent suite of technologies is part of a continuum of technologically enhanced storytelling that traces back to computer-assisted reporting (CAR).

As DeBarros pointed out, the message of CAR “was about finding stories and using simple tools to do it: spreadsheets, databases, maps, stats,” like Microsoft Access, Excel, SPSS, and SQL Server. That’s just as true today, even if data journalists now have powerful new options: ScraperWiki and Needlebase for scraping data from the web, and Perl, Ruby, Python, MySQL and Django for scripting and building applications.

Understanding the history of computer-assisted reporting is key to putting new tools in the proper context. “We use these tools to find and tell stories,” DeBarros wrote. “We use them like we use a telephone. The story is still the thing.”

The data journalism session at News Foo took place on the same day civic developers were participating in a global open data hackathon and the New York Times hosted its Times Open Hack Day. Many developers at contests like these are interested in working with open data, but the conversation at News Foo showed how much further government entities need to go to deliver on the promise open data holds for the future of journalism.

The issues that came up are significant. Government data is often “dirty,” with missing metadata or incorrect fields. Journalists have to validate and clean up datasets with tools like Google Refine. ProPublica’s Recovery Tracker for stimulus data and projects is one of the best examples of the practice in action.
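To make the cleanup step concrete, here is a minimal sketch in Python of the kind of validation pass a reporter might run before trusting a dataset. It is not how Google Refine or ProPublica does it. The sample rows, the column names (recipient, state, amount) and the clean_rows helper are all invented for illustration.

```python
import csv
import io

# Hypothetical sample data: the columns and values are invented for illustration.
RAW = """recipient,state,amount
Acme Paving, tx ,"1,200.50"
,CA,300
River Works,Calif,n/a
"""

REQUIRED = ("recipient", "state", "amount")


def clean_rows(text):
    """Split rows into (clean, rejected); each rejected row carries a reason."""
    clean, rejected = [], []
    for raw_row in csv.DictReader(io.StringIO(text)):
        # Trim stray whitespace and treat absent values as empty strings.
        row = {k: (v or "").strip() for k, v in raw_row.items() if k is not None}

        missing = [k for k in REQUIRED if not row.get(k)]
        if missing:
            rejected.append((row, "missing " + ", ".join(missing)))
            continue

        # Assume states should be two-letter codes; anything else is flagged.
        state = row["state"].upper()
        if len(state) != 2 or not state.isalpha():
            rejected.append((row, "bad state code"))
            continue

        # Amounts may arrive with thousands separators or placeholder text.
        try:
            amount = float(row["amount"].replace(",", ""))
        except ValueError:
            rejected.append((row, "amount is not a number"))
            continue

        clean.append({"recipient": row["recipient"], "state": state, "amount": amount})
    return clean, rejected


clean, rejected = clean_rows(RAW)
```

Keeping the rejected rows alongside a reason, rather than silently dropping them, gives a reporter a list of specific questions to take back to the agency that published the data.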

A recent gold standard for data journalism is the Pulitzer Prize-winning Toxic Waters project from the New York Times. The scale of that project makes it a difficult act to follow, though Times developers are working hard with nifty projects like Inside Congress.

You can see a visualization of the Toxic Waters project and other examples of data journalism in this Ignite presentation from News Foo.

At ProPublica, the data journalism team is conscious of deep linking into news applications, with the perspective that the visualizations produced from such apps are themselves a form of narrative journalism. With great data visualizations, readers can find their own way and interrogate the data themselves. Moreover, distinctions between a news “story” and a news “app” are dissolving as readers increasingly consume media on mobile devices and tablets.

One approach to providing useful context is the “Ion” format at ProPublica.org, where a project like “Eye on the Stimulus” is a hybrid between a blog and an application. On one side of the web page, there’s a news river. On the other, there are entry points into the data itself. The challenge with this approach is that a media outlet needs alignment between staff and story. A reporter has to be filing every day on a running story that’s data-sensitive.

Upgrading Data.gov

The data journalism News Foo session featured a virtual component, bringing City Camp founder Kevin Curry, Data.gov evangelist Jeanne Holm, and Reynolds fellow David Herzog together with News Foo participants to talk about the value propositions for open government data and data journalism.

As the recent open data report showed, developers are not finding the government data they need or want. If other entrepreneurs are to follow the lead of BrightScope, open government datasets will need to be more relevant to business. The feedback for Data.gov and other government data repositories was clear: more data, better data, and cleaner data, please.

Improving media access to data at the county or state level faces structural barriers because of growing budget crises in statehouses around the United States. As Jeanne Holm observed during the News Foo session, open government initiatives will likely be done in a zero-sum budget environment in 2011. Officials have to make them sustainable and affordable.

There are some areas where the federal government can help. Holm said Data.gov has created cloud hosting that can be shared with state, local or tribal governments. Data.gov is also rolling out a set of tools that will help with data conversion, optical character recognition, and, down the road, better tools for structured data.

Those resources could make government data more readily available and accessible to the media. Kevin Curry said that data catalogs are popping up everywhere. He pointed to CivicApps in Portland, Ore., where Max Ogden’s work on coding the middleware for open government led to translating government data into more useful forms for developers.

Data journalists also run into government’s cultural challenges. It can be hard to find public information officers willing or able to address substantive questions about data. Holm said Data.gov may post more contact information online and create discussions around each dataset. That kind of information is a good start for addressing data concerns at the federal level, but fostering useful connections between journalists and data will still require improvement and effort.


  • Stu

    Have you tried SmartOCR yet? It is a new software application which offers over 99.8 percent accuracy and has a very nice interface. http://smartocr.com

  • Aaron

    You have got to read up on XBRL http://goo.gl/0ZpXq

  • http://nut-a-tut.blogspot.com Nupur

    With more interactivity online, journalism must evolve to acknowledge that the reader too can share, comment, and participate in news reporting and creation. Visual representations of data will help gain eyeballs in this age of information overload to a greater extent than smart copy. Thanks- very enlightening reading about data journalism-seems like the future.

  • http://isnotworking.com Ricardo Cabral

    In case anyone might find interesting, I’ve been working for the past month at Open Data Directory which is a search engine for data sets published by governments, private companies and other organizations. It now indexes over 250K datasets.