Strata Week: Movers and shakers on the data journalism front

Reuters' Connected China, accessing Pew's datasets, Simon Rogers' move to Twitter, data privacy solutions, and Intel's shift away from chips.

Reuters launches Connected China, Pew instructs on downloading its data, and Twitter gets a data editor

Yue Qiu and Wenxiong Zhang took a look this week at a data journalism effort by Reuters, the Connected China visualization application. Qiu and Zhang report that “[o]ver the course of about 18 months, a dozen bilingual reporters based in Hong Kong dug into government websites, government reports, policy papers, Mainland major publications, English news reporting, academic texts, and think-tank reports to build up the database.”

The five parts of the interactive visualization — China 101, Social Power, Institutional Power, Career Comparison, and Feature Stories — emerged over the course of the team’s investigation, Qiu and Zhang note, as they started to realize “it would be interesting and useful to focus the project on the interpersonal relationships among Chinese political leaders.” Using daily reporting to build the database was crucial; Reg Chua, chief editor of the Connected China project and data and innovation editor of Reuters, told Qiu and Zhang, “If we could capture the information in some structured way, not just keeping archives, not just digitizing your notebook, but finding a way to construct it as a database, we could use that information much more valuably for a much longer period of time.”

The Reuters group teamed up with Fathom Information Design to design the final visualization. You can read more about the project in Qiu and Zhang’s report at Columbia Journalism Review, and you can access the visualization on the Connected China website.
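Chua's point about capturing reporting "in some structured way" can be illustrated with a small sketch. The schema below is purely hypothetical and is not the Connected China data model; the table names, the `kind` labels, and the placeholder people are all assumptions made for illustration. It only shows how relationship notes, once stored as rows rather than as archived text, can be queried from either side of a connection.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE people (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE relationships (
            source_id   INTEGER NOT NULL REFERENCES people(id),
            target_id   INTEGER NOT NULL REFERENCES people(id),
            kind        TEXT NOT NULL,  -- e.g. "colleague", "mentor" (illustrative labels)
            source_note TEXT            -- where the reporter found the fact
        );
        """
    )
    return conn


def add_person(conn, name):
    """Insert a person and return the new row id."""
    cur = conn.execute("INSERT INTO people (name) VALUES (?)", (name,))
    return cur.lastrowid


def add_relationship(conn, source_id, target_id, kind, source_note=None):
    """Record one relationship between two people."""
    conn.execute(
        "INSERT INTO relationships (source_id, target_id, kind, source_note) "
        "VALUES (?, ?, ?, ?)",
        (source_id, target_id, kind, source_note),
    )


def connections_of(conn, person_id):
    """Return sorted (name, kind) pairs for everyone linked to person_id,
    regardless of which side of the relationship they were recorded on."""
    rows = conn.execute(
        """
        SELECT p.name, r.kind
          FROM relationships r JOIN people p ON p.id = r.target_id
         WHERE r.source_id = ?
        UNION
        SELECT p.name, r.kind
          FROM relationships r JOIN people p ON p.id = r.source_id
         WHERE r.target_id = ?
        """,
        (person_id, person_id),
    ).fetchall()
    return sorted(rows)


if __name__ == "__main__":
    conn = build_db()
    # Placeholder names only; no real people or relationships are implied.
    a = add_person(conn, "Official A")
    b = add_person(conn, "Official B")
    add_relationship(conn, a, b, "colleague", "placeholder source note")
    print(connections_of(conn, a))
```

The design choice worth noting is that each relationship is a row with its own source note, so a fact gathered in daily reporting stays queryable long after the story it first appeared in.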

In related data journalism news, Scott Keeter, director of survey research for the Pew Research Center, published instructions this week on how to access Pew research data, noting that “[n]early all of the survey and other quantitative data collected by the Pew Research Center’s seven projects are freely available for secondary analysis by researchers.” Keeter recommends that researchers who want to use Pew datasets have experience with SPSS, SAS or Stata software and notes that most of the Pew files are saved in SPSS’s .sav format.

In additional related news, those who work in or follow data journalism may recognize Simon Rogers’ name: he’s been a journalist at the Guardian since 1998, and since 2009, he’s been the editor of the Guardian’s datablog and datastore. This week, he announced he will leave the Guardian to take the first data editor position at Twitter. In a post on his personal blog, Rogers reminisces about his time at the Guardian and looks forward to his new role:

“Twitter has become such an important element in the way we work as journalists. It’s impossible to ignore, and increasingly at the heart of every major event, from politics to sport and entertainment. As data editor, I’ll be helping to explain how this phenomenon works.”

In an interview with O’Reilly’s Alex Howard, Rogers said he’s looking forward to the new challenge and working with large volumes of social media data: “Twitter is an amazing phenomenon,” he told Howard. “It’s changed every level of how we work as reporters. We really saw that during the ‘Reading the Riots’ project. There we had 1.6 million riot-related tweets which Twitter gave us to analyze. … Mark Twain said ‘a lie can be halfway around the world before the truth has got its boots on.’ All social media encourages that. I think the work we did with the riot tweets shows how the truth can catch up fast.”

The best data privacy, data sharing solutions are yet to come

As companies work to find more and more ways to mine and collect consumer data, solutions are emerging to help consumers keep more of their data private, but is hiding data the best solution? GigaOm’s Jordan Novet looks this week at data privacy options, such as Abine’s DoNotTrackMe extension and PrivacyChoice’s Privacyfix app.

The trouble with “opting out of the data revolution,” Novet notes, is that it limits the services and responsive features sites can provide. “[W]ithout location information, Google Now would be considerably less powerful,” he writes. Ownership is another option Novet highlights: consumers could keep their data in personal data lockers and dole it out in exchange for money or other rewards.

Novet points out that neither route, data ownership or data masking, is an ideal solution, but that given time, better solutions that address both companies’ desire to target consumers personally and consumers’ desire to protect their privacy are sure to emerge. You can read his full report at GigaOm.

Intel sees the future in networks

Intel announced this week that it will buy API management company Mashery. Owen Thomas notes at ReadWrite that the two companies have a history of working together and that the acquisition will allow Intel to expand its customer base; Mashery’s API management tools will allow Intel to offer a more “complete and credible” cloud-computing infrastructure, as “[m]ost cloud-software services communicate with other services via APIs.” And while Intel may sell chips for the servers running the software, Thomas writes, “what’s more important is the notion that Intel has a product offering that speaks to innovative startups, not just struggling PC manufacturers.”

Intel’s move beyond chips has broader implications, too, Thomas points out. “[I]t signals Intel’s recognition that the central processing unit is no longer a silicon chip. It is the network.” The deal is expected to close in the second quarter. You can read more in Thomas’ report at ReadWrite.

Tip us off

News tips and suggestions are always welcome, so please send them along.

Related:

O’Reilly Strata Conference: Strata brings together the leading minds in data science and big data, the decision makers and practitioners driving the future of their businesses and technologies. Get the skills, tools, and strategies you need to make data work.

Strata Rx Health Data Conference: September 25-27 | Boston, MA
Strata + Hadoop World: October 28-30 | New York, NY
Strata in London: November 15-17 | London, England
