Strata Week: Data Without Borders

Work on data projects that matter, data journalism, and a social graph of the Marvel universe.

Here are some of the data stories that caught my attention this week:

Data without borders

Data is everywhere. That much we know. But the use of data, and the benefit from it, is not evenly distributed, and this week New York Times data scientist Jake Porway issued a call to arms to address this. He’s asking developers and data scientists to help build a Data Without Borders-type effort that takes data, particularly data from NGOs and non-profits, and matches it with people who know what to do with it.

As Porway observes:

There’s a lot of effort in our discipline put toward what I feel are sort of “bourgeois” applications of data science, such as using complex machine learning algorithms and rich datasets not to enhance communication or improve the government, but instead to let people know that there’s a 5% deal on an iPad within a 1 mile radius of where they are. In my opinion, these applications bring vanishingly small incremental improvements to lives that are arguably already pretty awesome.

Porway proposes building a program to help match data scientists with non-profits and the like who need data services. The idea is still under development, but drop Porway a line if you’re interested.

Big data and the future of journalism

The Knight Foundation announced the winners of its Knight News Challenge this week, a competition to find and support the best new ideas in journalism. The Knight Foundation selected 16 projects to fund from among hundreds of applicants.

In announcing the winners, the Knight Foundation pointed out a couple of important trends, including “the rise of the hacker/data journalist.” Indeed, several of the projects are data-related, including Swiftriver, a project that aims to make sense of crisis data; ScraperWiki, a tool for users to create their own custom scrapers; and Overview, a project that will create visualization tools to help journalists better understand large data sets.

IBM releases its first Netezza appliance

Last fall, IBM announced its acquisition of the big data analytics company Netezza. The acquisition was aimed at helping IBM build out its analytics offerings.

This week, IBM released its first new Netezza appliance since acquiring the company. The IBM Netezza High Capacity Appliance is designed to analyze up to 10 petabytes in just a few minutes. “With the new appliance, IBM is looking to make analysis of so-called big data sets more affordable,” Steve Mills, senior vice president and group executive of software and systems at IBM, told ZDNet.

The new Netezza appliance is part of IBM’s larger strategy of handling big data, of which its recent success with Watson on Jeopardy was just one small part.

The superhero social graph

Plenty of attention is paid to the social graph: the ways in which we are connected online through our various social networks. And while there’s still lots of work to be done making sense of that data and of those relationships, a new dataset released this week by the data marketplace Infochimps points to other (fictional) social worlds that can be analyzed.

The world, in this case, is that of the Marvel Comics universe. The Marvel dataset was constructed by Cesc Rosselló, Ricardo Alberich, and Joe Miro from the University of the Balearic Islands. Much like a real social graph, the data shows the relationships between characters, and according to the researchers “is closer to a real social graph than one might expect.”
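For readers who want to poke at this kind of data, here is a minimal sketch of how a character graph could be built. It assumes, purely for illustration, that two characters are linked whenever they appear in the same issue. The issue names and casts below are made-up toy data, not taken from the Infochimps dataset, and `build_graph` and `degrees` are hypothetical helper names.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: comic issue -> characters appearing in it.
# (Not drawn from the actual Marvel dataset.)
appearances = {
    "issue-1": ["Spider-Man", "Captain America", "Iron Man"],
    "issue-2": ["Spider-Man", "Iron Man"],
    "issue-3": ["Captain America", "Thor"],
}


def build_graph(appearances):
    """Link two characters whenever they share an issue.

    Returns a dict mapping an alphabetically ordered character pair
    to the number of issues the pair shares.
    """
    weights = defaultdict(int)
    for cast in appearances.values():
        # sorted(set(...)) removes duplicates and fixes the pair order
        for a, b in combinations(sorted(set(cast)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def degrees(graph):
    """Count how many distinct characters each character is linked to."""
    deg = defaultdict(int)
    for a, b in graph:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


graph = build_graph(appearances)
degree = degrees(graph)

# List the best-connected characters first.
for name, count in sorted(degree.items(), key=lambda item: (-item[1], item[0])):
    print(name, count)
```

Once the graph is in this form, the measures applied to real social networks, such as how many connections each person has, can be computed for fictional characters in the same way.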

Got data news?

Feel free to email me.
