Here are a few of the data stories that caught my attention this week.
Crowdsourcing and gaming help in the fight against HIV
Players of the online protein-folding game Fold.it have solved in three weeks a scientific problem that had stumped researchers for more than a decade. Scientists had been trying to figure out the structure of a protein-cutting enzyme from an AIDS-like virus. Having failed to do so, they turned the information over to Fold.it players and challenged them to produce an accurate model.
“We wanted to see if human intuition could succeed where automated methods had failed,” Dr. Firas Khatib of the University of Washington Department of Biochemistry told Science Daily. And indeed, it did.
The goal of the game was to work out the three-dimensional structure of different proteins. Players, most of whom were not trained scientists, competed with one another and were scored on the stability of the structures they built, though they could also work together on the puzzles. In this case, the gamers generated models that were good enough for the researchers to determine the enzyme’s actual structure, including features that could be targeted by drugs aimed at the enzyme.
Twitter open sources Storm and acquires Julpan
As Twitter indicated last month that it would, the company has open sourced Storm, its Hadoop-like, real-time data processing tool. Storm was developed by BackType, which Twitter acquired earlier this year, and Twitter engineer Nathan Marz, formerly the lead engineer at BackType, made the open source release official at the Strange Loop developer conference. Along with the code, there’s extensive documentation of the project, as well as other resources Marz lists on a Hacker News thread about the project.
The open sourcing of Storm wasn’t the only data news from Twitter this week. The company has also acquired Julpan, a New York City-based startup that analyzes real-time data collected from the social web.
The acquisition is the latest in a series of moves by Twitter, including its purchase of BackType, to build out its own analytics capabilities and analyze the more than 200 million Tweets that are now posted per day.
Julpan is headed by former Google data scientist Ori Allon. Allon built “Orion,” a search algorithm that became a key part of Google’s search relevancy efforts when the company acquired the rights to it in 2006. Allon left Google in 2010 to found Julpan.
The politics of search
Google Chairman Eric Schmidt testified before the Senate Judiciary Subcommittee on Antitrust, Competition Policy and Consumer Rights yesterday — a hearing that GigaOm’s Stacey Higginbotham said demonstrated a “fundamental conflict of cultures” between Silicon Valley and Washington DC.
The purpose of the Senate hearing was to investigate Google’s search practices and to ascertain whether Google’s dominance over search and search advertising warrants an antitrust response from the government. As Senator Patrick Leahy put it, the hearing was meant to see whether “Google is in a position to determine who will succeed and fail on the Internet.” Many of the senators’ questions concerned how Google handles search ranking. Senator Mike Lee accused the company of cooking search results, an accusation that Schmidt denied.
“First, we built search for users, not websites,” Schmidt testified. “And no matter what we do, there will always be some websites unhappy with where they rank. Search is subjective, and there’s no ‘correct’ set of search results. Our scientific process is designed to provide the answers that consumers will find most useful.”
GigaOm’s Higginbotham describes what she sees as a clash of cultures between the senators and Google, and between politics and algorithms, this way:
Schmidt, like any computer scientist, tried to argue that the algorithms do what they are supposed to do. From a computer science view, if an algorithm is fair, then changing to protect a certain class of those affected by it makes it fundamentally unfair to others (something Congress routinely does with exceptions and carve outs when it’s making legislation). In fact, the biggest elephant in the room was a clash of cultures between the Silicon Valley culture of the free market — and using technology to create a better consumer experience — and Washington D.C.’s inherent cynicism and pandering to constituents.
Got data news?
Feel free to email me.