Strata Week: Investors circle big data

Big funding news for data startups, a new verification tool for Wikipedia, and Angry Birds takes down the economy.

This was a busy week for data stories. Here are a few that caught my attention:

Big money for big data

There’s recently been a steady stream of funding news for big data, database, and data mining companies. Last Thursday, Hadoop-based data analytics startup Platfora raised $5.7 million from Andreessen Horowitz. On Monday, 10gen announced it had raised $20 million for MongoDB, its open-source, NoSQL database. On Tuesday, Xignite said it had raised $10 million to build big data repositories for financial organizations; data storage provider Zetta announced a $9 million round; and Walmart announced it had acquired the ad targeting and data mining startup OneRiot (the terms of the deal were not disclosed). Finally, yesterday, big data analytics company Opera Solutions announced that it had raised a whopping $84 million in its first round of funding.

GigaOm’s Derrick Harris offers the story behind Opera Solutions’ massive round of funding, noting that the company was already growing fast and doing more than $100 million per year in revenue. He also points to the company’s penchant for hiring PhDs (90 so far), “something that makes it more akin to blue-chipper IBM than to many of today’s big data startups pushing Hadoop or NoSQL technologies.” Harris also notes that with a half-billion-dollar valuation and 600-plus employees, Opera Solutions isn’t a likely acquisition target for other big companies, even those wanting to beef up their analytics offerings. He contends this could allow Opera Solutions to remain independent and perhaps make some acquisitions of its own.

Ushahidi and Wikipedia team up for WikiSweeper

The crisis-mapping platform Ushahidi unveiled a new tool this week to help Wikipedia editors track changes and verify sources on articles. The project, called WikiSweeper, is aimed at the heavily and rapidly edited articles associated with major events.

As Ushahidi writes on its blog:

When a globally-relevant news story breaks, relevant Wikipedia pages are the subject of hundreds of edits as events unfold. As each editor looks to editing and maintaining the quality and credibility of the page, they need to manually track the news cycle, each using their own spheres of reference. The decisions that are made to accept one source while rejecting others remains opaque, as are the strategies that editors develop to alert and keep track of the latest information coming in from a variety of different sources.

WikiSweeper is based on Ushahidi’s own open-source Sweeper tool, and its application to Wikipedia will help Ushahidi in turn build out its own project. After all, during major events, information comes in from multiple sources at a breakneck pace, and in crisis response, the accuracy and trustworthiness of the sources need to be quickly and transparently identified. As Ushahidi points out, this makes it a “win-win” for both organizations as they gain better tools for dealing with real-time news and social data.

Strata Conference New York 2011, being held Sept. 22-23, covers the latest and best tools and technologies for data science — from gathering, cleaning, analyzing, and storing data to communicating data intelligence effectively.


Angry Birds take down pigs and the economy

Every March brings declarations about how much time Americans waste at work watching the NCAA basketball tournament. Invoking that ritual, The Atlantic’s Alexis Madrigal has pointed to a more insidious, year-round problem: the number of hours American workers lose playing Angry Birds.

Drawing on the figure of 200 million minutes of Angry Birds play per day, Madrigal calculated the resulting lost hours and lost wages. He estimates that about 43,333,333 on-the-clock hours are spent playing Angry Birds each year, accounting for $1.5 billion in lost wages annually.
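The headline numbers can be reproduced with a few lines of arithmetic. The sketch below is our reconstruction, not Madrigal's own worksheet. It uses his stated assumptions that 5 percent of all Angry Birds play happens on Americans' work time and that the average wage is $35 per hour. It also adds one assumption of our own, labeled in the code: 260 workdays per year (52 weeks of five days), which happens to reproduce his 43,333,333-hour figure exactly.

```python
# Rough reconstruction of the Angry Birds lost-productivity arithmetic.
# MINUTES_PER_DAY, AT_WORK_SHARE and HOURLY_WAGE come from Madrigal's piece;
# WORKDAYS_PER_YEAR is OUR assumption (52 weeks x 5 weekdays).

MINUTES_PER_DAY = 200_000_000   # worldwide Angry Birds play, minutes per day
AT_WORK_SHARE = 0.05            # share assumed played by Americans on the clock
WORKDAYS_PER_YEAR = 52 * 5      # assumed: weekdays only, no holidays removed
HOURLY_WAGE = 35.0              # average hourly wage, dollars


def lost_hours_per_year(minutes_per_day=MINUTES_PER_DAY,
                        share=AT_WORK_SHARE,
                        workdays=WORKDAYS_PER_YEAR):
    """On-the-clock hours spent playing per year."""
    hours_per_day = minutes_per_day / 60          # convert minutes to hours
    return hours_per_day * share * workdays       # keep only work-time play


def lost_wages_per_year(hourly_wage=HOURLY_WAGE, **kwargs):
    """Lost hours valued at a flat hourly wage."""
    return lost_hours_per_year(**kwargs) * hourly_wage


hours = lost_hours_per_year()
wages = lost_wages_per_year()
print(f"{hours:,.0f} hours, ${wages / 1e9:.2f} billion")
```

Because every input is a keyword argument, it is easy to see how sensitive the estimate is: doubling the at-work share to 10 percent doubles both the hours and the dollar figure.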

Obviously there are some really big assumptions in this calculation. The first is that five percent of the total Angry Bird hours are played by Americans at work … we don’t know the international breakdown, nor do we know how often people play at work. But, five percent seemed like a reasonable assumption. Second, the Pew income data for smartphone ownership is not that precise, particularly on the upper ($75,000+) and lower (less than $30,000) ends. I had to pick numbers, so I basically split Americans up into four categories: people earning $30,000, $50,000, $75,000, and $100,000, then I calculated simple hourly wages for those groups (income/52/40) and did a weighted average based on smartphone adoption in those categories. The $35 per hour number I used is comparable with the $38 that Challenger, Gray, and Christmas used for fantasy sports players. But this is certainly a rough approximation. Put it this way: I bet this estimate is right to the order of magnitude, if not in the details.
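The wage step Madrigal describes (hourly wage = income/52/40, then a weighted average across four income groups) can be sketched as follows. The income groups and the income/52/40 formula are his. The weights are hypothetical placeholders: equal weights are used purely for illustration and are not the Pew smartphone-adoption shares he actually used, so the result does not match his $35 figure.

```python
# Sketch of the weighted-average hourly wage described in the quote above.
# INCOME_GROUPS and the income/52/40 rule are Madrigal's; the weights passed
# in below are HYPOTHETICAL, not the Pew adoption figures he used.

INCOME_GROUPS = [30_000, 50_000, 75_000, 100_000]  # annual income, dollars


def hourly_wage(annual_income, weeks=52, hours_per_week=40):
    """Simple hourly wage: income / 52 weeks / 40 hours."""
    return annual_income / weeks / hours_per_week


def weighted_hourly_wage(incomes, weights):
    """Average hourly wage, weighting each income group (e.g. by smartphone owners)."""
    if len(incomes) != len(weights):
        raise ValueError("need exactly one weight per income group")
    total = sum(weights)
    return sum(hourly_wage(i) * w for i, w in zip(incomes, weights)) / total


# Hypothetical equal weights, for illustration only.
example_weights = [1, 1, 1, 1]
print(round(weighted_hourly_wage(INCOME_GROUPS, example_weights), 2))
```

Substituting each group's real share of smartphone owners for `example_weights` would follow his method; we do not have his exact weights, so we make no claim about the figure they produce.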

Take that, Gladwell

Malcolm Gladwell raised the ire of many social-media-savvy activists last year by claiming that “the revolution will not be tweeted.” Writing in The New Yorker, Gladwell dismissed social media as a tool for change. He argued that bonds formed online are “weak” and unable to withstand the sorts of demands necessary for social change.

Gladwell’s assertions have been countered in many places, and a new article analyzing social media’s role in the Arab Spring takes the rebuttals to a new level.

“After analyzing over 3 million tweets, gigabytes of YouTube content and thousands of blog posts, a new study finds that social media played a central role in shaping political debates in the Arab Spring. Conversations about revolution often preceded major events on the ground, and social media carried inspiring stories of protest across international borders,” the authors write.

The authors describe their methodology for extracting and analyzing text from blogs and tweets, but they also lament some of the problems they faced, particularly in gaining access to the Twitter archive.

Got data news?

Feel free to email me.
