Here are a few stories that caught my attention in the data space this week.
Big data, Big Brother, big problems
Adam Frank took a look at some of the big problems with big data this week over at NPR. Frank addresses issues in analyzing the sheer volume of complex information inherent in big data. Learning to sort through and mine vast amounts of data to extrapolate meaning will be a “trick,” he writes, but it turns out the big problems with big data go deeper than volume.
Creating computer models to simulate complex systems with big data, Frank notes, ultimately creates something a bit different from reality: “the very act of bringing the equations over to digital form means you have changed them in subtle ways and that means you are solving a slightly different problem than the real-world version.” Analysis, therefore, “requires trained skepticism, sophistication and, remarkably, some level of intuition about the systems we study,” he writes.
Frank also raises the problem of big data becoming a threat to individuals within society:
“Everyday we are scattering ‘digital breadcrumbs’ into the data-verse. Credit card purchases, cell phone calls, Internet searches: Big Data means memory storage has become so cheap that all data about all those aspects of our lives can be harvested and put to use. And it’s exactly the use of all that harvested data that can pose a threat to society.”
The threat comes from the Big Brother aspect of being constantly monitored in ways we’ve never before imagined. As Frank writes, “It may also allows levels of manipulation that are new and truly unimaginable.” You can read more of Frank’s thoughts on what it means to live in the age of big data here. (We’ve covered related ethics issues with big data here on Strata.)
Monetizing open data
Ville Peltola, IBM’s innovation chief in Finland, told Meyer the situation is becoming frustrating, that he doesn’t understand why it’s so hard to properly open up data, or even just some of it. “You could have bronze, silver and gold APIs, where more data costs more,” Peltola said to Meyer. “It’s like a drug dealer. Maybe you have to solve this chicken-and-egg problem by giving samples of raw data.”
Meyer points out the real issue inherent in what Peltola is saying: “that large amounts of data are very valuable, and the companies that create them tend not to know how to realise the greatest value from them.” Peltola had an interesting idea to address this: “What if you have an internal start-up in your company tasked only with monetising your data?”
Chris Taggart, co-founder of OpenCorporates, made a more competitive argument for opening up your company’s data: it “exposes your competitors’ internal contradictions” and might inspire disruption, he told Meyer. “Most big, fat secure companies don’t have the confidence to disrupt themselves,” he said.
Hortonworks’ strategy, one year later
As Hortonworks marks its first year after spinning out of Yahoo, Doug Henschen at InformationWeek checked in to find out how the company is managing to establish itself as an innovator when it gives everything freely to the open source community. As Hortonworks CEO Rob Bearden told Henschen, Hortonworks “holds nothing back,” which allows its competitors to capitalize on its best work.
Asked how the company will manage to stand apart, Bearden said, “We’re very focused on making sure the right enterprise functionality gets into the core platform and that we are the more reliable, stable platform on the planet” — which Henschen notes is a less-than-subtle slam against Cloudera’s release of Hadoop 2.0 software with its CDH4 distribution.
As to reaping the benefits of its innovative work, Bearden told Henschen that “ultimately, the market will return to its rightful owner,” and that “[t]he ecosystem recognizes who’s innovating, and now that we have technology out there, we’re seeing the migration very rapidly.”
A new Hadoop Partner Ecosystem
“The new group includes IBM (which continues the Hadoop efforts the pair announced in October 2011), Cloudera and Hortonworks. Jaspersoft already listed the three companies as technology partners, but the new relationship status indicates Jaspersoft’s BI suite will feature a much tighter integration with the IBM, Hortonworks and Cloudera Hadoop distributions. Jaspersoft will also integrate with Informatica’s HParser, and Talend’s Open Studio for Big Data.”
Jaspersoft also released results from its Big Data Survey of its community members. Fifty-nine percent of respondents indicated they “either already deployed a Hadoop-based big data solution or are currently in development.” Additionally, 77% of Hadoop user respondents indicated they’d be deploying big data solutions in the next six months. You can read more about the Hadoop Partner Ecosystem and see more survey results in the company press release.
Tip us off
News tips and suggestions are always welcome, so please send them along.