Strata Week: Data prospecting with Kaggle

Kaggle now accepts data before a contest, HP's Autonomy purchase comes into focus, and Cloudera releases a new Hadoop distribution.

Here are a few of the data stories that caught my attention this week:

Prospecting for data

The data science competition site Kaggle is extending its features with a new service called Prospect. Prospect allows companies to submit a data sample to the site without having a predetermined plan for a contest. In turn, the data scientists using Kaggle can suggest ways in which machine learning could uncover new insights and answer less-obvious questions, as well as what sorts of data competitions could be based on the data.

As GigaOm’s Derrick Harris describes it: “It’s part of a natural evolution of Kaggle from a plucky startup to an IT company with legs, but it’s actually more like a prequel to Kaggle’s flagship predictive modeling competitions than it is a sequel.” It’s certainly a good way for companies to get their feet wet with predictive modeling.

Practice Fusion, a web-based electronic health records system for physicians, has launched the inaugural Kaggle Prospect challenge.

HP’s big data plans

Last year, Hewlett-Packard began moving away from the personal computing business and toward enterprise software and information management, a shift marked in part by the $10 billion it paid to acquire Autonomy. Now we know a bit more about HP’s big data plans for its Information Optimization Portfolio, which is built around Autonomy’s Intelligent Data Operating Layer (IDOL).

ReadWriteWeb’s Scott M. Fulton takes a closer look at HP’s big data plans.

The latest from Cloudera

Cloudera released a number of new products this week: Cloudera Manager 3.7.6; Hue 2.0.1; and of course CDH 4.0, its Hadoop distribution.

CDH 4.0 includes:

“… high availability for the filesystem, ability to support multiple namespaces, HBase table and column level security, improved performance, HBase replication and greatly improved usability and browser support for the Hue web interface. Cloudera Manager 4 includes multi-cluster and multi-version support, automation for high availability and MapReduce2, multi-namespace support, cluster-wide heatmaps, host monitoring and automated client configurations.”

Social data platform DataSift also announced this week that it was powering its Hadoop clusters with CDH to perform the “Big Data heavy lifting to help deliver DataSift’s Historics, a cloud-computing platform that enables entrepreneurs and enterprises to extract business insights from historical public Tweets.”

Have data news to share?

Feel free to email us.

