- Big Data: the Big Picture (Vimeo) — Jim Stogdill’s excellent talk: although Big Data is presented as part of the Gartner Hype Cycle, it’s an epoch of the Information Age which will have significant effects on the structure of corporations and the economy.
- Impala (github) — Cloudera’s open source (Apache) implementation of Google’s F1 (PDF), for realtime queries across clusters. Impala is different from Hive and Pig because it uses its own daemons that are spread across the cluster for queries. Furthermore, Impala does not leverage MapReduce, allowing Impala to return results in real time. (via Wired)
- druid (github) — an open source (GPLv2), distributed, column-oriented analytical datastore. It was originally created to resolve query latency issues seen when trying to use Hadoop to power an interactive service. See also the announcement of its open-sourcing.
- Supersonic (Google Code) — an ultra-fast, column-oriented query engine library written in C++. It provides a set of data transformation primitives which make heavy use of cache-aware algorithms, SIMD instructions and vectorised execution, allowing it to exploit the capabilities and resources of modern, hyper-pipelined CPUs. It is designed to work in a single process. Apache-licensed.
ENTRIES TAGGED "cloudera"
Big Data's Big Picture, Real-Time Queries, Real-Time Queries, Single-Process Real-Time Queries
Kaggle now accepting data before a contest, HP's Autonomy purchase comes into focus, Cloudera's new Hadoop distribution.
In this week's data news, Kaggle launches Prospect, HP unveils its big data plans, and Cloudera releases CDH4 (the latest version of its Hadoop distribution).
A proposal for a .data TLD, flavors of Hadoop, and a vote for pseudonymous commenters.
In this week's data news, Stephen Wolfram calls for a .data top-level domain and Cloudera responds to Hadoop version 1.0.
MapReduce gets easier, a new search engine for data, and now you can monitor the universe's forces on your phone.
Cloudera's Crunch hopes to make MapReduce easier, Datafiniti launches a search engine for data, and the University of Oxford releases an Android app for monitoring CERN data.
Oracle unveils its big data appliance, the Hadoop community gauges contributions.
In this week's data news, Oracle unveils its big data strategy, and Cloudera looks at the contributions to the Hadoop core and community.
Get control over cloud resources
The cloud makes clusters easy, but for rapid prototyping purposes, bringing up clusters still involves quite a bit of effort. The Whirr project makes cloud control simple.
Cloud Data Ingest, Microbiome, Patent Blindness, Russian Open Data
- Flume — Cloudera open source project to solve the problem of how to get data into cloud apps, from collection to processing to storage. Flume is a distributed service that makes it very easy to collect and aggregate your data into a persistent store such as HDFS. Flume can read data from almost any source – log files, Syslog packets, the standard output of any Unix process – and can deliver it to a batch processing system like Hadoop or a real-time data store like HBase. All this can be configured dynamically from a single, central location – no more tedious configuration file editing and process restarting. Flume will collect the data from wherever existing applications are storing it, and whisk it away for further analysis and processing. (via mikeolson on Twitter)
- How Microbes Defend and Define Us (NYTimes) — there’s been a lot of talk about the microbiome at Sci Foo in the last few years, now it’s bubbling out into the world. Turns out that “bacteria bad, megafauna good” is as simplistic and inaccurate as “Muslim bad, Christian good”. Fancy that. (via Jim Stogdill)
- Startup Model Patently Flawed (Nature) — “There is a lot of stuff that academics are realizing isn’t patentable but they can commercialize for themselves by starting a company,” says Scott Shane, an economist at Case Western Reserve University in Cleveland, Ohio, and a co-author of the study. Because surveys of entrepreneurial activity — including government assessments — typically focus on patent activity, they may be significantly underestimating academics’ efforts, he notes. (via pkedrosky on Twitter)
- Open Data on Russian Government Spending (OKFN) — a group outside government is adding analytics to the data that departments are required to release.