"MapReduce" entries

Strata Week: Grabbing a slice

Digits of pi, extruding images with iPads, and mapping the past on top of the present

In this edition of Strata Week: The 2,000,000,000,000,000th digit of pi is calculated with an assist from Hadoop and MapReduce; a new technique uses iPads to extrude light paintings across a long exposure shot; Historypin links historical photos to Google Street View shots; and this is the last week for Strata Conference proposal submissions.

The SMAQ stack for big data

Storage, MapReduce and Query are ushering in data-driven products and services.

We're at the beginning of a revolution in data-driven products and services, driven by a software stack that enables big data processing on commodity hardware. Learn about the SMAQ stack and where today's big data tools fit in.
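The "MapReduce" layer of the stack is a programming model in which a job is split into a map phase, a shuffle that groups intermediate key/value pairs by key, and a reduce phase. What follows is a minimal single-process sketch of that model in Python, using word counting as an assumed example. It does not use Hadoop or any distributed runtime, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> dict[str, int]:
    # Batch semantics: all map output is materialized before any reduce runs.
    intermediate = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    docs = ["big data on commodity hardware", "big data products"]
    print(run_job(docs))
```

In a real cluster the map and reduce functions run on many machines and the shuffle moves data over the network, but the contract between the three phases is the same.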

Strata Week: The challenge of real-time analytics

Blue is the color, getting help with email overload.

In the latest edition of Strata Week: Google's introduction of a new search-indexing system highlights an important limitation of MapReduce and Hadoop. Can MapReduce adapt to real-time needs or will others follow Google in creating new architectures for real-time analytics?

Pipelining and Real-time Analytics with MapReduce Online

Some organizations create their own real-time analysis tools, while others turn to specialized solutions. In a previous post, I highlighted SQL-based real-time analytic tools that can handle large amounts of data. I noted that other big data management systems, such as MPP databases and MapReduce/Hadoop, were too batch-oriented to deliver analysis in near real-time. At least for MapReduce/Hadoop systems, things may have changed slightly: a group of researchers from UC Berkeley and Yahoo recently modified MapReduce to allow for pipelining between operators.
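In batch MapReduce, reducers wait for every map task to finish before producing output. Pipelining instead pushes map output to the reduce side as it is produced, which also makes it possible to publish early, partial answers while the job is still running. The sketch below imitates that idea in a single Python process using generators. It is an illustration under that assumption, not the Berkeley/Yahoo implementation; the function names and the `snapshot_every` parameter are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator


def mapper(records: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) pairs lazily, one record at a time."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def pipelined_reducer(
    pairs: Iterable[tuple[str, int]], snapshot_every: int
) -> Iterator[tuple[int, dict[str, int]]]:
    """Consume map output as it arrives and periodically yield a snapshot.

    Each snapshot is (pairs processed so far, partial counts). The last
    item yielded always reflects the complete input.
    """
    counts: Counter = Counter()
    processed = 0
    for word, n in pairs:
        counts[word] += n
        processed += 1
        if processed % snapshot_every == 0:
            yield processed, dict(counts)  # early, partial result
    # Emit a final result unless the last snapshot already covered everything.
    if processed == 0 or processed % snapshot_every != 0:
        yield processed, dict(counts)


if __name__ == "__main__":
    stream = ["a b a", "b c", "a"]
    for done, partial in pipelined_reducer(mapper(stream), snapshot_every=2):
        print(done, partial)
```

Because `mapper` is a generator, the reducer starts working on the first pair instead of waiting for the whole map phase, which is the property that makes near real-time results possible.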

HadoopDB: An Open Source Parallel Database

The growing need to manage and make sense of Big Data has led to a surge in demand for analytic databases, a demand many companies are attempting to fill. As an alternative to current shared-nothing analytic databases, HadoopDB is a hybrid that combines parallel databases with scalable and fault-tolerant Hadoop/MapReduce systems.

Big Data: Technologies and Techniques for Large-Scale Data

Our belief that proficiency in managing and analyzing large amounts of data distinguishes market-leading companies led to a recent report designed to help users understand the different large-scale data management techniques. Our report on Big Data Technologies was the result of interviews with more than thirty experts, including research scientists, (open-source) hackers, vendors, data analysts, and entrepreneurs. I recently sat down with my co-author, Roger Magoulas (Director of Research at O’Reilly), who agreed to talk about our report and Big Data in general.