Integration of the data supply chain is key to a reliable and fast big data analytics deployment.
Watch our free webcast “Accelerating Advanced Analytics with Spark” to learn about the architecture, applications, and best practices of Apache Spark.
Apache Hadoop is a mature development framework, which coupled with its large ecosystem, and support and contributions from key players such as Cloudera, Hortonworks, and Yahoo, provides organizations with many tools to manage data of varying sizes.
In the past, Hadoop’s batch-oriented nature using MapReduce was sufficient to meet the processing needs of many organizations. However, increasing demands for faster processing of data have emerged. These demands have been driven by recent developments in streaming technologies, the Internet of Things (IoT) and real-time analytics, to name just a few. These new demands have required new processing models. One significant new technology today that is being used to meet these demands and is gaining considerable interest and widespread support is Apache Spark. Spark’s speed and versatility make it a key part of today’s big-data processing stack in industries from energy to finance. Read more…
How big data, fast data, and real-time analytics work together in the real world.
Today, we often hear the phrase “The 3 Vs” in relation to big data: Volume, Variety and Velocity. With the interest and popularity of big data frameworks such as Hadoop, the focus has mostly centered on volume and data at rest. Common requirements here would be data ingestion, batch processing, and distributed queries. These are well understood. Increasingly, however, there is a need to manage and process data as it arrives, in real time. There may be great value in the immediacy of that data and the ability to act upon it very quickly. This is velocity and data in motion, also known as “fast data.” Fast data has become increasingly important within the past few years due to the growth in endpoints that now stream data in real time.
Big data + fast data is a powerful combination. However, adding real-time analytics to this mix provides the business value. Let’s look at a real example, originally described by Scott Jarr of VoltDB.
Consider a company that builds systems to manage physical assets in precious metal mines. Inside a mine, there are sensors on miners as well as shovels and other assets. For a lost shovel, minutes or hours of reporting latency may be acceptable. However, a sensor on a miner indicating a stopped heart should require immediate attention. The system should, therefore, be able to receive very fast data. Read more…