"fast data" entries

Real-time, not batch-time, analytics with Hadoop

How big data, fast data, and real-time analytics work together in the real world.

Mines_British_Library_Flickr

Attend the VoltDB webcast on June 24, 2015 with John Hugg to learn more on how to build a fast data front-end to Hadoop.

Today, we often hear the phrase “The 3 Vs” in relation to big data: Volume, Variety and Velocity. With the interest and popularity of big data frameworks such as Hadoop, the focus has mostly centered on volume and data at rest. Common requirements here would be data ingestion, batch processing, and distributed queries. These are well understood. Increasingly, however, there is a need to manage and process data as it arrives, in real time. There may be great value in the immediacy of that data and the ability to act upon it very quickly. This is velocity and data in motion, also known as “fast data.” Fast data has become increasingly important within the past few years due to the growth in endpoints that now stream data in real time.

Big data + fast data is a powerful combination. However, adding real-time analytics to this mix provides the business value. Let’s look at a real example, originally described by Scott Jarr of VoltDB.

Consider a company that builds systems to manage physical assets in precious metal mines. Inside a mine, there are sensors on miners as well as shovels and other assets. For a lost shovel, minutes or hours of reporting latency may be acceptable. However, a sensor on a miner indicating a stopped heart should require immediate attention. The system should, therefore, be able to receive very fast data. Read more…

Fast data fuels real-time streaming applications

A new report describes an imminent shift in real-time applications and the data architecture they require.

Fast_data_coverThe era is here: we’re starting to see computers making decisions that people used to make, through a combination of historical and real-time data. These streams of data come together in applications that answer questions like:

  • What news items or ads is this website visitor likely to be interested in?
  • Is current network traffic part of a Distributed Denial of Service attack?
  • Should our banking site offer a visitor a special deal on a mortgage, based on her credit history?
  • What promotion will entice this gamer to stay on our site longer?
  • Is a particular part of the assembly line overheating and need to be shut down?

Such decisions require the real-time collection of data from the particular user or device, along with others in the environment, and often need to be done on a per-person or per-event basis. For instance, leaderboarding (determining who is top candidate among a group of users, based on some criteria) requires a database that tracks all the relevant users. Such a database nowadays often resides in memory. Read more…