Fast data fuels real-time streaming applications

A new report describes an imminent shift in real-time applications and the data architecture they require.

The era is here: we’re starting to see computers making decisions that people used to make, through a combination of historical and real-time data. These streams of data come together in applications that answer questions like:

  • What news items or ads is this website visitor likely to be interested in?
  • Is current network traffic part of a Distributed Denial of Service attack?
  • Should our banking site offer a visitor a special deal on a mortgage, based on her credit history?
  • What promotion will entice this gamer to stay on our site longer?
  • Is a particular part of the assembly line overheating, and does it need to be shut down?

Such decisions require the real-time collection of data from the particular user or device, along with others in the environment, and often need to be made on a per-person or per-event basis. For instance, leaderboarding (determining who is the top candidate among a group of users, based on some criteria) requires a database that tracks all the relevant users. Such a database nowadays often resides in memory.
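To make the leaderboarding idea concrete, here is a minimal sketch in Python. It is only an illustration of the pattern, not how VoltDB or any particular database implements it: the `Leaderboard` class, the user names, and the point values are all hypothetical, and a plain in-process dictionary stands in for the in-memory database.

```python
import heapq
from collections import defaultdict


class Leaderboard:
    """Toy in-memory leaderboard (hypothetical): one running score per user."""

    def __init__(self):
        # user_id -> cumulative score, held entirely in memory
        self._scores = defaultdict(int)

    def record(self, user_id, points):
        """Apply one incoming event to the user's running total."""
        self._scores[user_id] += points

    def top(self, n):
        """Return the n highest-scoring (user_id, score) pairs, best first."""
        return heapq.nlargest(n, self._scores.items(), key=lambda kv: kv[1])


# Simulated event stream; the names and values are made up for illustration.
board = Leaderboard()
for user_id, points in [("ana", 30), ("ben", 50), ("ana", 40), ("cy", 10)]:
    board.record(user_id, points)

print(board.top(2))
```

Every event touches only one user's total, but answering "who is on top?" needs the scores of all relevant users at once, which is why the whole set has to be tracked together.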

But the decisions often require historical data as well, such as credit histories or websites visited in the past. This information comes from a traditional database doing online analytical processing (OLAP), which does such tasks as prescoring users and segmenting populations.

I recently talked to Ryan Betts, CTO of VoltDB, an in-memory database system. The company was founded with the idea that traditional OLTP databases are inefficient and slow given advances in networking and hardware, and changes in transaction processing workloads. In the current age of applications (an age of mobile devices connecting people everywhere, of high bandwidth, and of large repositories of information allowing personalization), the company found that a lot of customers needed a database that combines high-performance transaction processing with the ability to perform the types of real-time analytics mentioned above.

The combination of OLAP, streaming data collection, analytics, and operations characterizes the complex and powerful applications with which we frequently interact — and that interact with the Internet of Things. Driven by the realization that VoltDB was participating in these applications, co-founder Scott Jarr wrote the report Fast Data and the New Enterprise Data Architecture.

This evolution doesn’t mean that real-time analytics reproduce the advanced predictive analytics that might be done on the OLAP backend. The heavy processing is done beforehand. According to Betts, the pressure to produce real-time results requires application developers to keep the algorithms very simple: aggregates, counters, and the like. The ability to combine the results of the OLAP analytics with the new data being produced on a near-instantaneous basis leads to an emerging generation of smart applications.
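A rough sketch of that division of labor, in Python, might look like the following. Everything here is an assumption made for illustration: the `PRESCORED_SEGMENTS` table stands in for the output of an offline OLAP job, and the `on_page_view` function, the segment names, and the `offer_after` threshold are hypothetical.

```python
from collections import Counter

# Hypothetical result of an offline OLAP job: user_id -> precomputed segment.
PRESCORED_SEGMENTS = {"u1": "high_value", "u2": "casual"}

# Real-time state: a simple per-user counter, updated on every event.
page_views = Counter()


def on_page_view(user_id, offer_after=3):
    """Count the event, then decide whether to show this user an offer.

    The real-time side does nothing heavier than increment a counter and
    look up a segment that was computed ahead of time.
    """
    page_views[user_id] += 1
    segment = PRESCORED_SEGMENTS.get(user_id, "unknown")
    return segment == "high_value" and page_views[user_id] >= offer_after


# Three page views from the same prescored user; only the third meets
# both the historical condition (segment) and the real-time one (count).
decisions = [on_page_view("u1") for _ in range(3)]
print(decisions)
```

The expensive work (deciding who counts as "high_value") happens in advance; the per-event path stays a counter update and a lookup.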

For the case of the overheating device, all the application might need to do is compare the current temperature with some threshold for that device obtained from the in-memory database. The key characteristic of the application is still audacious: to combine historical and real-time data in order to produce an immediate intervention into the surrounding world.
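As a sketch of how little logic the real-time side may need in this case, consider the Python below. The device names, threshold values, and the `should_shut_down` function are hypothetical, and an ordinary dictionary stands in for the in-memory database holding the per-device thresholds.

```python
# Hypothetical per-device limits in degrees Celsius, as they might be loaded
# from historical data into an in-memory store.
THRESHOLDS_C = {"press-7": 85.0, "oven-2": 240.0}


def should_shut_down(device_id, temperature_c, thresholds=THRESHOLDS_C):
    """Return True if the latest reading exceeds the device's stored limit.

    Raises KeyError for a device with no stored threshold, so an unknown
    device is never silently treated as safe.
    """
    return temperature_c > thresholds[device_id]


print(should_shut_down("press-7", 91.2))
print(should_shut_down("oven-2", 200.0))
```

The comparison itself is trivial; the value comes from having the right historical threshold available at the moment the reading arrives.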

Fast Data and the New Enterprise Data Architecture describes the shift that has produced these new applications, the components of the systems that run the applications, the essential steps in data analysis, and the architecture required to develop the applications. Download a free copy of the report here.

This post is part of a collaboration between O’Reilly and VoltDB exploring fast and big data. See our statement of editorial independence.
