"in-memory data" entries
In-memory data management brings data close to the computation.
We wanted to give you a brief update on what we’ve learned so far from our series of interviews with players and practitioners in the in-memory data management space. A few preliminary themes have emerged, some expected, others surprising.
Performance improves as you put data as close to the computation as possible. We talked to people in systems, data management, web applications, and scientific computing who have embraced this concept. Some solutions go to the lowest level of hardware (L1, L2 cache). The next generation of SSDs will have latency closer to that of main memory, potentially blurring the distinction between storage and memory. For performance and power consumption considerations, we can imagine a future where systems are sized primarily by the amount of non-volatile memory deployed.
Putting data in-memory does not negate the importance of distributed computing environments. Data size and the ability to leverage parallel environments are frequently cited reasons. The same characteristics that make distributed environments compelling also apply to in-memory systems: fault-tolerance and parallelism for performance. An additional consideration is the ability to gracefully spill over to disk when main memory is full.
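To make the spill-over idea concrete, here is a minimal Python sketch of a key-value store that keeps a bounded number of entries in RAM and writes the overflow to disk. The `SpillingStore` class, its `max_items` limit, and the pickle-per-key file layout are all assumptions made for illustration; none of the systems we looked at are implemented this way, and real ones typically track bytes rather than item counts and evict by access pattern.

```python
import os
import pickle
import tempfile


class SpillingStore:
    """Hypothetical store: at most `max_items` entries live in memory,
    everything beyond that is pickled to a temporary directory."""

    def __init__(self, max_items):
        self.max_items = max_items
        self.memory = {}    # entries held in RAM
        self.on_disk = {}   # key -> path of the spilled value
        self.spill_dir = tempfile.mkdtemp(prefix="spill_")
        self._next_id = 0   # used to build unique file names

    def put(self, key, value):
        # Remove any stale spilled copy of this key first.
        old_path = self.on_disk.pop(key, None)
        if old_path is not None:
            os.remove(old_path)

        if key in self.memory or len(self.memory) < self.max_items:
            # Memory still has room (or the key is already resident).
            self.memory[key] = value
        else:
            # Memory is full: spill the new value to disk instead of failing.
            path = os.path.join(self.spill_dir, f"{self._next_id}.pkl")
            self._next_id += 1
            with open(path, "wb") as f:
                pickle.dump(value, f)
            self.on_disk[key] = path

    def get(self, key):
        if key in self.memory:
            return self.memory[key]      # fast path: served from RAM
        with open(self.on_disk[key], "rb") as f:
            return pickle.load(f)        # slow path: read back from disk


store = SpillingStore(max_items=2)
for name, value in [("a", 1), ("b", 2), ("c", 3)]:
    store.put(name, value)
```

In this sketch the third `put` lands on disk because memory already holds two entries, yet `store.get("c")` still returns the value. The caller sees one store; only the latency differs.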
We're launching an investigation into in-memory data technologies.
In a forthcoming report we will highlight technologies and solutions that take advantage of the decline in prices of RAM, the popularity of distributed and cloud computing systems, and the need for faster queries on large, distributed data stores. Established technology companies have had interesting offerings, but what initially caught our attention were open source projects that started gaining traction last year.
An example we frequently hear about is the demand for tools that support interactive query performance. Faster query response times translate to more engaged and productive analysts, and real-time reports. Over the past two years several in-memory solutions emerged to deliver 5X-100X faster response times. A recent paper from Microsoft Research noted that even in this era of big data and Hadoop, many MapReduce jobs fit in the memory of a single server. To scale to extremely large datasets several new systems use a combination of distributed computing (in-memory grids), compression, and (columnar) storage technologies.
Another interesting aspect of in-memory technologies is that they seem to be everywhere these days. We’re looking at tools aimed at analysts (Tableau, Qlikview, Tibco Spotfire, Platfora), databases that target specific workloads or data types (VoltDB, SAP HANA, Hekaton, Redis, Druid, Kognitio, and Yarcdata), frameworks for analytics (Spark/Shark, GraphLab, GridGain, Asterix/Hyracks), and the data center (RAMCloud, memory locality).
We’ll be talking to companies and hackers to get a sense of how in-memory solutions fit into their planning. Along these lines, we would love to hear what you think about the rise of these technologies, as well as applications, companies and projects we should look at. Feel free to reach out to us on Twitter (Ben is @bigdata and Roger is @rogerm) or leave a comment on this post.