How ZeroVM changes analytics in the cloud

What's so interesting about another open source virtualization platform?

ZeroVM was the piece of technology that caught my attention during the recent Bay Area Apache Drill Meetup. What’s so interesting about another open source virtualization platform? To find out, I did more reading and spoke with LiteStack founder Camuel Gilyadov.

ZeroVM has its roots in the OpenDremel project. Camuel and his team needed a lightweight virtualization framework but couldn’t find one that suited their requirements for OpenDremel. They created ZeroVM and along the way addressed issues relevant to cloud applications, including security, multi-tenancy, and instant elasticity (1). I’m not claiming ZeroVM is mature technology, but there are two potential applications that data scientists will like:

Converged storage in the cloud

The time it takes to transfer data between two specialized clusters has led to storage systems with compute capabilities (2). A recent example is storage vendor CleverSafe integrating Hadoop MapReduce into its dispersed storage network. Users of Hadoop MapReduce who have played with cloud computing are familiar with this issue: performing big data analysis in the cloud usually means first transferring data from storage systems (S3) to compute resources (EC2). This means that when low latency matters, bandwidth and data size limit what you can do. In contrast, assuming cloud service providers install it, ZeroVM lets you perform computations on the storage cluster itself!
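To get a feel for why the transfer step matters, here is a rough back-of-the-envelope sketch. The data size and bandwidth figures are hypothetical and chosen only for illustration; they are not measurements of S3, EC2, or ZeroVM, and the estimate ignores latency, protocol overhead, and parallel transfers.

```python
def transfer_seconds(data_bytes: float, bandwidth_bits_per_sec: float) -> float:
    """Estimate the time to move data over a link.

    Ignores latency, protocol overhead, and parallel streams, so real
    transfers will differ from this idealized figure.
    """
    if bandwidth_bits_per_sec <= 0:
        raise ValueError("bandwidth must be positive")
    # Convert bytes to bits, then divide by the link speed.
    return data_bytes * 8 / bandwidth_bits_per_sec


# Hypothetical figures: 1 TB of data over a sustained 1 Gbit/s link.
TERABYTE = 10**12          # bytes
GIGABIT_PER_SEC = 10**9    # bits per second

seconds = transfer_seconds(1 * TERABYTE, 1 * GIGABIT_PER_SEC)
print(f"{seconds / 3600:.1f} hours")  # prints "2.2 hours"
```

Under these assumptions, the data movement alone takes a couple of hours before any analysis starts, which is the cost that computing on the storage cluster would avoid.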

Lightweight VM for cloud applications and analytics

Leaving aside its use of the term “virtualization,” ZeroVM is in some ways more akin to the JVM and CLR than to virtualization software like VMware and Xen. It is a lightweight VM that takes programs written in C/C++ and popular scripting languages, and runs them (in parallel) in the cloud. Since ZeroVM is only a few hundred kilobytes, it can be provisioned in “sub-seconds” (3). Thus, for many apps, ZeroVM is a lightweight alternative to a full-blown operating system. In particular, one can provision multiple ZeroVMs, each running a specialized analytic task.


(1) ZeroVM can be provisioned much faster than current VMs.

(2) If you search for “Converged Storage” you get over 300,000 hits.

(3) Fine-grained metering (as opposed to hourly) becomes possible.
