ZeroVM was the piece of technology that caught my attention during the recent Bay Area Apache Drill Meetup. What’s so interesting about another open source virtualization platform? To find out, I did more reading and spoke with LiteStack founder Camuel Gilyadov.
ZeroVM has its roots in the OpenDremel project. Camuel and his team needed a lightweight virtualization framework but couldn’t find one that suited their requirements for OpenDremel. They created ZeroVM and along the way addressed issues relevant to cloud applications, including security, multi-tenancy, and instant (1) elasticity. I’m not claiming ZeroVM is mature technology, but there are two potential applications that data scientists will like:
Converged storage in the cloud
The amount of time it takes to transfer data between two specialized clusters has led to storage systems with compute capabilities (2). A recent example is storage vendor Cleversafe integrating Hadoop MapReduce into its dispersed storage network. Users of Hadoop MapReduce who have played with cloud computing are familiar with this issue: performing big data analysis in the cloud usually means first transferring data from storage systems (S3) to compute resources (EC2). If low latency matters, bandwidth and data size limit what you can do. In contrast (assuming cloud service providers install it), ZeroVM lets you perform computations on the storage cluster!
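To get a feel for why the transfer step matters, here is a back-of-the-envelope sketch. The data size and link speed are hypothetical numbers chosen for illustration, not measurements of S3, EC2, or ZeroVM, and the calculation ignores latency and protocol overhead.

```python
def transfer_seconds(data_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Time to move data_bytes over a link, ignoring latency and overhead."""
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    # Convert bytes to bits, then divide by the link speed in bits per second.
    return data_bytes * 8 / bandwidth_bits_per_s


TB = 10**12    # one terabyte, in bytes (decimal units)
GBPS = 10**9   # one gigabit per second, in bits per second

# Hypothetical scenario: 1 TB moved from storage to compute over a 1 Gbit/s link.
seconds = transfer_seconds(1 * TB, 1 * GBPS)
print(f"{seconds / 3600:.1f} hours")  # prints "2.2 hours"
```

Under these assumptions, the data movement alone takes over two hours before any analysis starts, which is the cost that computing directly on the storage cluster would avoid.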
Lightweight VM for cloud applications and analytics
Leaving aside its use of the term “virtualization,” ZeroVM is in some ways more akin to the JVM and CLR than to virtualization software like VMware and Xen. It is a lightweight VM that takes programs written in C/C++ and popular scripting languages, and runs them (in parallel) in the cloud. Since ZeroVM is only a few hundred kilobytes, it can be provisioned quickly, in “sub-seconds” (3). Thus for many apps ZeroVM is a lightweight alternative to a full-blown operating system. In particular, one can provision multiple ZeroVMs running specialized analytic tasks.
(1) ZeroVM can be provisioned much faster than current VMs.
(2) If you search for “Converged Storage” you get over 300,000 hits.
(3) Fine-grained metering (as opposed to hourly) becomes possible.