
On the performance of clouds

A study ran cloud providers through four tests. Here are some of the results.

Velocity 2010

Public clouds are based on the economics of sharing. Cloud providers can charge less, and sell computing on an hourly basis without long-term contracts, because they’re spreading costs and skills across many customers.

But a shared model means that your application is competing with other users’ applications for scarce resources. The pact you’re making with a public cloud, for better or worse, is that the advantages of elasticity and pay-as-you-go economics outweigh any problems you’ll face.

Enterprises are skeptical because clouds force them to relinquish control over the underlying networks and architectures on which their applications run. Is performance acceptable? Will clouds be reliable? What’s the tradeoff, particularly now that we know speed matters so much?

We (Bitcurrent) decided to find out. With the help of Webmetrics, we built four test applications: a small object, a large object, a million calculations, and a 500,000-row table scan. We ported the applications to five different clouds, and monitored them for a month. We discovered that performance varies widely by test type and cloud:

[Figure: cloud performance results]

Here are some of the lessons learned:

  • All of the services handled the small image well.
  • PaaS clouds were more efficient at delivering the large object, possibly because of their ability to distribute workload out to caching tiers better than an individual virtual machine can do.
  • Force.com didn’t handle CPU workloads well, even with a tenth of the load of other agents. Amazon was slow for CPU, but we were using the least-powerful of Amazon’s EC2 machines.
  • Google’s ability to handle I/O, even under heavy load, was unmatched. Rackspace also dispatched the I/O tests quickly. Then again, it took us 37 hours to insert the data into Google’s Bigtable.
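
To put the 37-hour load in perspective, here is a rough back-of-the-envelope calculation. It assumes the data loaded was the 500,000-row table used in the scan test, which the post implies but does not state outright. The function name is purely illustrative.

```python
def rows_per_second(rows: int, hours: float) -> float:
    """Average insert rate for a load of `rows` rows taking `hours` hours."""
    return rows / (hours * 3600)


# Assumption: the 37-hour load covered the 500,000-row test table.
rate = rows_per_second(500_000, 37)
print(f"{rate:.2f} rows/second")
```

Under that assumption, the load averaged fewer than four rows per second.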

In the end, it’s clear that there’s no single “best” cloud: PaaS (App Engine, Force.com) scales easily, but locks you in; IaaS (Rackspace, Amazon, Terremark) offers portability, but leaves you doing all the scaling work yourself.

The full 50-page report is available free from Webmetrics.


Web performance and cloud architecture will be key topics at this week’s Velocity conference.

  • Alice Baile

    Listen in this brief video to how the US Air Force is seeking to control cyberspace.

    http://www.youtube.com/watch?v=xeWnZRZrpaY

    Now apply the concepts discussed in this video to society on the whole.

    Migrating one’s data to the “cloud” is being heavily promoted in order to facilitate this increased control, as manifested via control over the OODA Loop, where the cloud drives the first “O” in the acronym, otherwise known as “Observation”.

    Researching the development of John Boyd’s work in OODA Loop theory will reveal the use of “Cloud” terminology and how this concept fits within the larger theoretical military framework being applied across all of society.

  • k

    Can’t wait for AppEngine to support Go!

    Python is OK, but for some things it really would help to have a more ‘direct’ language, and the Go type system is beautiful.

  • John

    Why didn’t you test the Microsoft Windows Azure Platform?
    More interesting info here:
    http://www.thinclient.org/archives/2009/08/cloud_computing_1.html

  • Jose Simoes

    Google is first in all tests except one, where it is second.

    What did you mean by “it’s clear that there’s no single best cloud”?

    Jose Simoes

  • Jonas – hosting4developers.com

    @Jose: While Google may be first in the tests, one of the problems with App Engine is that it locks you down (you need to write code specifically for App Engine). Whether or not that is a big issue depends on your project. I use App Engine myself (as well as The Rackspace Cloud), so I’m not against it at all, but it’s important to be aware that there’s a lockdown.

  • Ray

    Looking at your Amazon result I’m assuming you didn’t do any tuning.

    For example it’s possible with EC2 to set up your disks in a raid0 configuration and I’ve used this to get much better IO performance.

  • Satheesan

    The Google App Engine platform is a true enabler for aspiring entrepreneurs, innovators and would-be SaaS providers. If we follow a few basic architectural constraints while developing applications, the applications never fail to perform. I have been developing on this PaaS ever since it was released in April 2008. The platform has evolved tremendously in the last two years. The product road map announced for Google App Engine for Business shows much bigger promise. I am eagerly waiting for it, and hope to see it by May 2011 at the latest.

  • Claude

    @Satheesan I second @Jonas here, the argument about lock down is not really fair, you can either use Java/JPA, which is downright portable or use Python over Django-nonrel.
    AppEngine rules.

  • Claude

    Oops, I mixed up the commenters :/
    I meant: @Jonas I second @Jose here…

  • Alistair Croll

    Thanks for all the feedback. A few remarks:

    1. @John: As we point out in the study, this was preliminary research. We’ve omitted several prominent cloud platforms (Microsoft, Joyent, and Gogrid come to mind) for several reasons, and may look at measuring them in the future. Several folks, in particular Cloudharmony, have done a great job of characterizing IaaS cloud performance of several classes of virtual machine within the instance itself.

    2. @Ray: We didn’t do much optimization, and we chose smaller instances of virtual machines. Most IaaS providers have bigger compute instances, but we felt that wouldn’t demonstrate the warts as well (plus, it’s cheaper. ;-)) There’s a lot of tuning that can be done — which is one of the important points: IaaS needs tuning. With PaaS, all you can do is write your code better.

    3. There are ways to multithread data insertion in Google’s Bigtable model, as well as the recently-introduced bulkuploader for faster data insertion. Again, you need to jump through some hoops to use PaaS platforms, but you’re rewarded by some scalability (and limited in your ability to leave by your use of APIs for stuff like image manipulation, as @Jonas points out.)

    As for the “there’s no single best cloud” — what’s not evident from one graph or a performance report is the amount of effort required to port things into a cloud. Enterprise IT is familiar with the notion of a virtual machine, and for most companies, the VM is the “unit of measure” for IT. So clouds that work in those units are more easily understood. Just look at the US resistance to the metric system if you want to see the power of common metrics at work. ;-)

    We also weren’t “optimal.” Had we used Amazon’s SimpleDB (the closest thing they have to Bigtable) or a Cassandra/CouchDB/MongoDB/etc. on Rackspace, we’d have had very different I/O performance results because of the way data is indexed for faster retrieval. In other words, I’d build things differently once I knew what the app was, and get different results.

    I’m a big believer in the representation of performance as histograms, by tier of infrastructure bottleneck, though. Much of the performance analysis we’ve seen out there hides too many truths in the averages.
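
A minimal sketch of how an average can hide what a histogram shows. The latency figures below are made up for illustration and do not come from the study; the `histogram` helper and the two sample sets are hypothetical.

```python
from collections import Counter
from statistics import mean


def histogram(samples_ms, bucket_ms=100):
    """Count samples per fixed-width latency bucket, keyed by bucket start."""
    counts = Counter((int(s) // bucket_ms) * bucket_ms for s in samples_ms)
    return dict(sorted(counts.items()))


# Made-up response times (ms) for two hypothetical services.
steady = [500] * 10                # every request takes about the same time
bimodal = [100] * 8 + [2100] * 2   # mostly fast, with a slow tail

for name, samples in (("steady", steady), ("bimodal", bimodal)):
    print(name, "mean:", mean(samples), "histogram:", histogram(samples))
```

Both sample sets have the same mean, but only the histogram reveals that one of them sends a fifth of its requests into a slow tail.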

    In conducting this research, however, it’s clear that popular clouds may face contention (as we see in some of Amazon’s availability zones) and that idle ones will rock — but that may be because they’re idle. For what it’s worth, the results reinforced my belief that PaaS is the way of the future.

  • Edward M Goldberg

    For many projects that I launch, the DNS or the location of the server relative to the user has more impact on the end-user page load time. It is very common for the distance from the client to the server to add +200 ms. to the load time of an object that takes the server 20 ms. to service.
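
As a quick worked example of those two figures (20 ms of server time plus 200 ms added by distance), the sketch below computes how much of the total load time is spent on the network. The function name is illustrative only.

```python
def network_share(server_ms: float, network_ms: float) -> float:
    """Fraction of total load time spent on the network, not the server."""
    return network_ms / (server_ms + network_ms)


# Figures from the comment: 20 ms to service, +200 ms from distance.
share = network_share(server_ms=20, network_ms=200)
print(f"total: {20 + 200} ms, network share: {share:.0%}")
```

With these numbers, roughly nine tenths of the load time has nothing to do with how fast the server is.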

    As you evaluate a complex system you need to inspect the whole eco-system.

    When you take one element out of the whole system and focus on that part, you need to be very clear on all of the parameters of the test.

    1) RAID-0 – The fastest disk solution, was that used?

    2) Server selection – AWS for example provides 100+ possible selections of: location/server-size/OS to pick from. What did you use?

    3) Code used for the test – Lighttpd is faster than Apache (2) for static content, for example.

    Please provide more detail. This is a great topic.

  • Alistair Croll

    @Edward:

    - Some of the data is in the full version of the study. For Amazon we used the smallest server (on purpose, because we wanted to get to breaking point with the smallest amount of load.) We had similar instances at Rackspace and Terremark; the question is irrelevant for PaaS platforms.
    - We didn’t do a lot of server optimization or storage optimization. Because PaaS platforms use a key-value store that optimizes on insertion (in Google’s case) I/O was better; but we could have used a key-value store like SimpleDB on Amazon and had different results.
    - The sample code is available in the full version of the report. We used Apache as a service everywhere on IaaS; we don’t get to control that in PaaS environments.
    - The full report also has some graphs of WAN latency for small objects (which test round trip time) and large ones (which test throughput and packet loss). We didn’t check DNS lookup time in much detail. WAN latency and DNS responsiveness are among the things that people use Webmetrics for.

    There have been lots of comparisons of Amazon instances (Cloudharmony has some great ones) and several synthetic testing companies have compared Amazon availability zones by geography, so we didn’t want to repeat what is already very valuable and comprehensive research. But we’ve had a lot of interest from people in extending the scope of the study, particularly when it comes to side-by-side histogram comparisons, which make it easy to understand what works best in various situations.

    A.

  • Edward M. Goldberg

    Alistair,

    What I have seen here at myCloudWatcher is that the M1.SMALL is about 1/5 the performance of the MEDIUM sized servers. The MEDIUM is 2x the cost. So for the best cost performance I use the MEDIUM for my server arrays.
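
The arithmetic behind that choice can be sketched as follows, using only the two ratios given above (about 5x the performance at 2x the cost). The units are relative to the small instance and the function name is illustrative.

```python
def perf_per_cost(relative_perf: float, relative_cost: float) -> float:
    """Performance delivered per unit of cost, in relative units."""
    return relative_perf / relative_cost


small = perf_per_cost(1.0, 1.0)    # baseline: the small instance
medium = perf_per_cost(5.0, 2.0)   # about 5x the performance at 2x the cost
print(f"medium delivers {medium / small:.1f}x the performance per dollar")
```

On those figures, the medium instance delivers about 2.5 times as much performance for each dollar spent.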

    At myCloudWatcher we manage lots of servers for many projects. I hear all the time from clients who use these SMALL servers and feel that the performance is lacking.

    For some applications ANY server will provide the needed SaaS. For these low end uses, I launch a SMALL.

    For ANY performance-based requirement SMALL is not on the list. I start with MEDIUM and work up as needed.

    AWS provides a low-end solution to “round out” the mix of selections. The SMALL is a low ender, not a performer. But for development servers you cannot beat the price point!

    Please re-run this test on a MEDIUM and let’s see the numbers.

    I must say that I like the RackSpace Cloud Servers at the low end better than AWS. If you are looking for $11.00/month, AWS does not have a selection for your needs. AWS starts at 2X more cost for a SPOT SMALL (with lots of issues and limits). RackSpace for $11 does backups!!!

    At the high end AWS offers more. So I mix both for the myCloudWatcher clients.

    Edward M. Goldberg
    http://myCloudWatcher.com/
    e.m.g.

  • Rich

    Today CloudVDI entered into an agreement with eG Innovations to use and resell eG performance management solutions to drive performance for CloudVDI’s clients’ virtualized environments and to expedite Citrix deployments. http://www.prweb.com/releases/2012/8/prweb9608983