Amazon's new EC2 SLA

Amazon announced a new SLA for EC2, similar to the one for S3. This is a notable step for Amazon and cloud computing as a whole, as it establishes a new bar for utility computing services.

Amazon is committing to 99.95% availability for the EC2 service on a yearly basis, which corresponds to approximately four hours and twenty-three minutes of downtime per year. It’s important to remember that an SLA is just a contract: it commits the provider to a certain level of performance and to some form of compensation when the provider fails to meet it.
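For readers who want to check the arithmetic, here is a minimal sketch. The function name `downtime_budget_minutes` is made up for illustration, and it assumes a 365-day Service Year as in the SLA text quoted below.

```python
def downtime_budget_minutes(availability_pct, days=365):
    """Minutes of downtime allowed over `days` at the given availability."""
    total_minutes = days * 24 * 60  # 525,600 minutes in a 365-day year
    return total_minutes * (100.0 - availability_pct) / 100.0


# 99.95% leaves 0.05% of the year as allowable downtime.
minutes = round(downtime_budget_minutes(99.95))
hours, mins = divmod(minutes, 60)
print(f"{hours}h {mins}m")  # 4h 23m
```

The unrounded figure is about 262.8 minutes, which is where "four hours and twenty-three minutes" comes from.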

Here’s the summary of the EC2 SLA:

Service Commitment
AWS will use commercially reasonable efforts to make Amazon EC2 available with an Annual Uptime Percentage (defined below) of at least 99.95% during the Service Year. In the event Amazon EC2 does not meet the Annual Uptime Percentage commitment, you will be eligible to receive a Service Credit as described below. […]

  • “Annual Uptime Percentage” is calculated by subtracting from 100% the percentage of 5 minute periods during the Service Year in which Amazon EC2 was in the state of “Region Unavailable.” If you have been using Amazon EC2 for less than 365 days, your Service Year is still the preceding 365 days but any days prior to your use of the service will be deemed to have had 100% Region Availability […]
  • “Unavailable” means that all of your running instances have no external connectivity during a five minute period and you are unable to launch replacement instances. […]

To receive a Service Credit, you must submit a request by sending an e-mail message to aws-sla-request @ […]. To be eligible, the credit request must […] include your server request logs that document the errors and corroborate your claimed outage (any confidential or sensitive information in these logs should be removed or replaced with asterisks).

This new SLA does not appear to address the reliability of server instances individually or in aggregate. For example, if half of a customer’s EC2 instances lose their connections or die every 6 minutes, EC2 would still be considered “available” even if it is essentially unusable.
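To make this concrete, here is a rough sketch of how the quoted definitions could be applied. The `Period` record and both function names are my own invention for illustration, not anything Amazon publishes, and the model is simplified: it scores one customer's five-minute periods and treats a period as "Unavailable" only when every running instance lacks connectivity and no replacement can be launched.

```python
from dataclasses import dataclass

PERIODS_PER_YEAR = 365 * 24 * 60 // 5  # 105,120 five-minute periods


@dataclass
class Period:
    """Hypothetical record of one five-minute period for one customer."""
    running: int      # instances the customer had running
    connected: int    # of those, how many had external connectivity
    can_launch: bool  # could replacement instances be launched?


def is_unavailable(p):
    # Both conditions from the quoted definition must hold at once.
    return p.running > 0 and p.connected == 0 and not p.can_launch


def annual_uptime_percentage(periods):
    unavailable = sum(1 for p in periods if is_unavailable(p))
    return 100.0 - 100.0 * unavailable / len(periods)


# Simplified version of the scenario above: half of the instances are
# unreachable in every single period, all year long.
degraded_year = [Period(running=10, connected=5, can_launch=True)] * PERIODS_PER_YEAR
print(annual_uptime_percentage(degraded_year))  # 100.0
```

Under these assumptions, a year in which half the fleet is always broken still scores a perfect 100%. It would take 53 fully unavailable periods to drop below the 99.95% commitment.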

If the entire EC2 service is down for more than a cumulative four hours and twenty-three minutes in a year, customers must furnish proof of the outage to Amazon to be eligible for the 10% credit. This seems like an onerous process for very little compensation, and isn’t in line with Amazon’s famous “Relentless Customer Obsession”. Amazon takes monitoring very seriously and should take the lead by tracking, reporting, and proactively compensating customers when it lets them down.

  • As you have shown by example here, an SLA protects the service provider, not the customer. But what should you expect, considering the low cost of the services? An insurance policy that would fully compensate you for all the nasty stuff that can go wrong in a data center should, by itself, cost a lot more than the price of EC2 services.

    One year ago I wrote a similar critique of Amazon’s then-new SLA for its S3 service.

    I feel a twinge of regret for singling out Amazon because they are one of the responsible, aboveboard vendors. But let’s face it: salesmanship is the name of the Internet SLA game. After all, “A sucker is born every minute.”

  • On Twitter, @Werner (Werner Vogels, Amazon CTO) responded to my tweet of this post with two comments:

    1. not sure the analogy works. We do not track packages to your doorstep and send new automatically, you will still have to tell us

    2. we did proactively credit customers after the S3 outage

    Both of these make sense to me. So perhaps you’re being a bit harsh here, Jesse.

  • @Tim and @Werner,

    I didn’t mean for this to come off as harsh. I know from personal experience that Amazon, Werner, and the AWS team are committed to building reliable services that people can depend on.

    The problem with Werner’s “lost in shipment” analogy is that it refers to problems that happen after the package has left the warehouse and is out of Amazon’s control. Contrast this with EC2 outages covered by the SLA which are entirely within Amazon’s control.

    Amazon did the right thing when it proactively credited customers after the big S3 outage, but this was a special case. My point is that doing the right thing for customers should always be the default. Amazon has a unique opportunity to raise the bar for our emerging industry… they should seize it!

  • AWS appears to be doing ‘release early, release often’. They have consistently improved their service and price. I expect your unnecessarily harsh criticisms will be heard and addressed with future releases.

    -aws fanboy

  • Dennis Linnell

    I think providing automatic refunds to customers will not “raise the bar” for the industry. Such compensation is peanuts relative to the total cost of an outage. It adds administrative costs and overhead. Who pays for that?

    As a consumer of such services, I don’t want to pay for insurance. When a vendor’s service is bad, I vote with my feet. I’d rather see the vendor invest in delivering higher reliability and reducing operating cost than in automatically paying a minuscule refund.

  • While you point out some interesting weaknesses in the AWS service level agreements, without looking at competitors, this doesn’t mean much.

    How does the AWS service level agreement compare to other cloud services? Given the early state of these services Amazon just has to be better than the competition – even if the SLA isn’t perfect on day 1.

  • @Abnerg

    You’re right. Unfortunately, the problem here is actually an industry-wide defect. I’ll do a followup post with two things:

    1) A “best practices” SLA
    2) Comparative examples of existing SLAs

    Thanks for the suggestion.