Jun 21

Artur Bergman

Amazon Web Services and the lack of a SLA

I am interested in understanding the business tradeoffs that people make when they decide to host their data on S3 or run their service on EC2 instead of investing in their own infrastructure.

Quoting from the Amazon T&C.

We further reserve the right to discontinue Amazon Web Services, any Services, or any portion or feature thereof for any reason and at any time in our sole discretion. Upon any termination or notice of any discontinuance, you must immediately stop your use of the applicable Service(s), and delete all Amazon Properties in your possession or control (including from your Application and your servers). Sections 3, 5, 8 - 12, any definitions that are necessary to give effect to the foregoing provisions, and any payment obligations will survive any termination of this Agreement and will continue to bind you and us in accordance with their terms.

So, if your company is based on AWS: What does your disaster recovery plan look like? How do you react if Amazon goes down or if Amazon decides to shut down AWS? What happens next December when a busy holiday season makes Amazon divert bandwidth from S3 to their main business?

tags: operations, worries

Comments: 25

  Swashbuckler [06.21.07 07:59 AM]


With terms like that, anyone who would base their business on Amazon is a fool.

  PabloC [06.21.07 08:02 AM]

It will never happen, because Amazon uses the sky to store your data, and clouds to process it. We will always have clouds.

  Greg Linden [06.21.07 08:12 AM]

Good post, Artur.

Amazon says S3 is reliable -- "99.99% availability ... All failures must be tolerated ... without any downtime" -- but you can see if they're willing to stand behind that by looking at the legal guarantees on uptime. There are none. The licensing agreement says the service is provided "as is" and "as available".
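
To put the quoted 99.99% figure in concrete terms, the short sketch below converts an availability percentage into the downtime it permits per year. The function name is illustrative and a 365-day year is assumed; nothing here reflects how Amazon itself measures availability.

```python
def allowed_downtime_minutes(availability_percent, days=365):
    """Minutes of downtime permitted over `days` by an availability target."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


# Compare a few common targets (assumed values, for illustration only).
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.1f} minutes/year")
```

Under these assumptions, 99.99% works out to roughly 52.6 minutes of downtime per year, which is the number an SLA would have to back for the claim to carry legal weight.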

I don't think it would be wise to use this for a serious, real-time system. However, S3 still may be good for an asynchronous product like online backups.

SmugMug, for example, uses S3 only to back up their pictures ("We use S3 as redundant secondary storage for use in cases of outages, data loss, or other catastrophe"). This allows them to have fewer servers but still have high confidence that they could recover from a data loss.

SmugMug has made their ETech presentation with lots of information about their experience with S3 at

  Doug Kaye [06.21.07 08:25 AM]

1. As a startup, the economic argument in favor of AWS is overwhelming. It saves us so much money that it wins the risk/reward argument in spite of the terms of service.

2. That's even more the case when you consider the scalability and on-demand benefits (which are not the same as scalability) that we get from AWS.

3. "Just in case" we've implemented AWS via an abstraction layer, so while it wouldn't be entirely painless to migrate, it shouldn't put our business at risk.

4. Having been part of the AWS beta programs and learning how serious Amazon is about AWS, my confidence is very high. Should Amazon not take this business seriously, they will never be able to enter into such a venture in the future. I've personally spoken to Jeff Bezos about this, and I think he's in this for the long haul.
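
The abstraction layer mentioned in point 3 is not described in any detail, so the following is only a minimal sketch of the general idea: the application codes against a small storage interface, and the backend behind it can be swapped. All class and function names (`BlobStore`, `InMemoryStore`, `LocalDiskStore`, `migrate`) are hypothetical, and the in-memory backend merely stands in for a real S3 client.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BlobStore(ABC):
    """Hypothetical interface the application uses instead of a vendor API."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryStore(BlobStore):
    """Stand-in backend; a real one would wrap calls to a hosted service."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


class LocalDiskStore(BlobStore):
    """Fallback backend that keeps objects as files under a root directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key, data):
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key):
        return (self.root / key).read_bytes()


def migrate(keys, source, target):
    """Copy the listed keys from one backend to another."""
    for key in keys:
        target.put(key, source.get(key))
```

Because callers only ever see `BlobStore`, moving off one provider becomes a matter of writing another backend and running something like `migrate`, rather than rewriting the application.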

I have no inside info, but I think you'll see Terms of Service evolve. This is a new area for everyone, Amazon and their customers alike. For those of us pushing the limits of these new services, it feels like a true but clearly cutting-edge collaboration.

  Tim Anderson [06.21.07 08:52 AM]

I've made this exact point to the Amazon folk on a couple of occasions, most recently in an interview with Werner Vogels:

The answer, pretty much, is "watch this space". I got the impression that SLAs will come eventually.


  Doug Kersten [06.21.07 09:24 AM]

I think the terms and conditions will evolve. When the first start-ups running on Amazon get sold to Google, Microsoft, etc., and as businesses grow on Amazon and become a larger part of Amazon's business, companies running on Amazon will be able to demand a better SLA.

That being said, there is huge value from a cost perspective for a startup to use Amazon. What additional risk am I incurring, by the way? If I am running my systems myself, then I have to deal with scalability and availability. Doing scalability and availability correctly is difficult and time-consuming and RISKY because if you implement it incorrectly you can kill your business (Zooomr...maybe?). What's to stop the company where I am co-locating my servers from going out of business or pulling the plug? Nothing. Ultimately there is no incremental increase in real risk.

A good SLA may help you get some money back from Amazon for lost business but this occurs after the fact. It's only worthwhile once the damage has been done. Of course having a good SLA is important but it is more important to have a well thought out and tested disaster plan that takes into account the fact that your Amazon services may disappear.

One more thought regarding availability and scalability. I read that iLike had to scramble the first weekend they went live on Facebook to find 100 new servers for their data centers. They had a very stressful weekend but were eventually successful. Now imagine the same scenario on Amazon. My site's load has increased dramatically, I log into Amazon and instantiate 100 new servers. Done. This may be a little simplistic but notice that instead of buying, unpacking and installing servers I am immediately working on getting rid of the problem instead of preparing to get rid of the problem.

I actually like Amazon's SLA because I think it will scare off my competitors who haven't analyzed it properly while I continue to build my business on one of the most cost-effective systems out there.


  eas [06.21.07 10:07 AM]

Disruptive offerings do not generally start off competing for customers of established offerings. They compete with non-consumption.

In the case of Amazon S3, most of the customers right now aren't people who'd have synched EMC storage arrays in two separate datacenters. They are people who would probably be doing without. They'd be two developers without much capital working on their own storage-intensive startup, rather than working for someone else's VC-backed startup. They'd be the people who go without offsite backup for their digital photos. They'd be SmugMug, only without as much redundancy.

Greg, go read that slide deck you linked to again. As of March, they were using S3 as primary storage, and their own storage as a sort of cache for frequently accessed images.

"Amazon is our primary storage, and we use SmugFS as our local hot cache. We end up storing 100% of the data at Amazon, and 10% locally. In the end, we need 95% less disks in our datacenter than we did before."

At one point they were using it for only secondary storage, but decided that was suboptimal. For one thing, they were getting better availability out of S3 than they were out of their in house system.
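
The "primary storage plus local hot cache" arrangement quoted above can be sketched as a read-through cache with least-recently-used eviction. This is only an illustration of the pattern, not SmugMug's SmugFS, whose internals are not described here; the `HotCache` class, its capacity measured in object count, and the dict standing in for the remote store are all assumptions.

```python
from collections import OrderedDict


class HotCache:
    """Read-through LRU cache in front of a slower primary store."""

    def __init__(self, primary, capacity):
        self.primary = primary      # mapping-like stand-in for remote storage
        self.capacity = capacity    # max number of objects kept locally
        self._local = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _remember(self, key, value):
        # Mark the key as most recently used and evict the oldest if over capacity.
        self._local[key] = value
        self._local.move_to_end(key)
        if len(self._local) > self.capacity:
            self._local.popitem(last=False)

    def get(self, key):
        if key in self._local:
            self.hits += 1
            self._local.move_to_end(key)
            return self._local[key]
        # Cache miss: fetch from the primary store, then keep a local copy.
        self.misses += 1
        value = self.primary[key]
        self._remember(key, value)
        return value

    def put(self, key, value):
        # The primary store always receives the write; the cache is only a copy.
        self.primary[key] = value
        self._remember(key, value)
```

In this sketch every object lives in the primary store and only the most recently used ones stay local, which mirrors the "100% at Amazon, 10% locally" split in spirit; a production cache would more likely bound itself by bytes rather than object count.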

What's not clear to me is whether they have a full secondary store in their own datacenter, even if it's just a bunch of tapes.

  Swashbuckler [06.21.07 10:47 AM]

"There are none. The licensing agreement says the service is provided 'as is' and 'as available'."

Ever read a software license agreement? They say pretty much the same thing.

  Greg Linden [06.21.07 10:52 AM]

Thanks, Erik, for the correction. Good point that SmugMug is now using S3 for more than just backup.

I missed the footnote on that slide you referenced, but you are right that SmugMug now appears to have much of their data only stored at S3.

Frankly, I'm not sure that is a good idea. SmugMug does not have a copy of their customers' data. If Amazon S3 suffers a data loss or extended outage, SmugMug is out of business.

  Simon Wardley [06.21.07 12:03 PM]

Great post!

Apologies to all if I'm stating the obvious here but I'll provide my view on the general point raised.

We all know that modern application development contains a large amount of “Yak shaving” or repeated, common, mundane and expensive tasks including the set-up and configuration of web infrastructure, hosting arrangements and databases.

None of these tasks can be described as providing any form of competitive advantage, as they are common throughout the industry. They are a necessity in order to deliver a web service or site, and are therefore a cost of doing business on the web.

The process of building your own infrastructure is expensive in terms of capital and resource and for any new service there is significant waste due to the difficulty in predicting and planning the capacity.

The obvious question to ask is :-

"Why do companies not simply purchase the common computing resources they use as they require it, in much the same way as companies purchase other commodities such as electricity?".


"Why do we not have the equivalent of electricity providers for computing resources and a computing resource market, where companies as customers can switch between one provider and another?"

Unlike electricity, we are not infra-structurally neutral. We have a relationship with our infrastructure - our data and code resides somewhere. If the infrastructure is simply switched off - we lose this. I need an alternative that I can switch to and my code and data will just run.

So whilst initial utility services have appeared (e.g. Amazon EC2), the issue is the lack of alternative providers I can just switch to: there is no choice and there is no marketplace.

All of this creates an implicit exit cost as you need to either build your own or sign up contracts with more traditional ISPs. It also means that without this incurred cost, all your eggs are in one basket - even if it is a very big and reliable basket.

What is needed is a competitive utility computing market, where I can switch from one provider to another provider based upon an open sourced standard for utility computing environments.

Such a market will naturally create competition on price and quality of service through choice. It also solves the problems of disaster recovery by giving choice and should balance supply and demand on infrastructure more effectively than any single company. It also opens the doorway to effective P2P infrastructure ... but that's another post.

The underlying issue is not about Amazon's SLAs but that there is only one EC2 and no Google EC2 or Microsoft EC2 etc.

  Doug Kaye [06.21.07 03:03 PM]

Greg, Amazon has *multiple* copies of SmugMug's customers' data in multiple geographical locations. The key (for us) is that S3 is our entire data store including backup. That's why the economics are so good. As we're looking at eventually hundreds of terabytes of data, that's a huge savings in both capital and operations costs. Managing a constantly growing and geographically redundant data store is a major undertaking, even for an established company. I've been in that business before, and I'm glad I don't have to do it again.

  Peter Fein [06.21.07 03:24 PM]

How about because Jeff Bezos has repeatedly publicly stated that web services is the future of AMZN and Wall Street would smack the stock if they pulled a stunt like that? Having worked at a small company that's been on the enforcing end of an SLA, I'd trust the market more than a piece of paper on this one. Outside of large companies with armies of lawyers, I suspect SLAs largely exist to let middle managers cover their asses with the higher-ups.

  ian [06.21.07 04:12 PM]

so my company started offering web services and i looked into SLAs and couldn't find *anybody* who offers them on the web. i suspect companies tried offering some kind of uptime guarantee in web 1.0 and went bust b/c of the cost of ensuring the .009. so i tell my customers "we're not a bank" and promise best efforts--we've done things intelligently--redundancy, backup, etc..., but i'll be damned if i'm going to pay for the deep nines.

"if it's good enough for amazon, it's good enough for me."

my tune will change over time, but while we are still in early days of this stuff, there's no reason to risk it all.

  catastrophic [06.21.07 05:38 PM]

"What is needed is a competitive utility computing market, where I can switch from one provider to another provider based upon an open sourced standard for utility computing environments."

Hear hear.

I'm not holding my breath however. Amazon's web services are now no longer new. Isn't it telling that the natural competitors (google/ms/sun/emc) are letting amazon have this market to itself?

Wouldn't it be ironic if the innovative engine behind new sustainable business models turned out to be itself unsustainable?

  Don MacAskill [06.21.07 07:34 PM]

I'm the CEO & Chief Geek at SmugMug.

@Greg: Your data, I'm afraid, is outta date. :) We now use Amazon S3 as our primary storage, not backup, and do so for well over 200TB of data.

@eas: There's no secondary copy, on disks or tapes, of anything that's not in our "hot cache". Amazon has secondary and tertiary copies for us - that's good enough for me.

@everyone else:

We've seen S3 outages over the last 1.5 years, and I can tell you that big chunks of Amazon go offline at the same time. IS Amazon S3 and EC2. During holiday time, to answer Artur's question, they can't divert resources away from S3 - because runs on S3.

Wanna see for yourself? Start viewing the source on some of their pages. Look at the image URLs on things like DVD covers or whatnot - you'll see they come from EC2 or S3. I'm sure there are plenty of URLs that aren't as obvious too, but there are more than enough that make it very clear where their data is coming from.

I've looked Jeff in the eye and had long conversations about this. He's very serious about the longevity of this approach and I'm sold.

SLAs have never once helped me in my professional career. Our other providers with SLAs blow their SLAs all the time. If I'm lucky, I get a few dollars back - but not nearly as much as the damage they do when they blow it. When stuff breaks, a piece of paper with an SLA on it might save your job - but it hardly helps the network/software/services/whatever actually get back online. It's mostly job insurance, imho.

As far as contingency plans, we don't have one set in stone. We're hoping that soon we can buy similar services from Sun/Microsoft/Google/whomever and use that as a backup. But we also do have the expertise in-house to do this ourselves, and copying all of our data back to our own datacenters is only marginally more expensive than a month of S3 storage, so I think we'd just buy a ton of disks and sync everything back. :)

We use S3 because it saves us time & money, not because we can't do it ourselves. Anyone can build their own S3 fairly easily these days - but I dare you to do it as cheaply or reliably.

  Simon Wardley [06.22.07 04:22 AM]

Hi Catastrophic,

So what you are saying is wouldn't it be ironic if the efficiency gains of balancing supply and demand on infrastructure through consolidation to a number of key common computer resource providers (let's say for example Amazon EC2, Google EC2 and Microsoft EC2 etc) caused an increase in the price of the raw material (storage, bandwidth, computer processing) because less was needed? In other words computing resources are so cheap precisely because so much is needed because so much is wasted? Well, even if the supply side was constrained I'd reckon you'd need about a five fold increase in raw material cost before it didn't make economic sense in today's terms.

As for general pricing, from my analysis the storage prices seem reasonable, but I personally reckon there's a lot more room on the EC2 pricing. Obviously with a competitive market, those are exactly the sort of pressures of price vs QoS which we would begin to see as well as solving the moveable application / DR issue.

So why haven't other companies moved into this field? Well some have, and smaller companies were even in this space before Amazon EC2 started.

I'd expect to see a lot more movement in this field in the future.

  Greg Linden [06.22.07 07:35 AM]

Hi, Don. Great to hear from you!

Good point that you and others are making that other solutions have reliability issues as well. For example, data centers typically take no responsibility for network or power outages, and database vendors take no responsibility for bugs in their software that lead to lost data or outages.

I am still surprised that SmugMug does not seek to keep the original copy of all its data, but I see your point that the modest reliability gained may not be worth the cost for your application.

On the more general topic, are there applications for which you think S3 is not well suited?

For example, I would think you would want to avoid multiple S3 accesses per page due to latency issues and would want most of your site to remain up even if S3 was down.

Do you agree? From the slides, it is not clear that is quite what you do. It appears SmugMug just uses local disk as a cache. You do not, for example, appear to explicitly try to keep most or all thumbnails local, putting many of the full resolution images only on S3. So, thumbnails that have not been accessed in the last few hours or days could be only on S3. Is that correct?

I'm not saying that is not a good strategy. Using S3 as primary storage and having a large local storage cache seems to work well for SmugMug.

But, it may be worth emphasizing that your application is particularly well suited for that technique given that image requests are independent, easily parallelizable, and the cost of an occasional broken image is relatively low. Other applications might have to be more careful structuring their S3 data accesses to maintain good page load times and reliability.

  Doug Kaye [06.22.07 10:28 AM]

Hi, Greg. You seem to imply that S3 is somehow a "second class" content-delivery solution, and I'm not sure why. We (GigaVox Media) do have some non-S3 servers running in a high-quality datacenter, but neither are they more reliable nor do they exhibit lower latencies than our files retrieved from S3. We don't get "the occasional broken image" from S3. Amazon had a DNS problem a while back, so we had a fairly major outage, but it affected a lot of as well, so it got a *lot* of attention.

The only disadvantage of S3 is compared to a true CDN like Limelight Networks, which we also use. S3's datacenters are in the US, and I don't know that they have any geographic optimization of delivery. (They may, I just don't know.) Limelight, on the other hand, has the ability to deliver our files from servers located in other parts of the world.

In our case, we're using S3 for audio and video files, so latency isn't an issue. What does matter is the total throughput, and that is much better from S3 than from our own dedicated servers.

  Ryan Baker [06.23.07 11:26 PM]

I'm not sure what comparative agreements you're looking at, but I went through a process where there was a great deal of undue concern about Amazon agreements.

Many SLAs don't guarantee any level of service; they guarantee the provider will charge you less, or give free service, if they screw up. Unless they'll force the provider to pay you, you're not guaranteed any service at all.

Amazon's lack of SLA is mostly due to the lack of any quid-pro-quo agreement from the user. You pay for what you use, not a penny more, and not a minute longer than you want. There's no cancellation fee.

Now I can imagine you might be negotiating contracts and SLAs that say a lot more than the average SLA, in which case you are guaranteed something, up to the provider's ability to pay and the ability of the legal system to enforce the agreement in a cost-effective manner.

  Brad Dixon [07.02.07 01:20 PM]

Very late to this party... but is there any reason that a 3rd party insurance contract paying your company if Amazon S3/EC2 has a service disruption wouldn't be more valuable? Would it be obtainable?

The "insurance" costs for a standard vendor SLA are clearly built into the cost of goods sold. Paying for SLA insurance to a third party seems like a reasonable notion.


  Oleg Sinitsin [07.02.07 02:04 PM]

There is a rather heated request for information about S3 datacenters and replication here:

I initiated it because my business serves heavy stuff off S3 (video). Our own datacenters or a CDN on contract are out of the question financially.

However, I'm getting praise from US users and complaints from those abroad. I was just trying to find out if any geographic improvements were on the radar and got no response at all.

IMHO providing information, even if not favorable, is much more important than providing an SLA or other warranty. I trust myself to estimate risks if there was information.

  TJH [07.02.07 04:25 PM]

With respect to S3, I couldn't locate the (exact) quoted T&C in the 06-22-2007 dated T&C anymore:

Section 3.3.2:

3.3.2. Paid Services. We may suspend your right and license to use any or all Paid Services (and any associated Amazon Properties), or terminate this Agreement in its entirety (and, accordingly, cease providing all Services to you), for any reason or for no reason, at our discretion at any time by providing you sixty (60) days' advance notice in accordance with the notice provisions set forth in Section 15 below.

  Amazon Fool [10.01.07 03:08 PM]

Here's a perfect example of why an SLA is needed with Amazon. One of our dev boxes just went down and we'll have to put it back up. Luckily we have it backed up, but this type of unreliability is such an annoyance.

  charles kennedy [07.31.08 09:21 AM]

What the F###K is this? What sane publisher or manufacturer would sign away all rights to its intellectual property to sell product on Amazon?

Amazon Services Business Solutions Agreement:

4. License

“You grant us a royalty-free, non-exclusive, worldwide, perpetual, irrevocable right and license to use, reproduce, perform, display, distribute, adapt, modify, re-format, create derivative works of, and otherwise commercially or non-commercially exploit in any manner, any and all of Your Materials, and to sublicense the foregoing rights to our affiliates and operators of Amazon Associated Properties; provided, however, that we will not alter any of Your Trademarks from the form provided by you (except to re-size trademarks to the extent necessary for presentation, so long as the relative proportions of such trademarks remain the same) and will comply with your removal requests as to specific uses of Your Trademarks; provided further, however, that nothing in this Agreement will prevent or impair our right to use Your Materials without your consent to the extent that such use is allowable without a license from you or your affiliates under applicable law (e.g., fair use under copyright law, referential use under trademark law, or valid license from a third party).”
