Tue, Jul 24, 2007

Artur Bergman


365 Main datacenter power outage - Six Apart Technorati Craigslist

The Web 2.0 datacenter 365 Main, in the heart of SOMA, just lost power. Affected sites include Craigslist, Technorati, Yelp, and all Six Apart properties: TypePad, LiveJournal and Vox.

The only information available so far seems to be on Six Apart's quite innovative Twitter stream.

I feel sorry for the operations staff who will have to do the cleanup. Having been there far too many times, I know how painful bringing all your machines back up can be.

If you have any additional information, please comment.

* UPDATE *

Laughing Squid mentions six power outages in SOMA, which seems far more likely than a drunk employee. I live close to the datacenter, and I cannot reach my home network.

The San Francisco Chronicle reports that SOMA was hit by repeated power outages.

Apparently Colo 4 was the one that blew.

** UPDATE **

Since I am a cynical operations veteran, I have to link to this press release in which 365 Main congratulates itself on two years of 100% uptime.

*** UPDATE 3:47 PM ***

Technorati is back after around two hours of downtime.

*** Update 3:53 PM ***

Yelp is live again.

*** Update 4:13 PM (by Jesse Robbins) ***
People using LiveJournal OpenID will be unable to authenticate to other sites until service has been restored. Are you affected by this? Please put a note in the comments explaining where and how.

*** Update 4:36 PM ***

It appears that TypePad is back.

*** Update 5:06 PM ***

And Craigslist joins the gang of online services.


tags: operations | comments: 57

1 TrackBack

TrackBack URL for this entry: http://blogs.oreilly.com/cgi-bin/mt/mt-t.cgi/5696

Many popular websites are unavailable at the moment. CNet was temporarily unavailable. Craigslist, LiveJournal and Typepad are all currently unavailable. We’re hearing that a large internet service provider in San Francisco has lost power, and taken do...

Comments: 57

  taz [07.24.07 02:54 PM]

good job, bud!
this is the first post I found through google explaining why craigslist is not working at the moment.

  Cecilia Tan [07.24.07 02:59 PM]

Yes, awesome. A search on news.google.com for the word "livejournal" turns up your post. I had been thinking maybe it was a DNS problem but... no! Thanks!

  Adam Lipkin [07.24.07 03:01 PM]

Thanks for the update! This was the first thing that turned up when I hit google's blogsearch, as well.

  Kingfox [07.24.07 03:02 PM]

Awww jeez. I can't complain in my Vox about LJ being down!

  Inga [07.24.07 03:05 PM]

Thanks for posting this! I was wondering what was wrong with the blogs I read. I also found your post through google news.

  Adrian [07.24.07 03:08 PM]

Thanks for the news. Found this post through yubnub.

  John Tompson [07.24.07 03:11 PM]

Our IT folks at VOX say that it will take at least six hours to be fully restored.

  :Leathers [07.24.07 03:14 PM]

Thankfully I can still post to my GreatestJournal when BOTH Vox and LJ are down...not that it's the same buuut.....

  gumby [07.24.07 03:16 PM]


thanks for the info, the only link i could find on google news.

what a world we live in.
the internets allow for info outside network news, in time of shut down.

  Tyler Krpata [07.24.07 03:17 PM]

Try searching google news for 365 Main for pure comedy gold. http://tylerkrpata.blogspot.com/2007/07/in-ur-datacenter.html

  anecòico [07.24.07 03:17 PM]

Thanks for the update. I'll sit and wait wait wait wait till my website is, hopefully, back online! ;)

anecòico [CattivaMaestra]

  cathycathy [07.24.07 03:24 PM]

yah, found thru google news too. already had 5 diggs by the time i found it =) thx for the post

  chris [07.24.07 03:29 PM]

So that's why LJ is down. Darn. I want my LJ! hehe.

  buddykat [07.24.07 03:32 PM]

Good thing I was able to get the info via Google News; LJ's status page (that is supposedly hosted somewhere completely separate) is also down!

  Artur Bergman [07.24.07 03:37 PM]

buddykat, that is because even though the status page is somewhere else, the DNS servers are all hosted in one datacenter.

  Wisdom of Cowards [07.24.07 03:39 PM]

What the heck is "quite innovative" about the SA twitterstream?

  anecòico [07.24.07 03:40 PM]

my website is finally online again.. but without pictures... gulp!

anecòico [CattivaMaestra]

  Kristen [07.24.07 03:41 PM]

Wow, breaking news!

  Artur Bergman [07.24.07 03:42 PM]

Wisdom of Cowards:

Probably not much, I was just surprised!

  Simon [07.24.07 03:43 PM]

The Six Apart status page is now working. It's reporting all of its sites as down, but I've just got into TypePad.

  Bridget Carey [07.24.07 03:43 PM]

Why is there no generator backup for all of these huge websites? Here in Florida we prepare for disasters and every tech company has a backup for their backup when a hurricane strikes, but you're telling me these tech-central, disaster-prone California businesses don't have a power backup?

  John [07.24.07 03:49 PM]

You really should be ashamed of yourself for calling a datacenter "Web 2.0." The whole Web 2.0 thing is pathetic, but referring to a datacenter as a Web 2.0 datacenter truly is a brilliant show of ignorance.

  anecòico [07.24.07 03:50 PM]

and down again...

  Artur Bergman [07.24.07 03:52 PM]

John: I am sorry, I should remember irony doesn't come through well in written text.

My cynical operations colleagues and I jokingly refer to 365 Main as the Web 2.0 datacenter because so many of the Web 2.0 sites are there.

The last time I saw five fire trucks pull up after an earthquake, my first comment was "oh well, there goes Web 2.0, poof".

  John [07.24.07 03:57 PM]

Wow. I just got baited. :) Sorry for jumping on that one...hope you don't mind if I use that phrase in the future. ;)

  gumby [07.24.07 03:59 PM]

bridget from florida needs to read about how the internets and pipes and chips work, instead of jumping on the RNC echo chamber ride.

  Andrew [07.24.07 03:59 PM]

The irony: from 365 Main's home page, "REDENVELOPE REPORTS TWO YEARS OF CONTINUOUS UPTIME AT 365 MAIN’S SAN FRANCISCO DATA CENTER" was posted earlier today... looks like that two years is suddenly over!

  Tony [07.24.07 04:02 PM]

I am so disappointed. What kind of datacenter is that? No backup power.

  Artur Bergman [07.24.07 04:03 PM]

365 Main claims to have backup power. The question is why it didn't work!

  adam [07.24.07 04:12 PM]

I'm in ur dataz center smash'n yo web!

  zazoo [07.24.07 04:20 PM]

this sucks.. where do i troll for whores now that craigslist is down???

  Pedro [07.24.07 04:26 PM]

Colo 7 at 365 Main went down as well. Our systems in Colo 5 stayed up though.

  Roger [07.24.07 04:33 PM]

Can't get my LiveJournal, can't get my SecondLife, maybe a good time to start a DeadJournal.

  Carsten [07.24.07 04:38 PM]

I got a chuckle out of reading their own words: "Each facility is optimized for modern data center requirements, featuring 24/7/365 power, cooling, connectivity and security capabilities to ensure mission-critical operations and business continuity for tenants."

  jmbranden [07.24.07 05:54 PM]

Apparently they have no "realistic" drills or tests in place to ensure their backup systems work properly. With as many gen sets as they claim to have (I'm guessing they have synchronous generators), I find it really funny that not one actually fired up and did its job. I'd be *REALLY* curious as to what actually happened at 365 Main and whose head is gonna roll over it. There should be absolutely no excuse for losing power like that in a datacenter, short of building damage from a storm or other act of nature, especially in a "redundant" power system. The odds are astronomical. Someone messed up. --Good reporting by the way! Keep up the good work!

  ajblardone [07.24.07 06:22 PM]

I was there when the power went out. The generators kicked in right away. Some colos were fine; others weren't. Mine went black for a while after the outage. 365 Main had been working on electrical upgrades all week, and this outage might have been bad timing for them... At 4pm 365 Main sent out a notice saying the building was 100% operational and still running on the generators until PG&E confirms that utility power is stable.

  ryan [07.24.07 06:25 PM]

I find it interesting that sixapart brings up typepad before livejournal. They don't seem to have enough staff to do both at once.

Perhaps lj was better as an independent service?

  Randall [07.24.07 06:59 PM]

The fact that the backup generators didn't come on is not surprising. I had two separate data center power outages that resulted in several hours of downtime for all my sites hosted through NTT Communications (which had a redundant backup power system, but it failed with no explanation). So it's not unheard of, even for world-class hosting providers.

  Adam Lipkin [07.24.07 07:11 PM]

Ryan, I'm guessing that it's because 6A views Typepad as more "important" because all customers there are paying customers, while only some percentage of LJ customers pay.

In light of that, I'm not overly inclined to be one of those paying LJ customers much longer.

  Anil [07.24.07 07:19 PM]

Ryan and Adam, TypePad's blogs came back online earlier because they're static web pages, but both the TypePad and LJ teams are working diligently to get both applications back up to 100%. They're working in parallel, along with the Vox team, and we've got an enormous number of customers (paying and non-paying) on LiveJournal, so it'd be silly not to prioritize them equally high. Feel free to get in touch with me, though, if I can help with more info or explanations.

  Adam Lipkin [07.24.07 07:27 PM]

Anil, thanks for the update; you do realize, however, that what you've written here is far more detailed than what's up on status.sixapart.com, right? It's that sort of lack of official communication (along with 6A's constant blunders over the last few months with account bans/removals and the ongoing downtimes and database issues) that doesn't exactly inspire much trust, especially since it now seems that every other site affected by the outage (which, I realize, is the fault of 365, not 6A) has managed to come back up.

  Foxesdaughter [07.24.07 08:08 PM]

My first thought?

Some rabid fanboy is REALLY pissed he can't get hermione pr0n on LJ anymore ...

  Rebecca [07.24.07 08:19 PM]

Yeah, Six Apart's communication with LJ users when there are issues has been nil as of late. I too find it suspect that LJ is the LAST SA blogging service to be brought back online. It's also the biggest, which might be a factor, but STILL. Cater to your audience, not to the fringe users of your other services.

  Dana [07.24.07 08:36 PM]

If 6A is so concerned about LJ users, where are the specific updates on its downtime on the 6A site, beyond that paragraph at status.livejournal.com? TypePad users have gotten regular, specific updates. It almost feels like now that TypePad is up, everyone's decided to go to bed.

  Claudia [07.24.07 09:05 PM]

I'm so glad I never bought a permanent account. I miss old school LJ.

  Cecilia [07.24.07 09:32 PM]

Even Vox is apparently close to going back up. It looks like LJ users are going to be the last ones to get service. This is just another instance that makes me wonder if I should bother with a paid account.

  Mark Eichin [07.24.07 10:01 PM]

re openid: I would have been impacted by it, but I only use it for doxory.com which is of even lower importance than livejournal :-) and I'd already seen Jesse V. complain that it was out (it's working fine as of this posting.)

  hillary hartley [07.24.07 11:42 PM]

what's hilarious is that the press release link is suddenly 404. i've seen it linked in almost every blog and article i've read about the power outage. i guess they decided to simply delete it!

  Steven James [07.25.07 02:48 AM]

Doesn't anyone find it strange that Six Apart breaks the extremely common rule of having at minimum two DNS servers (three is best), and that if you are going to have only two, they really should be in separate geographic locations? Why are both of the DNS servers for all of their services even on the same subnet? Hahah, someone get them a good DNS book, or show them how to use Google to look up DNS configuration ;)

  DataGuy35 [07.25.07 09:28 AM]

Gotta have multiple sites. This is mission critical stuff. Pick a secondary spot that won't fall into the ocean. I've heard great things about the guys at i/o Data Centers in Phoenix and their sites as backup or primary data center locations. Check this out: http://www.bizjournals.com/phoenix/stories/2007/07/02/story15.html?from_rss=1

  Matthew Leeds [07.25.07 02:14 PM]

Got to have multiple colos, with load balancing between them, and make sure you can lose one and still handle the load with what remains. There are many events (backup power loss being just one) that can take out any given colo or data center. We are currently in three colos, including one out of state. We load balance in real time across all three. We occasionally take one down for major upgrades; customers never notice. Yeah, it's not cheap, but it's a cost of doing business in a 24x7 market.

  COLO-CHOLO [07.25.07 03:50 PM]

I was there for about 6 hours bringing up all my systems. What a joke that place is, and all they had to offer was pizza. Who the hell is going to go get pizza while their systems are down?

  Michael T. Halligan [07.25.07 05:49 PM]

Seriously, are people surprised? The Mission St. Substation has had 3 serious outages in the past 7 years. California itself had 2 straight summers of rolling blackouts, which only subsided thanks to the dot-com crash. California is running out of duct-tape.

As for 365main, they usually run a good operation, but seeing as how they're the most expensive colocation facility in California (at least from my experience), you expect more. We moved our infrastructure out of 365, and off of California's fragile power grid, and into a much better, greener, and cheaper Seattle grid 4 months ago. This was the best decision we ever made.

In April 2005, 365main had an outage that affected all customers for 50 minutes due to a failed EPO valve. 365 handled that outage spectacularly, calling all of their customers within 15 minutes of the outage.

In February 2006, 365main experienced a partial outage for 3 seconds that only affected some customers, but caused problems in their Telco spine, affecting connectivity.

In October 2006, 365main had a backup generator fail; supposedly no customers were directly affected, but customers were not allowed to enter the building between 3:29 PM and 4:40 PM.

  rick gregory [07.25.07 05:54 PM]

Amateurs... Come on people, your business goes away if one colo fails? What if an earthquake took out a colo for days or weeks? Get a backup colo that's replicated to and put it on the East Coast.

  leah [08.03.07 02:53 PM]

There’s a funny cartoon about the power outage at ITGumbo. I don’t know what it has to do with Harry Potter, but maybe you’ll figure it out.
http://www.itgumbo.com/mumbogumbo/2007/07/the_power_outage_at_365_main_s.php
