From an operations perspective, these kinds of outages are nothing new, and they underscore why having “many eggs in few baskets” is such a problem. I believe we will see similar incidents when we have the first multi-datacenter failures, where multiple providers lose significant parts of their infrastructure in a single geographic area. (Remember: location is a basket too!)
To really understand the current issue, I recommend Neal Stephenson’s incredible (and lengthy) Wired article from 1996 entitled “Mother Earth Mother Board”:
[…] It sometimes seems as though every force of nature, every flaw in the human character, and every biological organism on the planet is engaged in a competition to see which can sever the most cables. The Museum of Submarine Telegraphy in Porthcurno, England, has a display of wrecked cables bracketed to a slab of wood. Each is labeled with its cause of failure, some of which sound dramatic, some cryptic, some both: trawler maul, spewed core, intermittent disconnection, strained core, teredo worms, crab’s nest, perished core, fish bite, even “spliced by Italians.” The teredo worm is like a science fiction creature, a bivalve with a rasp-edged shell that it uses like a buzz saw to cut through wood – or through submarine cables. Cable companies learned the hard way, early on, that it likes to eat gutta-percha, and subsequent cables received a helical wrapping of copper tape to stop it.
[…] There is also the obvious threat of sabotage by a hostile government, but, surprisingly, this almost never happens. When cypherpunk Doug Barnes was researching his Caribbean project, he spent some time looking into this, because it was exactly the kind of threat he was worried about in the case of a data haven. Somewhat to his own surprise and relief, he concluded that it simply wasn’t going to happen. “Cutting a submarine cable,” Barnes says, “is like starting a nuclear war. It’s easy to do, the results are devastating, and as soon as one country does it, all of the others will retaliate.”
As the capacity of optical fibers climbs, so does the economic damage caused when the cable is severed. FLAG makes its money by selling capacity to long-distance carriers, who turn around and resell it to end users at rates that are increasingly determined by what the market will bear. If FLAG gets chopped, no calls get through. The carriers’ phone calls get routed to FLAG’s competitors (other cables or satellites), and FLAG loses the revenue represented by those calls until the cable is repaired. The amount of revenue it loses is a function of how many calls the cable is physically capable of carrying, how close to capacity the cable is running, and what prices the market will bear for calls on the broken cable segment. In other words, a break between Dubai and Bombay might cost FLAG more in revenue loss than a break between Korea and Japan if calls between Dubai and Bombay cost more.
The rule of thumb for calculating revenue loss works like this: for every penny per minute that the long distance market will bear on a particular route, the loss of revenue, should FLAG be severed on that route, is about $3,000 a minute. So if calls on that route are a dime a minute, the damage is $30,000 a minute, and if calls are a dollar a minute, the damage is almost a third of a million dollars for every minute the cable is down. Upcoming advances in fiber bandwidth may push this figure, for some cables, past the million-dollar-a-minute mark. [Link]
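To make Stephenson’s rule of thumb concrete, here is a minimal Python sketch of the arithmetic. The $3,000-per-penny figure comes straight from the quote above; the function name, the parameter names, and the outage-duration argument are my own additions for illustration, and the numbers are 1996 estimates for FLAG rather than anything that applies to today’s cables.

```python
# Rule of thumb from the quoted article: for every cent per minute the market
# will bear on a route, a severed cable loses about $3,000 per minute.
LOSS_PER_CENT_PER_MINUTE = 3_000  # US dollars, 1996 estimate for FLAG


def revenue_loss(price_cents_per_minute, outage_minutes=1):
    """Estimated revenue loss in dollars for an outage on one route.

    price_cents_per_minute: what the market will bear on the route, in cents.
    outage_minutes: how long the cable is down (illustrative parameter).
    """
    return LOSS_PER_CENT_PER_MINUTE * price_cents_per_minute * outage_minutes


# The two cases worked through in the quote:
print(revenue_loss(10))    # calls at a dime a minute   -> 30000
print(revenue_loss(100))   # calls at a dollar a minute -> 300000

# Scaling the dollar-a-minute case to an hour-long outage:
print(revenue_loss(100, outage_minutes=60))  # -> 18000000
```

The dollar-a-minute case works out to $300,000 a minute, which is the “almost a third of a million dollars” in the quote.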
Update Feb-06 @ 08:52 GMT: I am aware of five cable segments that are experiencing problems, including one that was reported on January 23rd which had a repair already underway. I don’t think this is a “fifth cut” as some people are starting to report, and I’ll post an update if that changes.
A lot of needless confusion and worry could be avoided if FLAG Telecom and the other carriers involved would provide timely and useful updates on their websites. It appears that they are doing a good job of restoring connectivity, but they are doing a terrible job of telling an increasingly concerned public exactly what is going on. This kind of confusion resulted in false reports that “Iran was completely offline”, which were corrected by the Renesys blog team after the story spread to influential blogs, Slashdot, Digg, and the mainstream media.