Energy Savings, Strange Attractors, …

the Intrinsic Cost of State Change, Orbiting Alien Voyeurs, and 200 Square Kilometers of Solar Panels Somewhere in Texas

The Silicon Valley Leadership Group and Berkeley National Labs recently published the results of their first Data Center Demonstration Project (pdf). (Disclosure: My colleague Teresa Tung of Accenture R+D labs was the report’s principal author). The study follows up on last year’s publication of the EPA’s report to Congress (pdf) on data center energy consumption. That report, among other things, estimated the range of savings that data center operators could achieve with varying degrees of technology and practice improvement. This more recent report is based on real-world studies and was intended to validate the estimates in the EPA report.

Both reports are good reads if you are interested in reducing the megawatts being consumed in your organization’s silicon (though the EPA report has been criticized as being a bit toothless). However, I should warn you that they are fairly long and detailed, so the bedside table might not be the best home for them if you want to get through them, at least until the manga versions are released.

The EPA study estimated that “state of the art” technology and processes in the data center might cut energy usage by 55%, while the more readily achievable “best practices” come in at 45% savings. State of the art includes a range of approaches: better server utilization through virtualization, better cooling techniques, improved power distribution, sensor networks, etc.

[Figure: electricity usage graph]

The more recent study, testing those techniques in working data centers, validates the EPA’s estimates but also offers the initially surprising conclusion that legacy data centers can be retrofitted to achieve efficiencies close to those of new builds. That conclusion follows from the less surprising finding that the most bang for the buck comes from improvements on the “IT” side of the energy draw (energy-efficient servers, virtualization, etc.) rather than from the harder-to-retrofit “site” side (cooling systems, etc.). The dog wags the tail after all: if you can reduce the direct power consumption of the IT equipment, you will simultaneously reduce the associated cooling costs, whether in an old building with relatively inefficient HVAC or a shiny new one.
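A rough way to see why the IT side dominates: treat the site overhead (cooling, power distribution) as scaling roughly with the IT load. Here’s a minimal sketch of that reasoning in Python; the overhead factor and load figures are made-up illustrations, not numbers from either report.

```python
# Illustrative sketch: why savings on the "IT" side pay twice.
# Assumes site overhead (cooling, power distribution) scales roughly
# with IT load via a PUE-like factor; all numbers are hypothetical.

overhead_factor = 2.0   # assumed total facility power / IT power
it_load_kw = 1_000.0    # hypothetical IT equipment draw, kW

total_before = it_load_kw * overhead_factor

# Suppose efficient servers and virtualization cut the IT draw by 30%.
it_savings = 0.30
total_after = it_load_kw * (1 - it_savings) * overhead_factor

print(f"Facility load before: {total_before:.0f} kW")
print(f"Facility load after:  {total_after:.0f} kW")
print(f"Total savings: {total_before - total_after:.0f} kW "
      f"({1 - total_after / total_before:.0%})")
# Every kW shaved off the servers also removes roughly a kW of site
# overhead, even in an old building with a poor (high) overhead factor.
```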

The last finding that I’ll mention here is that it doesn’t look like the time is right yet for widespread adoption of more advanced load management techniques outside of niche applications. The demonstration project had facilities that experimented with them, but the risk aversion that stems from high-reliability requirements in production data centers has kept these experiments mostly restricted to centers that serve R+D rather than production functions.

Maybe one of the most interesting things about the report is what it doesn’t (can’t) say.

Because the experiments were run in working data centers, where things like cost structure are considered competition-sensitive information, the project was not able to collect actual costs. ROIs are inferred to be in the 18-month range, but that’s mostly guesswork based on reasoning like “they must be around 18 months, or the companies wouldn’t have made the investment.”
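For what it’s worth, that inference is just a simple payback calculation run in reverse. A toy sketch, with hypothetical numbers throughout (the capital cost, savings, and energy price below are assumptions, not figures from the report):

```python
# Toy simple-payback calculation; every number here is hypothetical.
capex = 150_000.0                # assumed retrofit cost, dollars
kwh_saved_per_month = 100_000.0  # assumed monthly energy savings
price_per_kwh = 0.10             # assumed electricity price, $/kWh

monthly_savings = kwh_saved_per_month * price_per_kwh
payback_months = capex / monthly_savings
print(f"Simple payback: {payback_months:.0f} months")
# Run in reverse: if an operator actually made the investment, the
# payback they expected was presumably somewhere in this neighborhood.
```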

[Figure: electricity usage graph with question mark]

So, the graph from the report that lays out possible rates of practice and technology uptake can’t predict which curve will actually be followed. That will depend on where the dynamical system made up of the actual and expected cost of energy, the costs and benefits of many possible practice and technology improvement choices, equipment depreciation periods, vagaries of human decision cycles, and other factors stabilizes. Barring the enactment of aggressive public policy constraints or hard limits on power availability for data center operators, the curve might ultimately follow sort of a strange attractor that leans toward the state of the art curve. The open question is, how rapidly will it converge? Also, if I’m an operator, in which order should I tackle practices and technologies so that I can 80/20-rule my way up the efficiency ladder while paying my way with relatively near-term dollars?

[Figure: electricity usage graph with strange attractor]

If you are that data center operator trying to decide which steps to take and in which order, each ROI calculation you tackle is sensitive to the scale of your operation – over how many nodes are you going to distribute that fixed cost? I think this implies two things. First, to the degree that the data centers measured in the study are larger than average, our strange attractor will get pushed back up, as though the cost of the technology or practices had increased, because for smaller footprints the ROI calculations associated with capital-intensive practices won’t be as attractive.

Second, viewed another way, that same conclusion suggests an attraction toward more concentrated, larger data centers. The economics of increasing energy cost and efficiencies of scale might push smaller data center operators more rapidly into the cloud. Those operators with smaller footprints simply won’t be able to achieve the same low cost per unit of work because of the naturally occurring economies of scale inherent in everything from virtualized server pooling to power distribution and cooling systems at scale.
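The scale sensitivity is easy to see in miniature: the same fixed retrofit cost spread over ten times as many servers costs a tenth as much per unit of work. A sketch of that arithmetic; the dollar figures are hypothetical, purely for illustration.

```python
# Economies of scale in miniature; the dollar figures are hypothetical.
fixed_cost = 500_000.0           # assumed capital cost of an efficiency retrofit
savings_per_server_year = 200.0  # assumed annual energy savings per server

for servers in (200, 2_000, 20_000):
    cost_per_server = fixed_cost / servers
    payback_years = cost_per_server / savings_per_server_year
    print(f"{servers:>6} servers: ${cost_per_server:>8.2f}/server, "
          f"payback ~{payback_years:.1f} years")
# The same retrofit that pays back within months at scale looks like a
# decade-long bet for a small footprint, which is a push toward the cloud.
```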

If you don’t mind me going off on a bit of a tangent in an already long post… given that a data center is really just a vast state machine, it would be really cool if its efficiency were tied to some kind of intrinsic cost of state transition rather than to trillions of leaky circuits. After all, cars burn a lot of gas, but the energy they use is at least in the ballpark (an order of magnitude or so) of the intrinsic cost of moving their mass against friction and pushing air out of the way. But for data centers the real intrinsic cost is probably damn near zero – we’re ultimately only processing information, after all. So, all those megawatts are tied instead to the massive current leakage associated with the fact that we choose to maintain state in silicon instead of something more elegant (but currently impossible). Viewed as a physical system, data centers are about as efficient as a well-cooled warehouse full of burning light bulbs (now there’s an idea: a central lighting plant full of giant fluorescent bulbs connected to your house by fiber-optic cable).

In fact, if you were an analog alien floating around in some kind of off-the-grid Galactica, you might look down at one of our data centers, see 4 MW going in and a mere few hundred watts coming out through an OC-48 fiber trunk, and wonder “what the hell?” Watching it spew entropic HVAC waste heat, those bemused aliens could be forgiven for concluding that these buildings with no obvious use must be massive sacrificial altars where silly humans offer up electricity and make their wishes or say their prayers (well, perhaps we do).
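To put a rough number on that “damn near zero” intuition: Landauer’s principle sets the thermodynamic floor for irreversibly flipping a bit at kT·ln 2, about 3×10⁻²¹ joules at room temperature. Here is a back-of-envelope sketch comparing that floor to the alien’s view; the 4 MW facility and the fully loaded OC-48 are the same hypotheticals as above.

```python
import math

# Landauer limit: minimum energy to irreversibly erase one bit at temperature T.
k_B = 1.380649e-23     # Boltzmann constant, J/K
T = 300.0              # roughly room temperature, K
landauer_j_per_bit = k_B * T * math.log(2)   # ~2.9e-21 J per bit

oc48_bits_per_s = 2.488e9   # OC-48 line rate, about 2.5 Gbit/s
facility_watts = 4e6        # the hypothetical 4 MW data center above

# Thermodynamic floor for churning through a full OC-48's worth of bits:
floor_watts = landauer_j_per_bit * oc48_bits_per_s
print(f"Landauer floor for a saturated OC-48: {floor_watts:.1e} W")
print(f"Facility draw vs. floor: {facility_watts / floor_watts:.1e}x")
# The intrinsic cost really is damn near zero; the megawatts go to leaky
# silicon and the HVAC needed to carry the resulting heat away.
```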

Since it looks like we aren’t going to replace silicon anytime soon and we are on a path of incremental rather than disruptive improvements in data center energy consumption, maybe it wouldn’t be such a big deal if we could just power them with renewable sources. That’s Google’s plan, right? At one point in my life I studied mechanical engineering at the University of Texas, and I’ve always loved back-of-the-envelope calculations, so if you’ll indulge me just one last rather lengthy paragraph…

The report indicates that U.S. data centers consumed 61 billion kWh in 2006. That works out to about 7,000 megawatts of continuous electric load. For perspective, in Austin, Texas at noon on a sunny day, the incident energy from the Sun is approximately 900 watts per square meter. That sounds pretty good, but accounting for the motion of the sun, weather, and the efficiency of our best collectors, the usable incident energy is much lower. From my trusty old Solar-Thermal Energy Systems (Howell, Bannerot, and Vliet) I can look up the “average daily extraterrestrial insolation on a horizontal surface in the northern hemisphere” at 30° latitude and see that it ranges from a low in December of 19.7 to a high of 40.7 MJ/(m^2-day) incident on a horizontal surface. Since it’s not always summer and we still need to power these things in December, we’d better use the 19.7 value. We’re also going to need to knock it down a bit for weather (i.e., we need to make the extraterrestrial value terrestrial where the collectors will be). I don’t have data for Texas, but let’s assume it’s pretty sunny most days and just round that value down to 15. At an average solar cell efficiency of about 20%, only about 3 MJ/(m^2-day) is left to do anything useful. Pencil going crazy on envelope… So, powering all of our 2006 data centers is going to require about 200 square kilometers of Texas covered in solar collectors (ignoring transmission and overnight storage losses). We’ll double our computing needs again by 2011, so let’s hope we achieve that state-of-the-art 55% savings, and some more after that; otherwise we’re going to have to cover another 200 km^2 of Texas every five years or so.
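If you’d like to check my envelope, here’s the same arithmetic in a few lines of Python; the 15 MJ/(m^2-day) terrestrial insolation and 20% cell efficiency are the same rough assumptions as above.

```python
# Back-of-envelope: collector area needed to power 2006 U.S. data centers.
annual_kwh = 61e9        # 2006 U.S. data center consumption, kWh (from the report)
hours_per_year = 8760.0
continuous_mw = annual_kwh / hours_per_year / 1000.0   # ~7,000 MW

demand_j_per_day = continuous_mw * 1e6 * 86400         # facility demand, J/day

insolation_mj = 15.0     # assumed terrestrial insolation, MJ/(m^2-day),
                         # the December value knocked down for weather
cell_efficiency = 0.20   # assumed average solar cell efficiency
usable_mj = insolation_mj * cell_efficiency            # ~3 MJ/(m^2-day)

area_m2 = demand_j_per_day / (usable_mj * 1e6)
print(f"Continuous load: {continuous_mw:,.0f} MW")
print(f"Collector area:  {area_m2 / 1e6:,.0f} km^2")
# Ignores transmission and overnight storage losses, as in the text.
```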
