Special Purpose Computing Focuses on Energy Efficiency

To improve the climate models that predict global warming, climatologists are seeking model resolutions on the order of 1 km. Unfortunately, building the required 200-petaflop machine with today’s commodity-hardware approach would cost $1B and draw a staggering 40 megawatts of power.

A group of researchers at Lawrence Berkeley National Laboratory, who must be aware of the irony inherent in using 40 continuous megawatts to better predict global warming, may be returning supercomputing to its specialized roots, but along a new vector (yes, weak pun intended). In addition to the Cray-era focus on raw power, they are emphasizing energy-efficient computation, where floating-point operations per watt is the key metric.

Their approach has already been described at EcoTech Daily and in the lab’s Research News, so I’m just going to summarize it here. They are working on specialized hardware consisting of 20 million very low-power embedded processors (of the sort used in iPods and cell phones) wired together with the specific climate calculations in mind. By trading flexibility for efficiency, the design should achieve a tenfold improvement in floating-point operations per watt. The resulting 200-petaflop machine is predicted to draw only 4 MW and to cost $75M to build.
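The tenfold figure follows directly from the numbers quoted above: the same 200 petaflops delivered for 40 MW versus 4 MW. The quick calculation below just makes that arithmetic explicit; the helper name gflops_per_watt is illustrative, and it assumes the quoted power figures are sustained draw at the quoted performance.

```python
# Figures quoted in the post; the helper name is illustrative only.
PETA = 1e15  # 1 petaflop/s = 1e15 floating-point operations per second
MEGA = 1e6   # 1 MW = 1e6 W


def gflops_per_watt(petaflops: float, megawatts: float) -> float:
    """Sustained FLOP/s divided by power draw, expressed in GFLOP/s per watt.

    Assumes the power figure is the draw while delivering that performance.
    """
    return (petaflops * PETA) / (megawatts * MEGA) / 1e9


commodity = gflops_per_watt(200, 40)   # commodity-hardware estimate
specialized = gflops_per_watt(200, 4)  # specialized embedded-processor design

print(f"commodity:   {commodity:.1f} GFLOP/s per W")
print(f"specialized: {specialized:.1f} GFLOP/s per W")
print(f"improvement: {specialized / commodity:.0f}x")
```

In other words, the specialized design has to reach roughly 50 GFLOP/s per watt, against about 5 for the commodity estimate.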

Their motivations in their own words:

“What we have demonstrated is that in the exascale computing regime, it makes more sense to target machine design for specific applications,” Wehner said. “It will be impractical from a cost and power perspective to build general-purpose machines like today’s supercomputers.”

Specialized problems are amenable to specialized solutions, and scientific computation seems particularly well suited to this kind of approach. On the web and in corporate IT, however, where computing is both more general-purpose and less efficiently deployed, the first wave of energy-efficiency gains is coming primarily from a combination of virtualization and incremental improvements in commodity chip design.

I don’t think software carpooling will be the only game in town for long, though. While virtualization and dynamic provisioning are enabling better utilization of existing hardware, virtualization carries a performance cost of its own and can be no more efficient than the hardware it runs on. Once you have four passengers in a V-8-powered SUV, further improvements have to come from changing driving habits and modifying the vehicle.

As virtualization initiatives pick the low-hanging fruit, further gains will come from fundamental hardware improvements (which may include analogous specialization) in concert with “best efficiency” dispatching that targets optimal server utilization in a dynamic server pool. An interesting example of this kind of approach is described here (pdf).
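To make “best efficiency” dispatching a little more concrete, here is a minimal sketch under stated assumptions; it is not the method from the paper mentioned above. It assumes each server’s power draw rises linearly from an idle floor to a peak value, so work per watt improves as utilization rises and the idle power is amortized over more work. A simple greedy heuristic (not an optimal scheduler) then fills the most efficient servers first, up to a target utilization, and leaves the rest idle so they could be powered down. The Server class, dispatch, pool_efficiency, target_util, and all of the capacity and wattage numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    capacity: float    # hypothetical work units per second at 100% utilization
    idle_watts: float  # hypothetical draw when idle
    peak_watts: float  # hypothetical draw at 100% utilization

    def power(self, utilization: float) -> float:
        # Assumed linear power model between idle and peak draw.
        return self.idle_watts + (self.peak_watts - self.idle_watts) * utilization

    def peak_efficiency(self) -> float:
        # Work per watt at full load; under the linear model this is the
        # best the server can do, because idle power is spread over more work.
        return self.capacity / self.peak_watts


def dispatch(servers, demand, target_util=0.9):
    """Greedily place `demand` work units/s on the most efficient servers first.

    Returns {server name: utilization}; servers left at 0.0 could be powered down.
    """
    plan = {s.name: 0.0 for s in servers}
    remaining = demand
    for s in sorted(servers, key=lambda srv: srv.peak_efficiency(), reverse=True):
        if remaining <= 0:
            break
        load = min(remaining, s.capacity * target_util)
        plan[s.name] = load / s.capacity
        remaining -= load
    if remaining > 0:
        raise ValueError("demand exceeds pool capacity at the target utilization")
    return plan


def pool_efficiency(servers, plan):
    """Total work per watt for a plan, counting only servers that are switched on."""
    work = sum(s.capacity * plan[s.name] for s in servers)
    watts = sum(s.power(plan[s.name]) for s in servers if plan[s.name] > 0)
    return work / watts


# Hypothetical pool: one older, less efficient box and two newer ones.
pool = [
    Server("old-a", capacity=100, idle_watts=200, peak_watts=400),
    Server("new-b", capacity=100, idle_watts=100, peak_watts=250),
    Server("new-c", capacity=100, idle_watts=100, peak_watts=250),
]
plan = dispatch(pool, demand=150)
even = {s.name: 0.5 for s in pool}  # naive alternative: spread load evenly

print(plan)
print(f"consolidated: {pool_efficiency(pool, plan):.3f} work/W")
print(f"even spread:  {pool_efficiency(pool, even):.3f} work/W")
```

With these made-up numbers, the greedy plan runs the two newer servers at 90% and 60% and leaves the older one off, delivering about 0.35 units of work per watt versus about 0.23 for spreading the same load evenly across all three. Real servers have non-linear power curves, so a production dispatcher would need measured efficiency data rather than this two-point model.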

As I’ve touched on before, beyond that a “systems view” that optimizes the whole data center as it operates under changing conditions and with heterogeneous hardware might come next. Returning one last time to the carpooling analogy, this would be like a smart traffic-routing system that keeps each carpooling hybrid moving at its most efficient speed. The end result might be an optimally sized mixed pool of specialized and commodity hardware, with each machine dispatched so that the data center as a whole operates at its best work per watt.