Jun 6

Tim O'Reilly

Google's Folding@Home on the "Multi-Core Crisis"

There's been a fascinating exchange about programming for multi-core computers on Dave Farber's Interesting People List. It started when Andrew Donoho wrote in a thread about the need for US colleges to retool their programming classes:

It is very clear that the software industry is going to hit a programming wall some time in the next 6 years (4 Moore's Law generations). Microprocessors are going to progress from 4 to 64 processor cores. Most algorithms, other than the embarrassingly parallel ones, will hit this wall. This wall is not going to be surmounted by 'code monkeys'. To truly exploit this coming surfeit of processors, we will need both architectural and software invention. This problem will need 'classically trained' disciplined thinkers. While I was not trained as a computer scientist (experimental physics was my field), the problem is going to need folks of the caliber that originally defined our core architectures. (The next Johnny Von Neumann, I'm looking for you.) In other words, we are entering an era of unprecedented architectural research and invention opportunity.

Adam Beberg replied:

Sorry Dave, I just can't let the multi-core "crisis" comment stand without a reply. That false meme needs to die.

I work on Folding@home, which is a 250K-node distributed system, where each node is nowadays a dual-core at minimum, an 8-core Cell on average, or at the high end an on-chip cluster of SMP elements with thousands of threads in a NUMA setup of vector processors in the new GPUs. Nominally we have a 1M+ core system operating as a coordinated whole.

The idea that 64 cores is a problem is patently absurd. Complex on many levels? Sure. CS not available by the '80s? No: the 64-node Caltech Cosmic Cube was in 1981. Wall in sight for multi-core? Nope, maybe at 1B cores; I'll keep you posted. GPU compilers ready for mainstream? Give them another year. Courses to learn this stuff? Already out there.

Andrew then replied:

With all due respect to your excellent work on Folding@Home, your algorithms fall into the "embarrassingly parallel" category.

Adam didn't agree at all:

While on the surface Folding@home appears to be in the job-queue category of SETI/BOINC projects, it's not. Under the hood it's a rather crazy feedback process with Markov models, with each new work unit depending on the previous ones. If anything, this is where the fun CS problems are, and where the active research is. Ten years ago no one even thought you could distribute these problems; you had to use supercomputers.
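Adam's description (each new work unit seeded from the results of earlier ones) can be sketched in miniature. This is a toy illustration of the feedback structure, not Folding@home's actual code; every name and number below is invented:

```python
import random

def run_work_unit(seed_state, steps=1000):
    """Hypothetical stand-in for one simulation work unit: start from a
    seed state, random-walk for a while, return the final state."""
    state = seed_state
    for _ in range(steps):
        state += random.uniform(-1, 1)
    return state

def adaptive_rounds(n_nodes=8, n_rounds=3):
    """Sketch of the feedback loop: round N's work units are seeded from
    round N-1's results, so units are NOT independent across rounds.
    Only within a single round are they embarrassingly parallel."""
    frontier = [0.0] * n_nodes              # initial states handed to nodes
    for _ in range(n_rounds):
        # within a round, each unit could run on a separate node
        results = [run_work_unit(s) for s in frontier]
        # feedback: the most "interesting" states seed the next round
        results.sort(key=abs, reverse=True)
        frontier = results[:n_nodes // 2] * 2   # re-seed from the best half
    return frontier
```

The inner loop of each round parallelizes cleanly, but the rounds themselves are serial because of the feedback, which is exactly why this is not a plain job-queue workload.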

The argument continued into all kinds of fabulous detail about the kinds of problems that are or are not amenable to parallel computing. In summing up, Andrew wrote:

the industrial world, which I am from, is hitting the wall of many-core programming. Many of our economically important algorithms (value > $1 billion) do not scale above 16-20 cores.

Adam replied:

Yes we will solve them, but we have to change our algorithms from what most people are used to and this will take time. The same methods we use for distributed folding also seem to translate to a wide variety of other domains, so I see no hard walls on the horizon. I really do hope we find a wall soon so I can climb it and I'm crossing my fingers for a surprise at a billion.

The current generation of programmers is learning in a world of multi-core, and from what I have seen they have little if any trouble dealing with it. Once they get some experience, we'll wonder why this was ever considered hard.

tags: nitty gritty tech


Comments: 18

  Bob Warfield [06.06.07 08:46 AM]

I tend to agree more with Donoho that the problem is currently hard and that most software and tools do not take good advantage of massively parallel architectures. This is particularly true for Enterprise software. I've written a brief note about Enterprise software and grid computing on my blog.

  geoweb [06.06.07 09:17 AM]

OK, my comment is intended to stir the pot so lots of smart people will educate me. About ten years ago, I sold a 20-processor database server called a SparcServer 20, mostly running Oracle. I never saw a customer get any benefit past twelve processors in the real world; it was diminishing returns after that. If we are talking real world, we have to be talking about boxes at one location, right, for the most part? Folding/SETI are pretty unique computer-scientist applications. I know TPC benchmarks showed extra return past twelve processors, but they weren't very real-world back then. When Intel ships quad-core and 8-core desktops, won't this still be a big programming challenge? Is there a compiler that will "8-way" your code after a recompile?

  Bob Warfield [06.06.07 10:20 AM]

Geoweb, you're pointing to exactly the problem that Andrew Donoho brings up. Most software today isn't designed to take advantage of very many CPUs, and now that CPUs are becoming multi-core, the problem only gets worse: if every CPU has 2 cores, your 12-CPU example shrinks to 6. So you are right, this is a big programming challenge.
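The diminishing returns geoweb saw past roughly twelve processors are what Amdahl's Law predicts whenever part of the workload stays serial. A quick sketch (the 90% figure is only an illustrative assumption, not a number from the thread):

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Amdahl's Law: overall speedup when only `parallel_fraction`
    of the work can be spread across `n_cores`."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

# A workload that is 90% parallelizable gains less and less per core:
#   amdahl_speedup(0.90, 12) is about 5.7x, not 12x,
# and even infinitely many cores cap out at 1/0.1 = 10x.
```

That cap is why "just add cores" stalls unless the serial fraction itself is attacked.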

It can be overcome, and as I mention in my blog article (link in the first comment post), it is something Enterprise ISVs need to be thinking about.

It's worth noting that the advantages are even more acute for multi-tenant SaaS vendors. Multi-tenancy greatly increases the volumes a single instance has to process. SaaS greatly increases the desirability of scaling at low cost due to the economics involved.

The challenge has not been cracked in any very generic way--it takes some very clever programming to take advantage of these machine architectures. I absolutely agree that more tools are needed!

  Aaron [06.06.07 11:22 AM]

I disagree with Andrew. In a few years' time the tools will exist to empower any code monkey to deal with any number of cores.

  adamsj [06.06.07 12:04 PM]

Probably so, Aaron--but who's going to build them?

  Stefan Arentz [06.06.07 01:43 PM]

Adam, the CPU vendors are going to build these tools. Think about operating systems, compilers, frameworks.

Well, maybe not operating systems. Only Sun seems to be in a unique position, with Solaris 10 combined with their crazy multi-core CPUs.

By the way, it is a myth that a smart compiler will suddenly make your code X times faster, where X is the number of cores. Sure, compilers can do some cool optimizations across core boundaries now (for example, the just-released Intel C/C++/Fortran compiler v10 can do multi-core optimizations), but to REALLY use all those cores the application programmer will need to come up with a proper design that scales across multiple cores.

A good design is not hard in many cases. It just requires a different way of thinking. SMP used to be special, now everybody has a dual core laptop.
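Stefan's point, that the scaling comes from the programmer's design rather than the compiler, can be shown with a minimal Python sketch (the prime-counting task is just an illustrative stand-in). The design work is the up-front split into independent spans; only after that can any number of cores help:

```python
from multiprocessing import Pool

def count_primes(span):
    """CPU-bound work over an independent slice: no shared mutable state."""
    lo, hi = span
    def is_prime(n):
        if n < 2:
            return False
        return all(n % d for d in range(2, int(n ** 0.5) + 1))
    return sum(1 for n in range(lo, hi) if is_prime(n))

def parallel_count(limit, workers=4):
    """Split [0, limit) into independent spans, then fan them out.
    The partitioning decision is the design step a compiler can't infer."""
    step = limit // workers
    spans = [(i * step, (i + 1) * step) for i in range(workers)]
    spans[-1] = (spans[-1][0], limit)   # absorb any rounding remainder
    with Pool(workers) as pool:
        return sum(pool.map(count_primes, spans))
```

Nothing here is automatic: because the spans share no state, the same code works unchanged with 2, 16, or 64 workers.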

Just think how much the next generation will have when their FIRST computer will be 16 core :-)

  Bob Warfield [06.06.07 03:14 PM]

Stefan is exactly right: it takes a different kind of thinking, and you can't just automagically have the compiler deal with it for you.

In fact, one of the biggest problems with dev tools today is that they're entirely too procedural. The more procedural they are, the harder it is to take advantage of many CPUs in a scalable way. Languages like Java and C++ are painfully procedural. Ironically, one of the oldest languages available, SQL, can be fairly declarative, and hence one sees products like Oracle for the grid, because the SQL is unchanged whether it runs on one node or many. The language is designed to work on sets of data, not single elements at a time.
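Bob's contrast between procedural and set-oriented styles can be made concrete in a few lines of Python (a stand-in for the SQL case he mentions):

```python
# Procedural: prescribes an order of operations, one row at a time.
# The stated sequencing makes it hard for an engine to parallelize.
def total_over_threshold_procedural(rows, threshold):
    total = 0
    for amount in rows:
        if amount > threshold:
            total += amount
    return total

# Set-oriented: states WHAT is wanted, not HOW to traverse. An engine is
# free to partition `rows` and run the filter and sum on many cores, the
# way a parallel SQL engine executes SELECT SUM(amount) WHERE amount > :t.
def total_over_threshold_setwise(rows, threshold):
    return sum(amount for amount in rows if amount > threshold)
```

Both return the same answer; the difference is how much freedom the phrasing leaves the runtime.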

Programming these new hardware architectures is going to involve working with the old languages "by hand" until a new language takes off. The window for that new language will open wider and wider as the pressure to take advantage of many cores grows.

It's about time; we've had essentially the same languages for a long, long time. They're all object-oriented retreads of C with slight upgrades here and there.

  random8r [06.06.07 06:03 PM]

meh. erlang it.

  dbt [06.06.07 10:10 PM]

Procedural code is easier to parallelize than object-oriented code with shared state.

  Scot [06.07.07 08:39 AM]

Maybe I'm being a bit dense, but what applications are out there that could use "large value of X"-way processors that aren't already "embarrassingly parallel"? My email client is probably not designed to make use of 64 cores, but then it does fine on a single core. Maybe my browser could make use of it, with each tab running on a separate core, but is that really going to require a PhD in Physics to implement?

  Tim O'Reilly [06.07.07 09:47 AM]

Also appropriate to this discussion: even the number of "embarrassingly parallel" applications is going to explode beyond the obvious. Consider this one: collective intelligence of all kinds is massively parallel, and harnessing collective intelligence is the essence of Web 2.0. In this TED video, Blaise Aguera y Arcas shows how Photosynth stitches together models of the world from millions of separate Flickr images. What other applications can we rethink and redo as we get beyond today's technology...?

  hopia [06.07.07 10:55 AM]

Well, I've been an embedded programmer for 12 years, and I've been working with designs that are multi-threaded and that simultaneously execute code running on different real-time hardware components. When I first started, fresh from college, it took a few months to get used to all of the concurrent stuff that was going on in my systems. I think it's just a matter of thinking about your problems differently and coming up with the most effective solution. I don't think it's a systemic problem with current programmers in general. Just like any programming skill, it can be learned, and the learning curve is not as steep as some of the industry people make it sound.

Admittedly, the old embedded multi-threading problems I worked with are slightly different from the current issues brought on by true multi-core chips, and I think the biggest difference is scalability. In my projects, old embedded multi-threading designs usually had a fixed number of specialized "cores" in the system. For example, you have one or two for the image decompressor, one for the general processing, one for the USB controller, etc. However, in multi-core environments, the best designs should be able to accommodate a variable number of processors. Writing code for a fixed-core environment is different from writing code for an environment you know will use a variable number of processors, and I think just being aware of this fact before planning your design is the key difference. But it's just a matter of putting your designs in a scalable multi-core framework. It's just like working on any other software design problem. Sure, the architecture will be different, and I definitely won't assign the junior programmers on my team to work on the designs by themselves, but experienced embedded programmers won't find it horrendously difficult.

  geoweb [06.07.07 11:00 PM]

The Folding@home application heavily uses hidden Markov models, basically heavily simulating complex probabilities to determine hidden values, i.e., original signals. Speech recognition, handwriting recognition, gesture recognition (Minority Report-style computing), and synthesizing multiple RSS feeds could all make use of HMMs. One can envision dedicating a processor to speech, one to gestures, even one to skin temperature. If your computer senses you are tired, it keeps the answers short. If there is stress in your voice, it gives answers in a calming voice. Remember HAL in 2001? You just have to imagine that processors and memory are really cheap. Still need those tools, though.
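For the curious, the core HMM computation behind the recognition tasks geoweb lists is the forward algorithm, which scores how likely an observation sequence is given a model. A minimal sketch with a made-up toy model (none of these numbers come from Folding@home or any real recognizer):

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Forward algorithm for an HMM: probability of seeing the observation
    sequence `obs`, summed over all possible hidden-state paths."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {
            s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())

# Toy weather model, purely illustrative:
STATES = ("Rainy", "Sunny")
START = {"Rainy": 0.6, "Sunny": 0.4}
TRANS = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
EMIT = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
        "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
```

A recognizer runs this (or its scaled variants) for each candidate model and picks the best-scoring one; each candidate scores independently, which is part of why these workloads spread across processors so well.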

  Jeff Keasler [08.01.07 04:29 PM]

There is a lot of software out there that is already bandwidth-limited on one core. This means the data bus is not able to shuttle information from memory into the processor at a rate fast enough to work on it. As more cores have to share the same data bus, that situation only gets worse. People have already done an extraordinary amount of work to reuse data in cache, so the argument that recoding software for the new era will get around the bandwidth problem is bogus for that category of software.
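Jeff's point is easy to quantify with a back-of-the-envelope calculation. The numbers below are hypothetical, chosen only to show the shape of the problem:

```python
def cores_a_bus_can_feed(bus_gbps, core_gflops, bytes_per_flop):
    """Back-of-the-envelope check for bandwidth-bound code: how many cores
    a shared memory bus can actually keep busy. `bytes_per_flop` is how
    much data the code must stream per floating-point operation."""
    demand_per_core = core_gflops * bytes_per_flop   # GB/s one core wants
    return bus_gbps / demand_per_core

# Hypothetical 2007-ish numbers: a 10 GB/s front-side bus and 8 GFLOP/s
# cores running streaming code that needs 4 bytes per flop. The bus can
# feed well under one core, so extra cores just wait on memory:
#   cores_a_bus_can_feed(10, 8, 4) -> 0.3125
```

For such code, only raising the data-reuse (more flops per byte, via cache blocking and the like) or widening the bus changes the answer; adding cores does not.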

  Nes Liber [09.06.07 07:26 AM]

You can use 4-core CPUs today: give the operating system, virus scanner, email program, and web browser each their own core. Voilà! Your current software will continue to work. It will not get faster automatically by buying the latest hardware, but it can be optimized over time if enough interest exists. But frankly, the bottleneck today is the hard drive and the network, not the CPU. The driving force for faster CPUs in the enterprise is servers (file, web, print, app, database), and at the consumer level, computer games. I am confident that developers for those types of applications will adapt.

  Imanpreet [09.06.07 02:58 PM]

Sure, you could run the OS, virus scanner, email program, and web browser each on its own core, but that only works for multi-core architectures like the ones on the market today, such as Intel's dual-core or quad-core chips.

The multi-core processors coming up now, like Cell, Tilera, and Stream, are not architected to run an operating system on every core. In the Cell's case there is one central core, the PPE, which is the only one capable of running an operating system.
