The Faint Signals of Concurrency

We’ve been looking at the world of concurrent programming lately. You might have seen Tim’s posts on Erlang and Haskell, or my post on alternative systems to threading. Here, in a nutshell, is why we’re interested in this stuff and what we see.

  1. CPUs aren’t really getting faster. The CPUs in your brand new machine are the same speed they were two years ago. Moore’s Law has expired. Now all the manufacturers can do is stuff more CPUs into your machine. Sure, someday there’ll be a breakthrough in the speed of light or subnano manufacturing or we’ll finally get a 64-qubit Intel Quarkium, but until then we’re stuck with more slow CPUs in each new box. The only way to get faster execution is to parallelize your code.
  2. Google’s been doing it. An 8-CPU machine is analogous to an 8-machine cluster. Google has been building clusters with hundreds of thousands of machines because its industry has an embarrassment of parallel problems to solve. We’re big believers in watching the alpha geeks, and it’s hard to get more alpha or geekier than Google. For this reason we’re watching MapReduce and Hadoop. Other early leaders in this area are the big data sciences (astronomy, genomics/proteomics, nuclear physics), each with different problems (image processing, sequence alignment, 3D alignment, detailed simulation).
  3. Barrier to entry is coming down. Your Playstation 3 can kick ass at protein folding, and for $13 you can buy the Parallax Propeller, a chip with 8 cores in it to slap into a board and play. And, of course, off-the-shelf commodity PC hardware has never been cheaper. Hell, your video card GPU has a ton of processing elements, although everyone has their own interface to the goodies.
  4. It’s a hard problem. If it were easy then it wouldn’t be a problem. Computer science has been busy for years developing techniques for clusters and for multicore machines, figuring out how to distribute computation across multiple processors.
  5. The alpha geeks have their hands in it. Before, you needed a high-budget lab to work on these theoretical problems. Now you can buy your lab at Wal-Mart, the problems being solved are your problems, and the programming environment is familiar (Perl, Ruby, C) or worth learning (Erlang, Haskell’s STM). When Audrey Tang went to develop a Perl 6 interpreter, she did it in Haskell. The Pragmatic Programmers are releasing an Erlang book and we’re doing one on Haskell.
  6. Did I mention it’s hard? The mainstream systems for concurrency are baby talk, coding with crayons and fingerpaint. Threading is a nightmare for those who use it, and if 10% of programmers can really code well without bugs then only 1% of those can do so in a threaded environment. This pain (and the alternatives to threading that people are investigating) is on our radar.
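The MapReduce model mentioned in point 2 can be sketched in a few lines. This is a minimal, illustrative sketch using only the Python standard library; real MapReduce/Hadoop adds distribution across machines, a shuffle phase, and fault tolerance, so a local process pool stands in for the cluster here.

```python
# Minimal sketch of the MapReduce idea: map chunks in parallel,
# then merge partial results. A local process pool stands in for
# the cluster that a real MapReduce deployment would use.
from collections import Counter
from functools import reduce
from multiprocessing import Pool

def map_phase(chunk: str) -> Counter:
    """Map: turn one chunk of text into partial word counts."""
    return Counter(chunk.split())

def reduce_phase(a: Counter, b: Counter) -> Counter:
    """Reduce: merge two partial results."""
    return a + b

def word_count(chunks: list) -> Counter:
    with Pool() as pool:
        partials = pool.map(map_phase, chunks)  # map phase runs in parallel
    return reduce(reduce_phase, partials, Counter())

if __name__ == "__main__":
    print(word_count(["to be or", "not to be"]))
```

The essential property is that `map_phase` calls are independent, so they can run on as many cores (or machines) as you have.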

My friend Bryan O’Sullivan is one of the alpha geeks we’re watching (he’s co-author on our Haskell book). I pinged him about concurrency and he had this to say:

What’s going to really fuck people up, out in the real world, is the programming model. Programming is hard; parallel programming is way the hell harder; compsci courses have turned into votech Java pap; and enrollments in compsci are in any case as lively as the waiting list for the Lusitania the week after it was torpedoed. People want their programming to be easier and more casual, and they’re about to have it jammed into their eyesockets on bamboo stakes instead.

I wrote this post in March but for some reason failed to post it. In light of recent events it still seems timely. I’d love to know what you think. Is concurrency on your radar? Are you perfectly happy writing threaded code? Are you perfectly happy writing code without any concurrency whatsoever? Tell Unca Nat all …



  • Tim

    Tons of links and words on crunchy concurrent goodness here:

  • 20 years ago, my [then] company used to distinguish between “systems programmers” and “application programmers” according to whether or not they could successfully design and code concurrent apps. If you could think in terms of signals, semaphores, shared memory, etc., you were in the former category. You’re right, it’s more difficult. (It’s also a lot more interesting.)

    But there’s another class of applications to consider: those that are concurrent but asynchronous. These apps use messaging to communicate rather than more tightly coupled inter-task mechanisms. Asynchronous architectures can take advantage of multi-core processors every bit as much as synchronous ones, and if an application can be implemented asynchronously, it is often simpler and can therefore be developed by programmers with fewer skills. Asynchronous applications are also sometimes simpler to test and debug than tightly coupled synchronous ones. (We’ve all experienced the challenge of trying to catch a time-dependent flaw.)

    Examples: Consider CPU-intensive applications like encoding large media files. Depending on the format, you may be able to split an object into multiple smaller chunks, encode them in parallel, then concatenate the results. Each encoder instance is a service that accepts a message to start a “job” and sends a completion message when done.
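    The split/encode/concatenate pattern described above can be sketched with a worker pool. This is an illustrative sketch only: `encode_chunk` is a made-up stand-in for a real encoder (it just uppercases bytes so the example is runnable), and a real encoder would need format-aware chunk boundaries.

```python
# Sketch of the split -> encode-in-parallel -> concatenate pattern.
# encode_chunk is a hypothetical stand-in for an expensive encoder.
from concurrent.futures import ProcessPoolExecutor

CHUNK_SIZE = 4  # tiny for demonstration; a real job would use megabytes

def encode_chunk(chunk: bytes) -> bytes:
    return chunk.upper()  # pretend this is an expensive encode step

def encode_file(data: bytes) -> bytes:
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    with ProcessPoolExecutor() as pool:
        # map preserves input order, so concatenation is safe
        encoded = pool.map(encode_chunk, chunks)
    return b"".join(encoded)

if __name__ == "__main__":
    assert encode_file(b"hello world") == b"HELLO WORLD"
```

    Because each chunk is independent, throughput scales with the number of cores, which is exactly the embarrassingly parallel shape the comment describes.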

  • “Moore’s Law has expired.”

    I don’t believe this is true (yet). Moore’s law had to do with the complexity of chips, not the speed. The most common interpretation is that transistor counts double every 18 months.

    With multicore CPUs, the transistor counts are still going up, and more broadly, the complexity is certainly going up.

    Silly semantics, since your point is really that CPU speed, as measured in GHz, isn’t increasing as rapidly – which is true. :)

  • monopole

    While parallelism is coming, many of the points in this post are way off:
    1. Moore’s law hasn’t expired, just clock speed. Transistors are still doubling per unit area; it’s just that nobody has worked out a way to (practically) dissipate the heat generated. (There are ways; gotta dust off my dissertation.) In addition, as clock speeds increase, the total distance a signal can traverse is limited by the speed of light, roughly a foot per nanosecond (be glad Adm. Grace Murray Hopper isn’t around to hit you with a microsecond!).
    In fact, Moore’s law is how you can pack in more cores and cache.
    2. An 8-core machine != an 8-machine cluster. It’s the nanoseconds again. The bandwidth between machines, or even between CPUs and GPUs, is much less than the bus between cores (HyperTransport bus 80 Gb/s, PCIe x16 8 Gb/s, Gigabit Ethernet 1 Gb/s): roughly an order of magnitude difference at each stage. Toss on shared memory and you have a radically different programming regime. It doesn’t mean the programming techniques are mutually exclusive, just that the critical tricks are way different.
    3. True, and a good thing, but each case is radically different. In any case the cost of entry has been way down for years: booting a CHAOS disk in each node of a computer lab gives you a handy little Beowulf cluster with the users none the wiser.

    Accurately describing these three points actually bolsters your arguments.

  • Douglas Husemann

    Glad to see some people remember what Moore’s Law stated, and it will continue at least until 2020. Speed was a nice by-product of making transistors smaller, at the expense of heat. As chip densities got larger, heat became more of an issue, thus limiting speeds, although some of the new transistor designs coming may alleviate some of that, and perhaps we’ll see a couple of speed bumps along the way, perhaps up to 5 GHz without major cooling expense (probably in the 40-nanometer tech range).

    The rest is a by-product of adding cores, and is an interesting problem that will get a solution as it becomes mainstream.

  • I agree wholeheartedly with Doug Kaye above

    Even though I don’t like that his argument characterises me as an application programmer. I should get over it.

    Application programmers seem to be the lion’s share of the programmers ‘in the wild’, and so most parallelisation efforts will be most usefully tasked with improving their productivity. It should also be mentioned that they tend to have small installed bases, be sloppy, and often ignore quality and security (donning flameproof overalls).

    There is a cluster of other good posts around this issue and quite a few good ideas. SQL is quite declarative, so it is quite parallelisable, hence Oracle deploying to clusters. This is one lever that can improve parallelisation.

    The Smoothspan blog has an interesting take on how a business rule layer can be leveraged to improve parallelisation.
    Google’s Folding@Home on the “Multi-Core Crisis” and Enterprise Software

  • Thinking about concurrency at an application level is fundamentally a paradigm shift in how we decompose programming problems: a need to think more declaratively, partitioning data and processing into smaller chunks. I don’t think it is harder than anything before, especially if the runtime can take care of many things, hence the power of MapReduce and the rest of the family.

    Relational database theory, with its declarative programming model, introduced a different mode of thinking about data storage and retrieval problems. Many entrenched in the thinking of the time found the mental shift hard. Object orientation and component technologies were similar. Maybe REST and functional models will enlighten us now that there is a different environment and set of problems to solve.

    Moore’s Law is still working its magic. The change is in how these transistors are being utilised. A monolithic core in lock-step synchronization with a single clock was the norm; now, multiple cores run with one, two, maybe more clocks, or eventually no clock at all. A complementary law would be: “the number of cores per integrated circuit for minimum cost will double approximately every two years.” Given current issues around power consumption, I’d expect clock speeds to remain constant or maybe decrease a little.

  • The concurrency design in Eclipse is interesting.

    I read a couple of books on the subject, and it seems as if the GUI update is entirely separated from the background tasks.

    This sounds like a good way to start to create an infrastructure that supports massively parallelisable GUI applications, which seems to be quite a tough problem at the moment.

    As Eclipse is a platform that can take over from the Microsoft hegemony in this area, I like to think it can start to make a difference.

    I don’t have enough experience to evaluate the decisions made in Eclipse, yet there seems to be a lot of effort in the framework in the area.

  • Alexandre Richer

    Wha? No Alice ML, no Mozart/Oz, no CTM? You read LtU, yes?

  • Making concurrency easy to work with was the fundamental design goal in Kamaelia. It’s been mentioned here before, so I’ll leave it at that. Personally I think we’ve succeeded, and the approach works and scales. It’s very difficult to get people interested in a technology, though. Works for us :)

  • I worked in a language called Occam on transputers in the late 80s. The problem then, as now, is that there are very few algorithms amenable to parallelization. It worked beautifully for the various numerical methods for solving partial differential equations, since many of these are inherently parallel: with a purpose-designed system like Occam/transputers it was simple to create parallel independent processes.

    Asynch message-passing helps with avoiding threading entanglements, but it’s not a model that fits many applications. As Tim Bray says, “we’re trying to get to a place where ordinary application programmers’ code naturally and effortlessly takes advantage of multi-core, multi-processor, clustered, or otherwise parallel hardware deployments.”
    That means sufficient machine (compiler) intelligence to detect which parts of the process can be parallelized: an order of magnitude harder than just (!) writing threaded code.

    Folding@home is interesting but I am as yet unconvinced that this can provide the day-job application programmers’ code with the ability to really exploit parallelization.

    So yes, I agree, it’s both hard and interesting. If I didn’t have a day job myself and a family to raise, I’d like to spend more time on it.

  • The problem with Occam, really, is also that it’s too fine-grained for the average developer. We still tend to want to think of sequential blobs that do relatively well-defined things, but from an Occam perspective those are much larger blobs. Erlang is the other extreme, IMO: a high-level functional language taking it the other way. Both have the same basic idea: blobs of functionality talking to each other.

    Kamaelia is kinda halfway between the two, which means you can write a stub program that does the basic thing you want, turn that into a component, replace its in/out with inboxes and outboxes, and just wire it in. It’s also purposely designed with the functionality of other languages in mind. (After all, Kamaelia is just a bunch of Python modules.) The C++ proof of concept shows that the approach is portable.

    The really nice thing, though, is that because you use concurrency by focusing on composing functionality, the programmer’s job is simpler, not harder.
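    The inbox/outbox idea described above can be sketched generically. To be clear, this is not Kamaelia’s actual API; it is an illustrative analogue built from standard-library threads and queues, showing how components that only read an inbox and write an outbox compose into a concurrent pipeline by wiring queues together.

```python
# NOT Kamaelia's API: a generic sketch of the inbox/outbox component idea.
# Each component reads its inbox, does one well-defined job, and writes to
# its outbox; "wiring" components together just means sharing a queue.
import queue
import threading

def component(work, inbox, outbox):
    def run():
        while True:
            msg = inbox.get()
            if msg is None:          # shutdown signal propagates downstream
                outbox.put(None)
                break
            outbox.put(work(msg))
    t = threading.Thread(target=run)
    t.start()
    return t

def pipeline(source, *stages):
    """Wire stages outbox-to-inbox and feed them from `source`."""
    first = queue.Queue()
    inbox = first
    for work in stages:
        outbox = queue.Queue()
        component(work, inbox, outbox)  # each stage runs in its own thread
        inbox = outbox
    for item in source:
        first.put(item)
    first.put(None)
    results = []
    while (item := inbox.get()) is not None:
        results.append(item)
    return results

if __name__ == "__main__":
    print(pipeline(["kamaelia", "style"], str.upper, lambda s: s + "!"))
```

    The stages run concurrently, but each one is written as straightforward sequential code, which is the composition point the comment is making.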

    A torrent publisher created by a user of Kamaelia (not us) is described here. They wanted a mechanism to publish files by BitTorrent and found Kamaelia an appropriate way of doing this. It has over 14 subsystems which can operate concurrently, and naturally fits together that way. The diagram shows the code structure; it’s not a theoretical class hierarchy. It shows you what’s actually happening. The concurrency aspect comes for free when you think in terms of composition of functionality (even coarse-grained).

    We have a whiteboarding application that allows people to connect whiteboards as peers, sharing the surface, the pages it contains, and the audio heard at each location to all participants (mixing the audio intelligently). (Code, article from Linux Format.)

    The nice thing about that code is, again, that it was all written from the perspective “I want to write a GUI, networked, audio-sharing application, and I’ll use these components to do it”, not “I want to use concurrency”. The concurrency aspects enhanced modularity and the ability to grow organically, rather than hindering them. When I ran our introspector over it, I discovered that it had over 130 components running in parallel. (Much more coarse-grained than Erlang or Occam, but then that’s expected; at the moment 130-core CPUs aren’t common, and this was a simple app.)

    As a result, I don’t believe concurrency has to be hard anymore. After all, every web mashup takes the same approach. It’s just more coarse-grained than people expect. Sooner or later the web mashup approach will be reapplied at the local app level, and someone will reimplement our work. Which will be cool, and hopefully their library takes off :-)

  • Something entertaining is the fact that the interest in concurrency for webapp builders is due to the interactive nature of Web2.0.

  • I suspect we’ll waste cores the way we waste cycles and RAM as Moore’s Law continues to drive their cost down to the point where we don’t bother to optimise.

    For applications that need multiple-core performance, the effort may be worth it.

  • In order to tackle concurrency at the large scale, we have to give developers models that allow them to ignore *most* of the issues. The key here is localization of behavior.

    Let me explain. In all stack-based languages and RPC, there is tight coupling between invoker and invokee. This approach is fundamentally flawed for distributed systems, which also tend to be concurrent. A message-based approach is better, but not when it is RPC in the guise of asynchronous method calls and the like.

    So, what is the right model? In our software, we are embracing the REST architectural style. Since it builds on lessons learned from HTTP and hypermedia systems, it’s fundamentally distributed. In REST, resources (e.g. services, documents, etc.) take center stage. Developers who get used to the style stop thinking in terms of ‘methods’ and instead start thinking in terms of resource ‘states’. Acting upon the resources uses a stateless protocol, so order is irrelevant, which is a key to scaling. This kind of mental shift, which may appear minor at first, leads to localized behavior. That is, I can start thinking about solving my problem in the context of my resource alone. I interface with other resources through messages which carry complete operations (not sequence-sensitive parts of an operation). My only duty is to satisfy incoming state-change requests in some bounded manner. Couple that with a best-effort approach to converge on stable states, and you get a scalable, resilient application architecture.

    Now, the above applies to large-scale systems (think web). So, does it also apply to micro-scale systems like multi-cores? The answer is yes. The overhead of serializing messages from process to process, or even within parts of a multi-threaded process, is going to be eclipsed by the gains of fanned-out processing. The trade-off is very much in line with other code abstractions that were once considered “overheads,” like stack frames (yes, there was a time when setting up stack frames was considered an unacceptable expense!), memory protection, and JITted code.

    In the end, the question isn’t whether we can take 100% advantage of all the cores; it’s whether we can take reasonable advantage of them in an approachable way. With resource-centric programming models, the answer is definitively yes!
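    The claim above that complete, self-contained state-change messages make ordering irrelevant can be sketched concretely. This is an illustrative sketch only; the `Resource` class and message shape are made up for the example, not drawn from any particular REST framework.

```python
# Sketch of "messages carry complete operations": each message names the
# resource key and the full new value, so for distinct keys no ordering
# between messages is required. Resource and the message shape are
# hypothetical, invented for this illustration.
import threading

class Resource:
    """Holds state; accepts complete, self-contained state-change messages."""
    def __init__(self):
        self._state = {}
        self._lock = threading.Lock()

    def apply(self, message: dict):
        # The message is complete in itself (key + full new value),
        # not a sequence-sensitive fragment of a larger call protocol.
        with self._lock:
            self._state[message["key"]] = message["value"]

    def state(self):
        with self._lock:
            return dict(self._state)

doc = Resource()
msgs = [{"key": "title", "value": "draft"}, {"key": "owner", "value": "ana"}]
# Deliver the messages concurrently, in whatever order the threads run.
threads = [threading.Thread(target=doc.apply, args=(m,)) for m in msgs]
for t in threads: t.start()
for t in threads: t.join()
assert doc.state() == {"title": "draft", "owner": "ana"}
```

    Because each message is a complete operation, the resource converges to the same state regardless of delivery order, which is the localization-of-behavior point being made.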

  • Is this concurrency issue (with respect to multi-core systems) really something to worry about?

    I am typing this in on a dual-core processor. In the background, a score of other processes are also running (though many are admittedly idle). In all likelihood, these processes are running on either core and are totally oblivious to the other processes around them. Would additional cores or better concurrency help me? Probably not most of the time. My browser wouldn’t run any faster (or appear to, at least).

    Further, for a large number of end-user applications, those processes spend a significant amount of time idling while waiting for relatively slow user input (or other events, such as network packets). And that idling is fine because there are the score of other processes in the background that are somewhat less idle (and they still don’t generally tax the processor/cores).

    Now, that’s not to say that concurrency isn’t important. I just think it’s more of a concern for niche products. Google handling millions of asynchronous requests and indexing vast volumes of data is a good example. Real-time simulation of complex interactions is another good example. I’m sure there are plenty of other good examples, but… there’s only so much concurrency that the current word processors, spreadsheets, web browsers, mail clients, etc. can make use of (unless people themselves become more concurrent).

    In the end, I suspect that the sort of concurrency that we are likely to struggle with is that of autonomous systems networked together and used by multiple people simultaneously (e.g. the Web), not of multi-core systems running a single operating system by a single user.

  • I’m also an Occam programmer from the 1980s, but I also coded a few thousand lines in the latest version, Occam-pi, a year or so ago. I built a simulator for large-scale peer-to-peer web services that could happily run over 100,000 nodes (as separate threads) on my laptop.

    The most interesting thing about Occam is also the most frustrating thing. The language is designed such that the compiler can statically check that your code is using concurrency correctly, and it prohibits or manages side effects explicitly. The runtime can also detect dynamic problems like deadlock. It’s really hard to get code to compile; you have to unlearn a bunch of bad habits. But when you do, a large class of concurrency-related bugs can’t happen.

    Occam-pi is available from the University of Kent; look for KROC. It runs on Intel-architecture Linux, MacOS, and Cygwin. Many years ago Occam was popular for teaching parallelism to computer science students; I think it would be good to get back to that.

  • Kaveh

    Multicore architecture will hit us one way or another, and we must solve new problems. Some say that not all programmers will need the power of multicores, and that they aren’t even using the power of today’s single-core processors. I think this is not the proper point of view. Using multicore power brings new infrastructure that even the daily coder will benefit from, e.g. in UI.
    The bigger problem is the huge bulk of code that apparently makes our market valuable; it is hard to admit that a huge part of it is a mess. (How many projects fail every year? Is that because programmers are not well trained? No! It is because our approaches are not proper for problem solving.)
    Personally, I’d like Erlang to be the first step into the parallel world. It is easy to learn, dynamically typed, and mature enough (in reliability, not in bulk of libraries :)) to be used for real tasks. It can be the Perl of the new age.
    I am training myself and waiting for big bets. Certainly if I have a killer idea (I hope so! But it is rare.) I will go for Erlang.