A Fast Wait-Free Hash Table

A guest lecture tomorrow (Wednesday) in Dennis Allison's class at Stanford sparked some interesting back-channel discussion among the Radar team. The lecture, by Cliff Click of Azul Systems, is described as follows:

A Faster Wait-Free Hash Table. I present a wait-free (lock-free) concurrent Hash Table implementation with better single-thread performance than most Hash Tables, and better (usually much better) multi-thread performance than all other implementations I tried. I demonstrate scaling up to 768 CPUs even with high mutation rates. I show correctness by looking at the problem in a very different light than the usual “happens-before” / memory-order / fencing style of thinking — indeed the algorithm requires no memory-ordering, no memory fencing, no D-CAS and no CAS-spin-loops.
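For readers who haven't bumped into this problem before, here's a minimal Java sketch of the conventional CAS-based scheme that designs like Click's are measured against. To be clear, this is my own illustration, not his algorithm: his design avoids even the CAS retry loop used below, and handles concurrent resizing, which this toy version omits entirely.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// A deliberately tiny sketch of the conventional lock-free approach: open
// addressing where threads claim empty key slots with a single compareAndSet
// and probe forward on collision. Simplifications: fixed capacity, no resize
// or delete, and a reader may briefly see a claimed key with a null value.
final class TinyLockFreeMap {
    private static final int CAPACITY = 1 << 10;   // power of two, never grows
    private final AtomicReferenceArray<String> keys =
            new AtomicReferenceArray<>(CAPACITY);
    private final AtomicReferenceArray<String> vals =
            new AtomicReferenceArray<>(CAPACITY);

    public void put(String key, String val) {
        int idx = key.hashCode() & (CAPACITY - 1);
        while (true) {
            String k = keys.get(idx);
            if (k == null && keys.compareAndSet(idx, null, key)) {
                vals.set(idx, val);                // we won the slot
                return;
            }
            k = keys.get(idx);                     // re-read: CAS may have lost
            if (key.equals(k)) {
                vals.set(idx, val);                // key already present: update
                return;
            }
            idx = (idx + 1) & (CAPACITY - 1);      // linear probe to next slot
        }
    }

    public String get(String key) {
        int idx = key.hashCode() & (CAPACITY - 1);
        while (true) {
            String k = keys.get(idx);
            if (k == null) return null;            // empty slot: key is absent
            if (key.equals(k)) return vals.get(idx);
            idx = (idx + 1) & (CAPACITY - 1);
        }
    }
}
```

Even in this naive version, the load-bearing idea is that an empty slot transitions to a claimed key exactly once, so readers never take a lock; the hard part Click's talk addresses is making updates and resizes equally cheap without any fencing at all.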

A second talk, time permitting:

Scaling Up A Real Application On Azul. A short case study of a Java application doing device driver model checking. The basic application uses many small independent tasks; doing any one task may produce more tasks. The work is highly unstructured and very fine-grained. In theory, this coding style should be endlessly scalable, but in practice there were a number of issues, mostly due to the fine-grained nature of the tasks. The initial attempt used only a handful of CPUs on Azul, but after 3 or 4 small modifications we got all 384 CPUs cranking away, and ran the job 45x faster than the fast P4 it was running on. Time to complete a proof dropped from *weeks* to *hours*.

I’ll demo the tools we used to do basic performance analysis, and some techniques to break scaling bottlenecks.
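That task structure is easy to picture in code. Below is a hypothetical Java sketch (the names and the fan-out rule are mine, not from the talk) of a flood of tiny tasks that spawn more tasks; at this granularity, per-task scheduling overhead rather than the CPUs is what caps scaling:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the workload shape described above: a flood of tiny
// independent tasks, any of which may enqueue more. With work this
// fine-grained, queue contention and scheduling overhead quickly dominate.
public final class TaskFlood {
    private static final ExecutorService POOL =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    private static final AtomicInteger OUTSTANDING = new AtomicInteger(1);
    private static final CountDownLatch DONE = new CountDownLatch(1);

    static void spawn(int depth) {
        POOL.execute(() -> {
            // ... one small, independent unit of "checking" work goes here ...
            if (depth > 0) {
                OUTSTANDING.addAndGet(2);          // account for both children
                spawn(depth - 1);                  // doing one task may
                spawn(depth - 1);                  // produce more tasks
            }
            if (OUTSTANDING.decrementAndGet() == 0) {
                DONE.countDown();                  // last task signals completion
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        spawn(16);                                 // ~130,000 tiny tasks
        DONE.await();
        POOL.shutdown();
    }
}
```

Presumably the handful of small modifications the abstract mentions attack exactly that kind of overhead, but the talk itself is the place to find out.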

I asked the Radar team whether they thought this was worth blogging, given that it's a local event that only a small fraction of our readers could attend. The discussion was worth blogging in and of itself. Nat Torkington reminded me about his recent Radar post, Threads Considered Harmful, in which he wrote:

if you want your program to run faster, the days of buying faster hardware are coming to an end. Instead, we’re looking at a time when you must make your program run faster on more (slow) hardware. Enter parallel programming, clusters, and their hyped big brother “grid computing”.

In his email response to my query, Nat went on, in telegraphic style:

The tagcloud we want includes a lot of parallel / cluster posts. Why we’re tracking it: the end of Moore’s Law => individual CPUs don’t get faster, we just get more of them => parallelism increasingly important => Google, creators and main consumers of MapReduce, just an extreme of this.

Roger Magoulas added:

I want to add to the chorus of folks who think parallelism is, and will continue to be, one of the fundamental challenges and directions for computing over the next few years, or maybe longer. With Intel showing experimental 80-core chips and Google engineering its own multi-core platforms, parallelism has already become a big R&D focus. I know Intel is investing major funds (as is Sun) in efforts that make it easier for programmers to take advantage of the parallelism in multi-core chips. The kind of under-the-covers advances Intel is funding will be necessary to take full advantage of the coming increases in the number of cores.

In the meantime, there are applications that can take immediate advantage of large-scale multi-core CPUs: search (e.g., Google) and databases, where a ‘managing agent’ can coordinate parallel activity or direct a process to the appropriate ‘worker’ agent, and virtualization. While something like Greenplum is geared toward x86 cores, it’s probably not a stretch for them to convert to the coming multi-core architectures. For example, a 3U box with 4 TB of usable disk storage and two dual-core CPUs means each of the four cores handles 1 TB of data; the same box with one 80-core CPU works out to 50 GB per core (4 TB / 80 cores).
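To make Roger's "managing agent" concrete, here is a hypothetical sketch (all names invented) of a coordinator hash-partitioning requests across per-slice workers, the shared-nothing shape that Greenplum-style databases use:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (invented names) of a "managing agent" routing requests
// to per-partition workers, shared-nothing style: each worker owns a fixed
// slice of the data, so adding cores means adding workers, not adding locks.
final class Coordinator {
    private final List<Worker> workers = new ArrayList<>();

    Coordinator(int partitions) {
        for (int i = 0; i < partitions; i++) workers.add(new Worker(i));
    }

    String lookup(String key) {
        // Hash-partition the key space; the coordinator only routes.
        int p = Math.floorMod(key.hashCode(), workers.size());
        return workers.get(p).handle(key);
    }
}

final class Worker {
    private final int id;
    Worker(int id) { this.id = id; }

    String handle(String key) {
        // A real worker would scan only its own slice of the data here.
        return "partition " + id + " answered " + key;
    }
}
```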

Applications that use linear algebra (pattern recognition and categorization, for example) can also benefit from large-scale multi-core CPUs, as large matrices can be broken up into pieces and processed in parallel.
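And a toy illustration of that linear algebra point, again my own rather than Roger's: a dense matrix-vector multiply in which every row's dot product is independent work, so the rows fan out across however many cores are available.

```java
import java.util.stream.IntStream;

// Toy illustration: each row's dot product is independent, so a dense
// matrix-vector multiply splits naturally across cores. parallel() hands the
// row range to the JVM's common fork-join pool.
public final class ParallelMatVec {
    public static double[] multiply(double[][] a, double[] x) {
        double[] y = new double[a.length];
        IntStream.range(0, a.length).parallel().forEach(i -> {
            double sum = 0.0;
            for (int j = 0; j < x.length; j++) {
                sum += a[i][j] * x[j];             // row i touches only row i
            }
            y[i] = sum;
        });
        return y;
    }
}
```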

There are so many interesting implications and changes coming that we should continue to pay close attention. Here’s a link to a site chock full of information about parallelism, and a link to a white paper from Dave Patterson and his colleagues at UC Berkeley that helps frame the challenges ahead for large-scale parallelism. The seven questions in the white paper could help define a session track [at one of our conferences].
