
Tue, 02.20.07

Tim O'Reilly

A Fast Wait-Free Hash Table

A guest lecture at Dennis Allison's class at Stanford this coming Wednesday (tomorrow) sparked some interesting back-channel discussion among the Radar team. The lecture is by Cliff Click of Azul Systems, and is described as follows:

A Faster Wait-Free Hash Table. I present a wait-free (lock-free) concurrent Hash Table implementation with better single-thread performance than most Hash Tables, and better (usually much better) multi-thread performance than all other implementations I tried. I demonstrate scaling up to 768 CPUs even with high mutation rates. I show correctness by looking at the problem in a very different light than the usual "happens-before" / memory-order / fencing style of thinking -- indeed the algorithm requires no memory-ordering, no memory fencing, no D-CAS and no CAS-spin-loops.
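For a feel for the style of code the abstract is talking about, here is a minimal Java sketch of the "claim a slot with a single CAS" idea behind lock-free open-addressed tables. It is emphatically not Cliff Click's algorithm -- it omits resizing, deletion, and the correctness argument his talk is about -- just an illustration of inserting and looking up keys without ever taking a lock:

    import java.util.concurrent.atomic.AtomicReferenceArray;

    // A minimal sketch of lock-free insertion: each slot is claimed with a single
    // compareAndSet. Not Click's algorithm -- no resizing, no deletes, and the
    // value publication is simplified -- but there are no locks anywhere.
    public class LockFreeMapSketch<K, V> {
        private final AtomicReferenceArray<K> keys;
        private final AtomicReferenceArray<V> vals;
        private final int mask;

        public LockFreeMapSketch(int capacityPowerOfTwo) {
            keys = new AtomicReferenceArray<>(capacityPowerOfTwo);
            vals = new AtomicReferenceArray<>(capacityPowerOfTwo);
            mask = capacityPowerOfTwo - 1;
        }

        public void put(K key, V val) {
            int idx = key.hashCode() & mask;
            while (true) {
                K k = keys.get(idx);
                if (k == null && keys.compareAndSet(idx, null, key)) {
                    vals.set(idx, val);            // we own this slot now
                    return;
                }
                if (key.equals(keys.get(idx))) {   // slot already holds our key
                    vals.set(idx, val);
                    return;
                }
                idx = (idx + 1) & mask;            // linear probe; assumes the table never fills
            }
        }

        public V get(K key) {
            int idx = key.hashCode() & mask;
            while (true) {
                K k = keys.get(idx);
                if (k == null) return null;        // empty slot ends the probe chain
                if (key.equals(k)) return vals.get(idx);
                idx = (idx + 1) & mask;
            }
        }
    }

A thread that loses the race for a slot simply probes forward and tries the next one; nothing ever blocks waiting for another thread.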

A second talk, time permitting:

Scaling Up A Real Application On Azul. A short case study of a Java application doing device driver model checking. The basic application uses many small independent tasks; doing any one task may produce more tasks. The work is highly unstructured and very fine-grained. In theory, this coding style should be endlessly scalable, but in practice there were a number of issues, mostly due to the fine-grained nature of the tasks. The initial attempt used only a handful of CPUs on Azul, but after 3 or 4 small modifications we got all 384 CPUs cranking away - and ran the job 45x faster than the fast P4 it was running on. Time to complete a proof dropped from *weeks* to *hours*.

I'll demo the tools we used to do basic performance analysis, and some techniques to break scaling bottlenecks.
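The "small tasks that may spawn more tasks" structure in the case study maps naturally onto an ordinary Java thread pool. The sketch below is not Azul's code (the talk is about the real-world bottlenecks that show up); it only shows the shape of the pattern, with a counter so the caller can tell when the last task has drained:

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.atomic.AtomicInteger;

    // Sketch of a work queue for small tasks that may spawn more tasks.
    // 'pending' counts outstanding tasks so we can tell when the job drains.
    public class TaskFanOut {
        private final ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        private final AtomicInteger pending = new AtomicInteger();
        private final CountDownLatch done = new CountDownLatch(1);

        public void submit(Runnable unitOfWork) {
            pending.incrementAndGet();
            pool.execute(() -> {
                try {
                    unitOfWork.run();              // may call submit() again to fan out
                } finally {
                    if (pending.decrementAndGet() == 0) done.countDown();
                }
            });
        }

        public void awaitCompletion() throws InterruptedException {
            done.await();                          // the last task to finish releases this
            pool.shutdown();
        }
    }

Because a child task is counted before its parent finishes, the counter can only reach zero once the entire tree of tasks is complete.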

I asked the Radar team whether they thought this was worth blogging despite the fact that it's a local event that could be attended by only a small fraction of our readers. The discussion was worth blogging in and of itself.
Nat Torkington reminded me about his recent Radar post, Threads Considered Harmful, in which he wrote:
if you want your program to run faster, the days of buying faster hardware are coming to an end. Instead, we're looking at a time when you must make your program run faster on more (slow) hardware. Enter parallel programming, clusters, and their hyped big brother "grid computing".
In his email response to my query, Nat went on, in telegraphic style:
The tagcloud we want includes a lot of parallel / cluster posts. Why we're tracking it--the end of Moore's Law => individual CPUs don't get faster, we just get more of them => parallelism increasingly important => Google, creators and main consumers of MapReduce, just an extreme of this.
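For readers who haven't bumped into MapReduce, the model boils down to two user-supplied functions: a map step that emits key/value pairs in parallel, and a reduce step that folds each group of values for a key. A toy, single-machine illustration in Java -- nothing like Google's implementation, which runs the same shape of computation across thousands of machines:

    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    // Toy word count in the map/shuffle/reduce shape: split documents into words
    // in parallel (map), group equal words (shuffle), count each group (reduce).
    public class WordCountSketch {
        public static Map<String, Long> wordCount(List<String> documents) {
            return documents.parallelStream()
                    .flatMap(doc -> Arrays.stream(doc.toLowerCase().split("\\W+")))
                    .filter(w -> !w.isEmpty())
                    .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
        }

        public static void main(String[] args) {
            List<String> docs = Arrays.asList("the free lunch is over",
                                              "the end of the free lunch");
            System.out.println(wordCount(docs));   // prints each word with its count
        }
    }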
Roger Magoulas added:
I want to add to the chorus of folks who think parallelism is and will continue to be one of the fundamental challenges and directions for computing over the next few years - or maybe longer. With Intel showing experimental 80 core chips and Google engineering their own multi-core platforms, parallelism has already become a big R&D focus. I know Intel is investing major funds (as is Sun) into efforts that make it easier for programmers to take advantage of the parallelism in multi-core chips. The type of under-the-covers advances Intel is funding will be necessary to take full advantage of the coming increases in the number of cores.

In the meantime, there are applications, like search (e.g., Google), databases, and virtualization, where a 'managing agent' can coordinate parallel activity or direct a process to the appropriate 'worker' agent, and these can take immediate advantage of large-scale multi-core CPUs. While something like Greenplum is geared towards x86 cores, it's probably not a stretch for them to convert to the coming multi-core architectures. For example, a 3U box with 4 TB of usable disk storage and 2 dual-core CPUs means each core handles 1 TB of data. The same box with one 80-core CPU equates to 50 GB per core.

Applications that use linear algebra - pattern recognition and categorization - can also benefit from large scale multi-core CPUs as large matrices can be broken up into pieces and processed in parallel.

There are so many interesting implications and changes coming that we should continue to pay close attention. Here's a link to a site chock full of information about parallelism and a link to a white paper from Dave Patterson and his colleagues at UC Berkeley that helps frame the challenges ahead for large scale parallelism. The seven questions in the white paper could help define a session track [at one of our conferences].
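Roger's linear-algebra point is easy to make concrete: a large matrix-vector multiply splits into independent row blocks, one per core, with no coordination needed beyond waiting for every block to finish. A minimal sketch, illustrative only and not taken from any of the systems mentioned:

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Matrix-vector multiply with the rows split into one independent block per core.
    public class ParallelMatVec {
        public static double[] multiply(double[][] m, double[] v) throws InterruptedException {
            int cores = Runtime.getRuntime().availableProcessors();
            int block = (m.length + cores - 1) / cores;            // rows per worker
            double[] result = new double[m.length];
            ExecutorService pool = Executors.newFixedThreadPool(cores);
            CountDownLatch done = new CountDownLatch(cores);

            for (int w = 0; w < cores; w++) {
                final int lo = w * block;
                final int hi = Math.min(lo + block, m.length);
                pool.execute(() -> {
                    for (int i = lo; i < hi; i++) {                // rows in this block only
                        double sum = 0.0;
                        for (int j = 0; j < v.length; j++) sum += m[i][j] * v[j];
                        result[i] = sum;
                    }
                    done.countDown();
                });
            }
            done.await();                                          // also publishes the writes
            pool.shutdown();
            return result;
        }
    }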



Comments: 11

Greg Wilson   [02.20.07 06:53 AM]

The real challenge today is the same as it has been for twenty-five years: figuring out how to _debug_ massively-parallel programs. Explicitly-threaded models don't scale mentally, declarative models fail because the abstraction is leaky, and hardware manufacturers persist in not providing on-board support for data collection.

Dennis Allison   [02.20.07 10:01 AM]

Cliff Click's talk, A Faster Wait-Free Hash Table, will be available in real-time on the web on Wednesday for people who cannot attend in person. It starts at 4:15 PM Pacific and can be viewed on-demand a couple of hours after the talk completes. You need Windows Media Player to view the videos. Slides are usually available in PDF format to download, as the video resolution is not great.

There is a back list of other videos that may also be of interest.


Thomas Lord   [02.20.07 10:31 AM]

I am also hopeful about the work coming from David Patterson's direction at Cal but not because they have assembled a summary of technical challenges. Rather, I'm impressed by the RAD lab:

Our vision is to enable one person to invent and run the next revolutionary IT service, operationally expressing a new business idea as a multi-million-user service over the course of a long weekend. If so, we hope to create an Internet Fortune one million.


I'm not so sure that massively parallel programming is particularly mysterious or unusually difficult. At least, I'm not convinced it is any harder than what we already do on systems with few cores.

The crisis we face, in my opinion, is first, a crisis in the training (both in school and on the job) of working programmers, and second, a resulting crisis in our capacity to develop newly needed systems software.

One of the main lessons of Unix was that, even though theory (Multics) can get very far ahead of practice, a concentration on the craft of being able to do simple things well, at all levels of the software stack, wins. That is why, for example, it seemed prudent for a brief time in the 1980s to arm students with the knowledge and experience to not only use but even to build from scratch elements such as assemblers, compilers, operating system kernels, "tiny languages", and so forth. It was less important, in the circles I traveled, to make sure every student turned into a Perl expert -- more important that any student could pick up and use (something like) Perl or, alternatively, implement such a thing themselves. A very high premium was placed on demonstrated understanding and skill of fundamentals, not on readiness to win speed competitions for translating business logic into PHP or cutting and pasting ad hoc, dubious quality UI elements into web pages.

As a result, our industry has produced a generation of employees, and an upcoming generation of students, whose greatest strength is scripting. They expect ready-made solutions in the form of libraries and off-the-shelf components which they don't understand particularly well yet trust, for some reason, quite a bit.

What is the driving force behind this decadent period in the progress of CS? From over here, it looks micro-economic at the low-level, resulting from macro-economic mistakes at the high-level. Investment capital learned that a lot of flashy hacks were suddenly "easy" because of the new W3C standards and created a massive tide of demand for compliant line-coders. Following nothing more (or less) than outmoded theories of marketing and capital flow, these investors ordered up the construction of a massive system of population monitoring and control known as Web 2.0.

We are now at a dangerous cross-roads, as any serious examination of Google can demonstrate. If the pattern continues, we can expect to see a small priesthood of cloistered technocrats building closed platforms whose primary form and function is social control within a panoptic framework, employing an army of intellectually disarmed "workers" to amplify their insights and access to capital.

Simple forms of engineering ethics -- refusing certain work, insisting upon software freedoms -- are relatively powerless against this situation, at least if that is the only resistance there is. It only takes a few smart jerks in the room to form the priesthood, and software freedoms do not, in and of themselves, shutter the possibility of bunker-style development and deployment of computing such as we see at Google.

If there is such a thing as a sense of responsibility left among the thought leaders and investors in our industries, then now is the time for that group to collectively step back and publicly contemplate (and then openly and peacefully operate on) new computing technologies as what they are: a new, potentially highly dangerous, inherently elite computing capacity that ought only be deployed under conditions of wide understanding and monitoring, and only for goals that are believed to be socially responsible from many different perspectives.

Programming, most of the time, is quite easy. Spending lots of money virtuously is not.

-t

rektide   [02.20.07 01:13 PM]

hashtable == associative memory. distributed hash tables power all the p2p nets. any advancement in concurrent hash tables is an advancement for free and hyperlinked information.
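The usual trick behind those distributed hash tables is consistent hashing: nodes and keys are hashed onto a ring, and each key belongs to the first node at or after its position. A bare-bones sketch of just that mapping (real DHTs such as Chord or Kademlia add routing tables and replication on top):

    import java.util.Map;
    import java.util.TreeMap;

    // Consistent hashing: a key belongs to the first node at or after its hash on
    // the ring. Adding or removing a node only moves the keys in that node's arc.
    public class ConsistentHashSketch {
        private final TreeMap<Integer, String> ring = new TreeMap<>();

        public void addNode(String node) {
            ring.put(node.hashCode(), node);
        }

        public String ownerOf(String key) {        // assumes at least one node was added
            Map.Entry<Integer, String> e = ring.ceilingEntry(key.hashCode());
            return (e != null ? e : ring.firstEntry()).getValue();
        }
    }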

dennis, thank you for the information on how to view this probably very enlightening talk. i'm a little annoyed i have to remember to view it before it expires, i still haven't written my google calendar time management app, but limited online viewership opportunities are still better than no opportunity.

Thomas i agree highly. I really think everything we've built is ripe for being torn down and gutted by fundamentally restructured dataware from someone with the balls to do it and a little understanding of how computers actually work.

i suggest you pick up an old book, Data Trash. it takes the information revolution and points out that most people will just end up being roadkill on the information superhighway. although the whole point was a critique of the information society, it's still written in the same high-minded rhetoric that littered early information proselytization & VR lovefesting, but there are some very powerful sections that have so far described quite well the ascendancy of the current techno-consumerist priesthood fatcats.

frankly i think blaming our situation on training is beside the point. our industry demands extremely rapid development, and the way it's chosen to do that is via purchasing other people's solutions & products. our training accurately reflects the business world's desires, it just so happens that the business world demands extremely little technical comprehension or innovation.

tim, can you bug someone to fix comment preview so that it uses the same markup as posting? to start, posting does a s/\n/<br>/ and previewing doesn't.

Thomas Lord   [02.20.07 02:09 PM]

our industry demands extremely rapid development,

If by "our industry" you mean, at most, a couple of thousand thought leaders who influence a lot of investment capital, then I agree. Of course, now that they've gone down that course, if we want to fix things, we shouldn't waste time.

Consumers generally pick options from a menu. Faith in "latent demand" is a disavowal of responsibility by those running the kitchen.

As for training: I don't blame the educators so much as the employers.

-t

Anonymous   [02.20.07 05:32 PM]

tom, two comments.

i have qualms with your brand of cheap consumerism. there's a fetish for railing against consumers for accepting crap, but no one has ever given them the power to think abstractly or to envision anything more powerful. open source ALWAYS starts DIY (be it just burning the debian disk in the first place), which is a jump that a consumer, by definition, does not want to make. instead of trying to expose our systems with grace, we have the application: a configuration of libraries arranged to produce a behavioral process for the user, a scripted human-computer interaction. i'd state that any prognostication on the matter of future consumers is just that: prognostication pulled out of "narrow pipelines" (aka orifices), as so far our net experience in consumer-facing dataware that has been on anyone's radar is limited to scheme and appletalk. to claim that consumers wouldn't know what to do with real power if it hit them in the face is nonsensical with such a limited domain of case studies and essentially zero long-term studies. it's really a very existential question: to what standard do we demand participation in software? is the goal of software simply to be usable, or are we permitted to introduce complexity and abstractions for the goal of being eminently manipulable? if users begin demanding complexity, what ethical responsibility do we retain to those who either don't want to adapt or are unable?

going back to your original post, the notion that the people at the helm now would realize how shitty the current system is and then decide to make it all better is laughable. better would be free, better would be distributed, better would be new code environments. better would put all the people running the show out of business, and right now would factorially increase time to market. there is HUGE money to be made in converting 98% of the programmers in the world into script monkeys and SOA-dependents. honestly i think the only ones with any interest at all in saving us ARE the googles, because such companies exist only as network externalities, and the biggest threat to their business is vendors and code platforms. the more data on the public networks and not being dealt with by your SOA vendor, the more chance google has to not die.

Thomas Lord   [02.20.07 08:30 PM]

going back to your original post, the notion that the people at the helm now would realize how shitty the current system is and then decide to make it all better is laughable.

Frankly, based on my observations, I happily disagree.

To be sure, the people at the helm frequently exhibit an irrational optimism about many technologies (a habit that one can easily learn to overcome) and an equally irrational pessimism that too often leads them to fall back on "maximize short-term profit because if we don't do this someone else will," (a habit which I simply challenge them to overcome by careful experiment).

At the same time, if these same people were actually intractable louts then: Red Hat would be an irrelevant 5 guys in a basement maintaining a crappy GNU/Linux distro; or the Fedora project would be competing head-to-head with Ubuntu to con as much volunteer labor out of noobs as possible; or Cuba would not have been recently persuaded to properly regard software freedom as a socialist imperative; or Sun would not be open sourcing Java; or hardware-based DRM would have arrived and won, years ago, as it has on cell-phone networks; or patent reform at the Congressional level would be entirely in the wrong direction; or GCC would have far greater weaknesses than the mere fact that it is old, big, and unique; or we'd all be using DBX, not GDB, at least those of us who were permitted to program; or else there would be no OLPC project; or else I could not afford a reasonably up-to-date operating system; or else there would not be many thousand (I won't make any specific guess) programmers, around the world, who learned mostly on their own and nevertheless correctly recognize their ability to make a huge difference in the world; or else there wouldn't be N+1 libre implementations of the Scheme programming language; or else many thousand people would not be contemplating IP law and economics; or else no capital would have been forthcoming (properly deployed yet, or not) to perform triage scanning on the collections of major libraries; and on and on (I could go on all night, if you like -- could be a fun drinking game).

No, it is precisely the small numbers of major influencers, their having got things "roughly in the right direction", and their (even if at times a bit too reluctant) dedication to an Enlightenment culture that makes any of my criticisms worth rehearsing in a forum like this one. Sorry, but, much as I hate the bastards, I like and respect them, too. It is precisely the degree of their success-so-far that makes it important to try to focus their attention on the screw-ups in their belief systems as they stand. Really, the whole thing is a (very backhanded) compliment, at least if "they" choose to regard it so.

And of course, who am I to climb up on such high horses in the first place? I'm no moral authority in complex issues like this but, at the same time, I have been around long enough to see some common fundamental errors play out and therefore, learn to raise reasonable objections, such as I've been doing.

-t

rektide   [02.21.07 12:44 AM]

open source lacks the cohesion to form a total platform capable of elevating programming past the level of being an esoteric art.
nothing else matters. someday (far far away) someone will have to spend huge money reinventing code environment to not suck egg.
end of story.
don't get me wrong TLscheme and sharpdevelop and monodevelop, it's all wonderful,
but never going to make code imminently available to the world.
the cruft will keep building upwards until someone can put together a better bigger picture that represents the processes of computing better.
and i see no indications open source can get focused enough to get in gear and innovate its platform in that higher-level direction.
and i see no indications the vendors intend to give developers that pervasive, ubiquitous system control.
until then it's a million kids like me and you cranking away in basements hoping we put together something just crystalizingly defining.
and twenty million open source web app frameworks du jour.
and plenty of fedoras to sell indulgences.

"If the pattern continues, we can expect to see a small priesthood of cloistered technocrats building closed platforms whose primary form and function is social control within a panoptic framework, employing an army of intellectually disarmed "workers" to amplify their insights and access to capital."

you slam the web but if there's one place technical meritocracy did win out it was through the web and all the open source channels, floodgates opened, that fostered its growth. consider that there've been two web revolutions in ten years' time funded by probably less than the sum of IBM, MS, Oracle, and SAP's middleware expenditures. http-based protocols have demonstrated technical continuity and advancement in progressing from http to rss to atom, and if WHAT-WG actually exists or does anything, the web could perhaps be a viable higher-level platform rather than a mess created out of html.

you seem to want to claim that the web enterprises that have sprung up are somehow more nefarious than the middleware people. certainly the middleware people are realizing the benefits of openness and how critical a high enough degree of it is to their business, but to say that these people, the virtual popes of software, are somehow going to go strip the priesthood of their cloth and start passing out bibles to the pews is unreasonable. vendors will certainly concoct some very powerful concurrency potions, but they will be no more accessible than the web apis that are becoming de jure. systems will still be black boxes and magic toolsets, and those conditions are the definition of the script monkey habitat. there is incentive to invest in a platform's power and technical steerings, but there's a long long way to go before they have to start opening their platform and start making code & internals more accessible to everyone, and not just a handful of oracle wizards. red hat and suse are wonderful examples of just that: their complexity management solutions, the point of their businesses, are the proprietary rhel and sles. vendors make money by providing cutting-edge services. i appreciate your sentiment that we need to watch ourselves carefully through these stages. but merely calling for vendor self-awareness and Enlightened technocracy from vendor overlords is not nearly drastic enough. i've stated numerous times i do NOT think open source will provide more accessible computing platforms, that they're just as problematic a member of the esoteric brotherhood (largely because they're prone to replicating others' work). if neither open source nor the vendors can be counted on to generate new accessible computing abstractions, i'd say we're in for a long dark era of really awful computing environments and systems that will remain so generally unmaintainable that VMWare is making money hand over fist selling people runnable disk images.

Emilio   [02.21.07 07:24 AM]

Well, if you want to make the writing of hyper-parallel data processing applications easier, take a look here:


http://www.pervasivedatarush.com



The framework handles threading, shared memory management, deadlock detection, horizontal/vertical data partitioning and pipeline parallelism and more...
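Pipeline parallelism itself is a simple pattern, independent of any particular product: each stage runs on its own thread, and bounded queues between stages let a slow stage throttle the ones feeding it. A generic sketch in plain Java -- this is not the DataRush API, just the underlying idea:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    // Three pipeline stages on three threads, connected by bounded queues.
    public class PipelineSketch {
        private static final String EOF = "\u0000EOF";              // end-of-stream sentinel

        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> raw = new ArrayBlockingQueue<>(1024);
            BlockingQueue<String> cooked = new ArrayBlockingQueue<>(1024);
            ExecutorService pool = Executors.newFixedThreadPool(3);

            pool.execute(() -> {                                     // stage 1: produce
                try {
                    for (int i = 0; i < 10; i++) raw.put("record " + i);
                    raw.put(EOF);
                } catch (InterruptedException ignored) { }
            });
            pool.execute(() -> {                                     // stage 2: transform
                try {
                    while (true) {
                        String s = raw.take();
                        if (s.equals(EOF)) { cooked.put(EOF); break; }
                        cooked.put(s.toUpperCase());
                    }
                } catch (InterruptedException ignored) { }
            });
            pool.execute(() -> {                                     // stage 3: consume
                try {
                    for (String s = cooked.take(); !s.equals(EOF); s = cooked.take()) {
                        System.out.println(s);
                    }
                } catch (InterruptedException ignored) { }
            });

            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }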



Similar to Parks' work and Flow-Based Programming (FBP), but in this case it's a hardened Java framework that is already in use at the US DoD and financial institutions doing very parallel data crunching...



The developer SDK and developer runtime are free. And if you want free production licenses, just use the Contact form on the website and we'll talk. I'm looking for more lighthouse accounts...

Thomas Lord   [02.21.07 11:05 AM]

Rektide,

open source lacks the cohesion to form a total platform capable of elevating programming past the level of being an esoteric art.

You're in intellectual trouble right from the start with statements like that. What does open source mean? Taken literally, it only means software that uses a certain form of public licensing. You'll have a very hard time drawing conclusions about "form[ing] a total platform" (whatever that means) from that. Apparently, your argument is economic?:



nothing else matters. someday (far far away) someone will have to spend huge money reinventing code environment to not suck egg. end of story.

So, your argument is that while, yes, we have a largely bogus stack that is widely deployed, replacing it requires a huge investment of the sort that open source business models reliably fail to produce? I'll assume that that is what you mean.

I think you are mistaken, based on my estimates of the technical problems we face and my understanding of micro-economics. The technical problems may well, ultimately, amount to a need for millions of lines of new code, probably developed at a substantially higher cost-per-line (or whatever rough metric you like) than code we use today. I expect some improvements in code density but nothing so profound that we ought to assume a cost discount from it. Yet, there is no evident need for all of that code to appear on the market at once. On the contrary, an incremental approach, leveraging weaknesses in the existing stack, should be able to do a lot of good with much less work between successes. That, finally, brings us to the micro-economic questions: although a total accounting of the investment needed to build a better stack may be high, and the risks may involve difficult odds for investors (especially technically naive investors), what matters from a business perspective is whether R&D can, along its course, bring products to market that genuinely create new efficiencies. That doesn't seem like something one can accomplish as a hobby, in a basement, this far into the game -- but sure looks doable from here.

On the other hand, perhaps when you say that open source "lacks the cohesion" you mean that any attempt to rally the crowd of volunteer and hobbyist hackers is likely to fail. If that is what you mean, I hope you are correct. That form of volunteerism is energy that people spend for political reasons, for their own pleasure, and for their own personal development. It is not a natural resource for exploitation by business -- on the contrary, it is an opportunity and freedom that responsible businesses should be thinking about how to preserve and protect while minimizing interference. In any event, it is more than just a moral choice: the "Magic Cauldron" and similar understandings of the utility of that energy for achieving business ends are simply false, as one can see in the smart choices that several of the large distribution vendors make these days -- choices to neither rely upon nor cheer-lead for the volunteers, only to permit them access to the source and listen carefully to what they say, leaving it up to employees to interact directly with those communities only as a permitted use of "paid personal time" and with no incentives at all for collecting gratis labor.


I wrote:
If the pattern continues, we can expect to see a small priesthood of cloistered technocrats building closed platforms whose primary form and function is social control within a panoptic framework, employing an army of intellectually disarmed "workers" to amplify their insights and access to capital.

You responded at length so let me respond in pieces:

you slam the web but if there's one place technical meritocracy did win out it was through the web and all the open source channels, floodgates opened, that fostered its growth. consider that there've been two web revolutions in ten years' time funded by probably less than the sum of IBM, MS, Oracle, and SAP's middleware expenditures. http-based protocols have demonstrated technical continuity and advancement in progressing from http to rss to atom, and if WHAT-WG actually exists or does anything, the web could perhaps be a viable higher-level platform rather than a mess created out of html.

I do not slam "the web" -- my aim is a little bit more precise than that. W3C more or less invented the possibility of modern "middleware", brought us a lot closer to sanity in the worlds of global-scale human-to-human communication, took the wind out of the sails of a lot of bogus approaches to user interface development (by offering a "70%-solution" alternative), and shattered the potential for hegemonic abuse of (thus rescuing!) the IETF process. I think we have a few years to go before it is more widely appreciated how much they've helped everyone.

However, technology is always a two-edged sword, as is capital. Given the powerful set of building blocks web standards gave us, where did capital flow? The largest successes (e.g., Google) didn't walk, they ran to build a bunker, loaded with more compute power than anyone else, and to deploy it as a system of privatized social and individual marketing, presented as seductively as possible, turning the resulting Orwellian 2-way TV into a tool mainly for manipulating the consumption habits of their customers. In the case of Google, they seem to have achieved a remarkable degree of non-critical respect-paying, in spite of their secrecy, in spite of their hoarding of useful data, in spite of their predatory behavior in markets, in spite of their attempt to become a brain-drain -- and why? Because they had a very successful IPO, some Stanford cachet, a catchy slogan, and the fanciest cafeteria in the valley. So arrogant is the resulting firm that its approach to copyright reform (in the matter of scanning books) is to simply and flagrantly violate federal law in order to gain a commercial advantage.

The concept of the panopticon refers to a design for a prison. In the prison, all of the prisoners are held in a circle of cages, with inward-facing bars. In the center of the circle is a tower -- an observation post that can see out to the cages, but into which the prisoners have no clear view. The prisoners, as in Orwell's 1984, never know when they are being watched, or by whom -- only that it is never safe to assume they are not being watched. One should recognize an uncomfortable resemblance between the panopticon, so conceived, and two things: Google's relation to their users, and Google's relationship with their employees, including their temporary employees.

There is no technical or business necessity that requires an effective deployment of anything like page-rank to have these panoptic properties.



you seem to want to claim that the web enterprises that have sprung up are somehow more nefarious than the middleware people.

Actually, I haven't made any such comparison and do not, at this time, express any opinion about it. However, you go on:

certainly the middleware people are realizing the benefits of openness and how critical a high enough degree of it is to their business, but to say that these people, the virtual popes of software, are somehow going to go strip the priesthood of their cloth and start passing out bibles to the pews is unreasonable. vendors will certainly concoct some very powerful concurrency potions, but they will be no more accessible than the web apis that are becoming de jure. systems will still be black boxes and magic toolsets, and those conditions are the definition of the script monkey habitat. there is incentive to invest in a platform's power and technical steerings, but there's a long long way to go before they have to start opening their platform and start making code & internals more accessible to everyone, and not just a handful of oracle wizards. red hat and suse are wonderful examples of just that: their complexity management solutions, the point of their businesses, are the proprietary rhel and sles. vendors make money by providing cutting-edge services. i appreciate your sentiment that we need to watch ourselves carefully through these stages. but merely calling for vendor self-awareness and Enlightened technocracy from vendor overlords is not nearly drastic enough. i've stated numerous times i do NOT think open source will provide more accessible computing platforms, that they're just as problematic a member of the esoteric brotherhood (largely because they're prone to replicating others' work). if neither open source nor the vendors can be counted on to generate new accessible computing abstractions, i'd say we're in for a long dark era of really awful computing environments and systems that will remain so generally unmaintainable that VMWare is making money hand over fist selling people runnable disk images.

We can compare our situation today, along with projected outcomes, to the early days of development of other life-critical, massive, network-structured infrastructures such as the power grid, the highway system, the air travel system, the potable water delivery system, and so forth.

In all of those older systems, having incurred a lot of sunk costs in a fragile and eventually unsustainable solution, as a society we are beginning to pursue radical reform -- and for lots of very good reasons. A certain amount of fragility is just "part of life" and is reasonably offset by efforts to maintain civil order, with violence if necessary. Yet there, exactly, is the rub: too much fragility and law enforcement turns from an orderly, harm-reduction approach to suppressing hooliganism into an all-out, panoptic, Stanford-prison-experiment, eternal nightmare.

The question before the capitalists who invest in the net (and thank goodness they do!) is now: what will you do to contribute solutions rather than add new-yet-familiar problems to the mix?

-t

DMCC   [07.06.07 03:12 AM]

"Architectural and software innovation" design planned for threaded cores.
You certetainly GOT THE F.. POINT!"!!

Most software around, even inside unixes, is not threaded, and the threads will change a lot since they need to be fixed in order to keep the "shared data" inside the model of the caching architecture of the processors. That means less data sharing between threads.
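That cache effect is easy to demonstrate. In the sketch below (an illustration, not code from any of the systems mentioned), two threads hammer on two counters: when the counters sit in adjacent slots they share a cache line and slow each other down ("false sharing"); placed a cache line apart, the same work runs markedly faster even though nothing is logically shared:

    import java.util.concurrent.atomic.AtomicLongArray;

    // Time two threads incrementing two slots of the same array. With adjacent
    // slots the updates fight over one cache line; 64 longs apart they do not.
    public class FalseSharingSketch {
        static long timeMillis(int slotA, int slotB) throws InterruptedException {
            AtomicLongArray counters = new AtomicLongArray(128);
            Runnable a = () -> { for (int i = 0; i < 20_000_000; i++) counters.incrementAndGet(slotA); };
            Runnable b = () -> { for (int i = 0; i < 20_000_000; i++) counters.incrementAndGet(slotB); };
            Thread ta = new Thread(a), tb = new Thread(b);
            long start = System.nanoTime();
            ta.start(); tb.start();
            ta.join(); tb.join();
            return (System.nanoTime() - start) / 1_000_000;
        }

        public static void main(String[] args) throws InterruptedException {
            System.out.println("adjacent slots:     " + timeMillis(0, 1) + " ms");
            System.out.println("a cache line apart: " + timeMillis(0, 64) + " ms");
        }
    }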

I've found that Windoez XP has huge threading errors in terms of what data is shared within the same process, aka explorer.exe, which can really get stuck when the shared data inside the process is bigger than the size of the L1 and L2 cache of your architecture.

Write once, run everywhere, at every f____ cost.

