Thu, Mar 20, 2008

Nat Torkington

Disks have become tapes

Via Michal Migurski's delicious feed, I found Disks Have Become Tapes, a fascinating set of observations about why MapReduce is so successful. In short (as observed by Doug Cutting at OSCON last year), MapReduce hits the sweet spot by operating at the transfer rate of disks (growing at 20%/year) rather than at the seek rate (growing at 5%/year), which is what relational databases depend on. It also mentions the growing field of column databases, which we're watching at Radar (e.g., Michael Stonebraker's appearance at MoneyTech), a trend driven by disk capacity growing faster than both transfer rate and seek rate. Absolutely fascinating stuff.
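To put rough numbers on that gap, here is a small sketch of how the two growth rates compound. The 20% and 5% figures are the ones quoted above; the function name and the year horizons are illustrative assumptions, not something taken from the linked article.

```python
def transfer_to_seek_gain(years, transfer_growth=0.20, seek_growth=0.05):
    """Factor by which disk transfer rate pulls ahead of seek rate.

    Both rates are assumed to compound annually at the quoted percentages.
    """
    return ((1 + transfer_growth) / (1 + seek_growth)) ** years


if __name__ == "__main__":
    for years in (1, 5, 10):
        gain = transfer_to_seek_gain(years)
        print(f"{years:2d} years: transfer gains {gain:.2f}x on seek")
```

If those rates held for a decade, streaming would gain a little under 4x on seeking, which is the sense in which a disk starts to behave like a tape.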


tags: hard numbers

Comments: 3

  Deepak [03.20.08 09:55 AM]

Indeed. The whole distributed FS, Streaming database, H-Stores ... all fascinating stuff

  Thomas Lord [03.20.08 10:00 AM]

"Absolutely fascinating" is an understatement. When I first read the original Patterson/Gray piece in Queue (cited and linked in the article noted here) -- I immediately became a better programmer. They are saying lots of obvious things that everyone already knows, but framing it as "so think of it as a tape" adds a nice creamy sauce of lucidity to it.

I read the article some time after it came out. I immediately adopted the idea for work in genomics computing. We had this challenge of trying to search very huge genetic sequences for (approximate) matches of some tiny regexps. The catch, aside from the size of the sequences, was the need to search for thousands, tens of thousands or more regexps, all at once.

Indeed, the only way this was practical was to really tune the "whole machine" -- paying very special attention to bandwidths and latencies across layers of the memory hierarchy. Do I compute two tables and do a join of them on the fly? Or do I write them individually to disk (or, better, the wire) and do the join separately? The latency and bandwidth rough numbers give the right answer, in every case. Indexing, too: a less specific index (leaving more searching to do) can be better than a more specific index if it fits more comfortably in memory or even in L2.
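As a rough illustration of that kind of back-of-envelope estimate, the sketch below compares fetching records by seeking to each one against streaming through the whole file. The 10 ms seek time and 100 MB/s transfer rate are assumed round numbers for a disk of this era, and the function names are made up for the example; substitute measured values for a real machine.

```python
SEEK_SECONDS = 0.010             # assumed average seek plus rotational delay
TRANSFER_BYTES_PER_SEC = 100e6   # assumed sequential transfer rate


def seek_plan_seconds(n_wanted, record_bytes):
    """Time to fetch n_wanted records with one seek per record."""
    per_record = SEEK_SECONDS + record_bytes / TRANSFER_BYTES_PER_SEC
    return n_wanted * per_record


def scan_plan_seconds(total_records, record_bytes):
    """Time to stream the whole file once, filtering as it goes by."""
    total_bytes = total_records * record_bytes
    return SEEK_SECONDS + total_bytes / TRANSFER_BYTES_PER_SEC


def cheaper_plan(n_wanted, total_records, record_bytes):
    """Return 'seek' or 'scan', whichever the rough numbers favor."""
    seek = seek_plan_seconds(n_wanted, record_bytes)
    scan = scan_plan_seconds(total_records, record_bytes)
    return "seek" if seek < scan else "scan"


if __name__ == "__main__":
    # A billion 100-byte records: how many lookups before scanning wins?
    for wanted in (100, 10_000, 1_000_000):
        print(wanted, cheaper_plan(wanted, 1_000_000_000, 100))
```

Under these assumptions, scanning a 100 GB file beats individual seeks once you want more than about a hundred thousand of its billion records, that is, roughly 0.01% of them.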

Stonebraker is doing the Extra Fancy variety where he makes a full-blown query optimizer and everything is all Coddishly clean in separating logical and physical database models (only his engine knows about columns). Our project was shoestring stuff on a short timeline so I went the rough and ready route and implemented a column db analog of (a compiled form of) Awk: code out your physical queries by hand but at least there's a high-level language in which to do that. Hey, my approach worked like a charm. (Any scientists in the audience looking for help with genomics coding?)

When we use today's architectures as servers or workstations, it's rare that they get really, really well tuned. Sure, people will fuss with database tuning (cf. Tim's "DB War Stories" series). But, as configured, most of these machines are just lurching along, kind of thrashing. They're just so fast we put up with the thrashing.

But, what I discovered after reading Patterson / Gray (and, later, a little Stonebraker on columns) -- you *can* still make this generation of hardware "sing". You *can* still write programs that literally resonate with the architecture, usefully saturating the buses and gates with productive work, consuming each layer of resource in *just* the right amount to get the most from it. And there are plenty of real world problems where it pays off.

Neat stuff.

-t

  john allspaw [03.20.08 10:19 AM]

Great stuff, and great comment, Thomas.

One of the main reasons any intense database users are looking at solid-state disks (SSDs) is to reduce the awfulness which is random I/O on standard drives. The laws of physics annoy web operations all over the world because of this.

Drive manufacturers know this, I'm sure, which is why they are racing so very hard to get 20,000 RPM drives out the door.

I'll bet that there will be more creative ways engineered to turn what used to be random I/O into sequential I/O. In which case, 20k RPM drives won't be necessary. Not just yet, anyway. :)
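One familiar way to turn random writes into sequential ones is an append-only log with an in-memory index: every update goes to the end of the file, and a lookup table remembers where the latest copy of each key lives. The sketch below is only an illustration of that idea; the `AppendLog` class and its methods are hypothetical names, and it leaves out durability, compaction, and anything else a real store would need.

```python
import io


class AppendLog:
    """Toy key-value store whose writes are always sequential appends."""

    def __init__(self, stream):
        self.stream = stream   # any seekable binary stream (file or BytesIO)
        self.index = {}        # key -> (offset, length) of the latest value

    def put(self, key, value):
        # Always write at the end, so the disk head never has to jump back.
        self.stream.seek(0, io.SEEK_END)
        offset = self.stream.tell()
        self.stream.write(value)
        self.index[key] = (offset, len(value))

    def get(self, key):
        # Reads still seek, but only once per lookup thanks to the index.
        offset, length = self.index[key]
        self.stream.seek(offset)
        return self.stream.read(length)


if __name__ == "__main__":
    log = AppendLog(io.BytesIO())
    log.put("a", b"one")
    log.put("b", b"two")
    log.put("a", b"three")   # overwrite becomes another append
    print(log.get("a"), log.get("b"))
```

The trade-off is that stale values pile up in the log and reads remain random, so this moves the seek cost rather than removing it.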
