Sep 29
2005

Nat Torkington

### Algorithms and Data Structures for Flash Memory, Old Age, and Patents

When I was struggling to remember the precise steps of a shell sort in my computer science classes, I often cursed the fact that there were so many different algorithms for any given task. Surely there was no need to keep finding new ways to search and sort, given that we all have effectively infinite RAM and CPU? I was, of course, tremendously naive. I eventually realized that each algorithm has different properties: takes more or less RAM, more or less CPU, behaves better on ordered or randomized data, has different worst cases, and so on. The more you knew about your data and the environment your code would execute in, the better you can decide which algorithm to choose. If all you have is libc's qsort(), then everything's going to look like O(n log n) until you strike the worst case. And you will strike the worst case--Quicksort behaves poorest (O(n2)) when the list you're trying to sort is already in order.

All this is by way of explaining why Algorithms and Data Structures for Flash Memories (Gal, Toledo) caught my eye.

Flash memory has two interesting properties: you can only erase a bit by erasing a whole block of memory, and you can only erase a finite number of times--the memory physically degrades with each write. I remember this being an issue for the designers of systems like the bicycle-powered telco for Laotian rural communities. They wanted to use Flash memory as the root filesystem, but you can get a short lifetime for the memory if you inadvertently strike the worst case erase-rewrite-erase-rewrite situation.

The paper is a survey of algorithms and data structures that are tailored for these unique characteristics of flash memory. You shouldn't, for example, overwrite data--instead, allocate more and free the old for later use once you've run out of not-yet-written-to memory. This goes against the efficiency sensibilities of modern programmers--if I have the string "Python" and I want to change it to "Perl", most everyone would say to overwrite instead of allocating a new string. But that'll burn out your Flash memory. So a lot of the paper is devoted to filesystem allocation tracking, which is a bit like the memory tracking that malloc() does behind the scenes.

I was surprised to learn that a lot of the Flash-specific algorithms and data structures are patented. I hadn't realized that it was considered okay to do that in computer science circles. In my day (he says, puffing on his Meerschaum pipe and rubbing his old war wound) it wouldn't have been cricket to patent something like mergesort, which is obviously highly tailored to tapes. Imagine if Hoare, Shell, Dijkstra, Kruskal, or any of the other names had patented Quicksort, shortest path finding, or any of the other techniques that are the bedrock of modern systems? I know I'm late to the indignation game, but I hadn't realized that computer scientists played the patent game. Foolish me, I'd thought that crappy patents were the province of business.

The article's conclusion makes a forceful point about the danger of publishing via patent:

Unfortunately, many of these techniques are only described in patents, not in technical articles. Although the purpose of patents is to reveal inventions, they suffer from three disadvantages relative to technical articles in journals and confer- ence proceedings. First, a patent almost never contains a realistic assessment of the technique that it presents. A patent usually contains no quantitative comparisons to alternative techniques and no theoretical analysis. Although patents often do contain a qualitative comparisons to alternatives, the comparison is almost always one-sided, describing the advantages of the new invention but not its potential dis- advantages. Second, patents often describe a prototype, not how the invention is used in actual products. This again reduces the reader’s ability to assess the effec- tiveness of specific techniques. Third, patents are sometimes harder to read than articles in the technical and scientific literature.

tags:   | comments: 3   | Sphere It
submit:

[09.30.05 10:17 AM]

As far as I know (I've been out of the game for a few years, so I could be wrong on this) the actual as opposed to in-theory legal status of patents on algorithms -- while many such have been issued and there has been, on the part of some patent holders (e.g. Unisys' vigorous pursuit of their \$20K LZW license) some use made of threatened enforcement to exact compliance prophylactically -- has not yet been tested in court of law. The patentability of algorithms as a class could still be overturned by a single enlightened decision. So far, it seems that no-one involved has wanted to risk forcing a decision one way or the other, thus remains a rather grey and murky area of law.

It depends of which qsort implementation you use. If you use one that uses the media partition of three you won't hit the n^2 problem. Also, I believe the bad versions of qsort only hit n^2 when it's reverse sorted.

Yes, a naive implementation of quicksort will choke when the list is already sorted (or nearly sorted). See Wikipedia on the subject. I think most libc qsort() implementations randomize elements before sorting, to minimize the probability you'll strike worst case behaviour. Of course, I took that class over a decade ago, and I have trouble remembering where I left my wallet ... :-)

Type the characters you see in the picture above.