Algorithms and Data Structures for Flash Memory, Old Age, and Patents

When I was struggling to remember the precise steps of a shell sort in my computer science classes, I often cursed the fact that there were so many different algorithms for any given task. Surely there was no need to keep finding new ways to search and sort, given that we all have effectively infinite RAM and CPU? I was, of course, tremendously naive. I eventually realized that each algorithm has different properties: takes more or less RAM, more or less CPU, behaves better on ordered or randomized data, has different worst cases, and so on. The more you knew about your data and the environment your code would execute in, the better you can decide which algorithm to choose. If all you have is libc’s qsort(), then everything’s going to look like O(n log n) until you strike the worst case. And you will strike the worst case–Quicksort behaves poorest (O(n2)) when the list you’re trying to sort is already in order.

All this is by way of explaining why Algorithms and Data Structures for Flash Memories (Gal, Toledo) caught my eye.

Flash memory has two interesting properties: you can only erase a bit by erasing a whole block of memory, and you can only erase a finite number of times–the memory physically degrades with each write. I remember this being an issue for the designers of systems like the bicycle-powered telco for Laotian rural communities. They wanted to use Flash memory as the root filesystem, but you can get a short lifetime for the memory if you inadvertently strike the worst case erase-rewrite-erase-rewrite situation.

The paper is a survey of algorithms and data structures that are tailored for these unique characteristics of flash memory. You shouldn’t, for example, overwrite data–instead, allocate more and free the old for later use once you’ve run out of not-yet-written-to memory. This goes against the efficiency sensibilities of modern programmers–if I have the string “Python” and I want to change it to “Perl”, most everyone would say to overwrite instead of allocating a new string. But that’ll burn out your Flash memory. So a lot of the paper is devoted to filesystem allocation tracking, which is a bit like the memory tracking that malloc() does behind the scenes.

I was surprised to learn that a lot of the Flash-specific algorithms and data structures are patented. I hadn’t realized that it was considered okay to do that in computer science circles. In my day (he says, puffing on his Meerschaum pipe and rubbing his old war wound) it wouldn’t have been cricket to patent something like mergesort, which is obviously highly tailored to tapes. Imagine if Hoare, Shell, Dijkstra, Kruskal, or any of the other names had patented Quicksort, shortest path finding, or any of the other techniques that are the bedrock of modern systems? I know I’m late to the indignation game, but I hadn’t realized that computer scientists played the patent game. Foolish me, I’d thought that crappy patents were the province of business.

The article’s conclusion makes a forceful point about the danger of publishing via patent:

Unfortunately, many of these techniques are only described in patents, not in
technical articles. Although the purpose of patents is to reveal inventions, they
suffer from three disadvantages relative to technical articles in journals and confer-
ence proceedings. First, a patent almost never contains a realistic assessment of the
technique that it presents. A patent usually contains no quantitative comparisons
to alternative techniques and no theoretical analysis. Although patents often do
contain a qualitative comparisons to alternatives, the comparison is almost always
one-sided, describing the advantages of the new invention but not its potential dis-
advantages. Second, patents often describe a prototype, not how the invention is
used in actual products. This again reduces the reader’s ability to assess the effec-
tiveness of specific techniques. Third, patents are sometimes harder to read than
articles in the technical and scientific literature.