I found code-is-law on my mind at Foo Camp this year during a presentation on security by Dan Kaminsky. (Yes, the same renaissance hacker who made me fear my web browser last week.) Dan’s presentation described how to turn noise into visualizations using dotplots, a technique he uses to guide fuzzers. But, ever a true hacker, Dan also created a series of beautiful visualizations ranging from audio captchas to a representation of Zelda.
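The basic dotplot idea is simple: split an input into tokens, then mark cell (i, j) of a grid whenever token i equals token j. Repeated runs show up as diagonal lines off the main diagonal, and structured data produces blocks and stripes. The following is a minimal sketch of that idea, not Dan's actual tool. Tokenizing into words, the text rendering, and the `match_density` score are illustrative assumptions; a real visualization of something as large as the US Code would downsample the grid and draw it with an image library.

```python
from collections.abc import Hashable, Sequence


def dotplot(tokens: Sequence[Hashable]) -> list[list[bool]]:
    """Return an n-by-n grid whose cell (i, j) is True when tokens[i] == tokens[j]."""
    n = len(tokens)
    return [[tokens[i] == tokens[j] for j in range(n)] for i in range(n)]


def render(grid: list[list[bool]]) -> str:
    """Render the grid as text: '#' marks a match, '.' marks a mismatch."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in grid)


def match_density(grid: list[list[bool]]) -> float:
    """Fraction of off-diagonal cells that match (a hypothetical, crude self-similarity score).

    The main diagonal is skipped because every token trivially matches itself.
    """
    n = len(grid)
    if n < 2:
        return 0.0
    hits = sum(grid[i][j] for i in range(n) for j in range(n) if i != j)
    return hits / (n * (n - 1))


if __name__ == "__main__":
    # Word-level tokens are an assumption; bytes work the same way for binaries.
    words = "to be or not to be".split()
    grid = dotplot(words)
    print(render(grid))
    print(round(match_density(grid), 3))
```

For "to be or not to be", the repeated phrase "to be" appears as a short diagonal segment starting at cell (0, 4), mirrored below the main diagonal. Because the grid grows quadratically with the input, this direct approach only suits short inputs.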
Project Gutenberg (a collection of 17,000 free books) fails to show a significant pattern beyond random noise.
Despite English’s low information content, the scarcity of even mildly related strings leaves little self-similarity across symbol clusters.
kernel32.dll (one of the core system libraries of Windows, exposing much of the Win32 API)
Binary code (be it bytecode or x86) tends to be very structured. Still, whether distinct patterns emerge depends on both the content and the compiler.
US Code (the codification by subject matter of the general and permanent laws of the United States)
Legalese is a highly structured dialect. Symbols appear in very distinct patterns that are more reminiscent of machine code than of prose.
Quite clearly, the US Code is appropriately named, since it has more in common with kernel32.dll than with the contents of Project Gutenberg. Not only is code law, but law is code: a highly structured set of instructions that allows a state machine to function, ideally without any ambiguity. Lessig was right!