Artur Bergman

Law is code

Code-is-law, a term originating from Professor Lawrence Lessig, has been on the collective mind of Radar lately. Tim blogged about it, and the last issue of Release 2.0 explored it further.

I found code-is-law on my mind at Foo Camp this year during a presentation on security by Dan Kaminsky. (Yes, the same renaissance hacker that made me fear my web browser last week.) Dan's presentation described how to turn noise into visualizations using dotplots, a technique he uses to guide fuzzers. But ever a true hacker, Dan also created a series of beautiful visualizations ranging from audio captchas to the representation of Zelda.

The visualization that fascinated me most combined code and law, produced using Project Gutenberg, kernel32.dll and US Code.

Project Gutenberg (a collection of 17,000 free books) fails to show a significant pattern beyond random noise.

Despite English’s low information content, the lack of even mildly related strings means there is little self-similarity across symbol clusters.

kernel32.dll (a core Windows system library; despite its name, it exposes the base Win32 API to user-mode programs rather than being the kernel itself)

Binary code (be it bytecode or x86) tends to be very structured. Still, we are dependent on both the content and the compiler to generate distinct patterns.

US Code (the codification by subject matter of the general and permanent laws of the United States)

Legalese is a massively structured dialect. Symbols appear in very distinct patterns that are more reminiscent of machine code than text.

Quite clearly, we can see that US Code is appropriately named since it has more in common with kernel32.dll than the contents of Project Gutenberg. Not only is code law, but law is code: a highly structured set of instructions that allows a state machine to function, ideally without any ambiguity. Lessig was right!

Comments: 22

Gordon Mohr   [08.06.07 12:41 AM]

Perhaps someday we can make legislative law subject to a complexity budget: say, 1MB, max.

Or at least the tax code. And given the importance of tax-filing software, why risk the software and the official rules deviating from each other? Make the software itself -- still within some complexity budget -- the canonical expression of tax law.

Anonymous   [08.06.07 01:58 AM]


We have that nightmare already in several areas. For example:
Some spouses of EU citizens need to apply for a visa to travel to the UK.

The EU has a facilitated visa process which is free and requires minimal documentation.

Non EU spouses have a paid visa process, which costs a lot of money and requires more documentation but allows for longer visas and multi-entry visas.

European case law is such that you are probably entitled to get a multi-entry visa under the facilitated EU rules (They can't disadvantage you compared to visas for Brits married to foreign spouses).

However the website does not let you ever select the combination of EU Spouse and multi-entry visa.

Alex   [08.06.07 06:22 AM]

What are these graphs actually of? And what program did you use to generate them?

Naveen   [08.06.07 07:44 AM]

What would be nice is if one day laws can be written in a logical way so they can be mapped into a procedural logic format, and simple breaches of the law can be analyzed by a verification system (of course this ignores the human judgement issues pertaining to law). This would probably be the easiest to implement in business and contract law (unless we find some way to map moral issues into procedural logic). Likewise, when Congress drafts new laws, the coded form can be checked against previously enacted laws to point out inconsistencies. Not being a lawyer, I'm probably missing some immediately obvious flaw with the above dream but it seems like it could streamline our legislative and judicial processes.

Rohan   [08.06.07 07:50 AM]

Alex: that's a similarity matrix. You would have to ask Dan himself what he used to produce the visualisation but you can make them with something as simple as a spreadsheet application.

J Gruszynski   [08.06.07 08:21 AM]

An interesting comment made by an entrepreneur I heard speak once was that Law, encompassing laws, contracts, patents and other legal documents, is a form of software programming language. To wit:

  • There are variable declarations and initializations in the top where terms and aliases are defined.
  • There are statements, expressions and relationships defined and there are inputs and outputs connected to those.
  • There are references to other existing programs and their variables or execution logic
  • Etc.

The big difference between law and programming is the execution and debugging. In law, no law "program" is ever executed by either the programmer (lawyer or law maker) or the end users (users of the documents). Everyone involved in the creation of the program only presumes that the program works based on the expertise of the programmer.

The first and only execution comes when there is an argument about what the program actually does. The execution engine is a judge or sometimes a jury. This is strongly akin to prices in economics; no item truly has price until a transaction for the item has actually occurred. It's all wishful supposition of comparable value up to that point.

There are a series of execution engines to which one can appeal the interpretation of a program based on the assumption they are better at interpreting the code (have more powerful computation?). The ability of these execution engines to compute the programs has never been proven absolutely however. Most people accept the supreme execution result nonetheless.

Thinking about law this way puts a completely different slant on things like contracts or laws. When people argue that something "violates the contract" or is "against the law", they are making an incredibly thin and weak argument indeed. The odds are the contract or law hasn't even been executed once for bugs!

Thus lawyers are hackers, only without the benefit of being grounded in the immediate, physical outcome as computer or hardware hackers are - their reality largely exists in their heads as the collective mental interpretation of how the "Big Program" works. They learn about the inner workings of the "Big Program" in law school and get tested on their understanding of it at the bar exam.

Thomas Lord   [08.06.07 08:37 AM]

Two quick items:

Item 1: "Law as Code" (and, hey, where's "open source law"?!?)

Replying to Naveen: As you know there are municipal, county, state, and federal codes. Have you ever heard, though, of what lawyers call a "form book" or of "rules of the court" for each court jurisdiction (e.g., Superior Court of the U.S., Oakland division)? Court rules specify a procedural "API" to the (many) standard processes a court executes -- file such and such a paper, serve such and such a paper, pay such and such a fee, etc. Form books are a kind of "template system" for court filings. Both of these are heavily "hyperlinked" to the legal codes. So, to oversimplify a bit, the work-a-day activity of lawyers and their offices is often "just" to fill in those templates and run the steps of court rules. (It's highly skilled work: it's hard to understand a form, for example, unless you first read all of the laws it links to.) One very nice thing about this "programming language" for law is that in exactly the area you worry about -- human judgement -- it treats human judgement as an externally supplied "subroutine call".

As an illustration: a friend of mine recently self-represented in defending a lawsuit. The suit was so bogus that plaintiff was under injunction to drop it -- yet, for reasons of their own, snuck in at the last minute and obtained a default judgement in their favor. A simple matter: my friend just had to show the judge the injunction and have that decision set aside but -- how do you do that? Just walk into court and say "See?" No, my friend had to file a motion -- well, there's a form book that has templates for exactly such a motion. Such a motion has to explicitly explain to the court (the judge) what law it is that enables him to hear this motion and the form book gives some links to laws you might use. (It has to explain other things, too, like which laws permit the set-aside, the facts of the case at hand, etc.) So, then you have to read those hyperlinked laws, pick the ones that apply, and come back and fill in the template. The rules of the court tell you (in considerable detail) how your papers have to be formatted, what filing fees you have to pay, etc. It's very much like "running a program by hand" with the purely human, case-specific details carefully wrapped up in this very precise framework.

I wouldn't worry about (your suggestion of) automating checks for legal inconsistencies in new laws, for two reasons. First, though the various statutory codes are huge, they aren't that huge and they are very modular. U.S. patent law, for example, is something you could read, in its entirety, in a few hours. (It's amazing how many people loudly debate patents without seeming to have ever read any of these laws!) The point is, the code is "small enough" that experts can quickly spot most contradictions. (Interactions with regulations, as opposed to laws, can be messier, of course, but even there, domain experts tend to abound.) Second, contradictions do happen -- but there is plenty of common law, precedent, and juridical theory generally for handling them on a case-by-case basis. Is it really your impression that there's a major problem in this area?

Finally (for "item #1"): Where is open source law?!? Legal codes tend to be on-line these days, as do rules of the court. Form books on the other hand -- which, in spite of their complexity are vital to self-representation -- are all proprietary documents. You'll either pay through the nose for them or have to use them during a visit to a law library. Some legal self-help books are, in essence, tiny parts of form books translated (with lots of explanation) for lay people but, for anything tricky, it's off to the law library (just like prisoners in jail).

Writing and maintaining form books -- but for the quality assurance question -- is ripe for a user-created-content kind of open source solution. To do so would make the protections of law vastly more accessible to average citizens. We could set up "WikiFormBookedia" tomorrow and that would be the gist -- but how to edit and validate content?

Item #2: Code is Law

I'm not sure this idea has really sunk in to the larger public. It's not, as Artur jokes, just that code resembles law and vice versa. Rather, it is that the power of (software) code in our economy is such that it creates new realities and shapes old ones that legislatures and courts have a social policy interest in -- but can't keep up with. Worse, the (often necessary) use of particular pieces of software is (often) wrapped up in contractual protections -- one must enter into a contract to, for example, use a web based email service. Those contracts amount to "private legislation" and the technology that is their subject amounts to a "privately regulated market" -- all of these things being essentially self-nominated and often, as we see, sweeping up huge numbers of users into their regimes.

From a libertarian perspective, that might be theoretically fine: if you don't like "Google law" then switch to some other provider. In practice, there really isn't any serious competition in those markets so there's kind of a wild west of private law sweeping up huge numbers of people, formed completely undemocratically, often having very bad unintended social consequences that are, in retrospect, caused by private profit seeking of the providers.


anwaya   [08.06.07 08:57 AM]

First, we need to know better what Artur's methodology is. There are structures and patterns in DLLs which are not artefacts of the underlying code, but are produced by compilation and linking. Is this what he shows us?

Second, to Thomas Lord: In a democratic society, almost all law is Open Source. The Approvals Committee is the Legislature. In the US, if you haven't sought and won election to the legislature, you can seek to influence legislators by a number of routes, some more successful than others - but the end result is published and can be amended at a later date. Closed Source law is practiced by tyrants, who make laws up as they go along without telling anyone.

Thomas Lord   [08.06.07 09:14 AM]


My tortured use of English may have failed me here because you utterly missed my point.

"Form books" (see above) are not legislation -- just writing about legislation. They are an essential enabler of access to legal protection -- you can't very well talk to a court without using one. Yet they are all proprietary. We need an open source version of form books.

"Private legislation" is real, is (as you suggest) tyrannical, and is a big deal. It is encoded in the technical systems and the terms-of-use contracts of a lot of private services (but services so encompassing in scope that they become a de facto "public" marketplace).

"Closed Source law is practiced by tyrants, who make laws up as they go along without telling anyone."

You are referring, I guess, to the technologists and businesses behind Web 2.0, as one example.


Ray   [08.06.07 10:19 AM]

I would be interested in seeing how smaller cross-sections of Project Gutenberg would appear on these graphs. It seems to me that the larger the sample, the less detail would show up. Some works surely would look like mere background noise, but I would be willing to bet that some would appear very structured.

Christine   [08.06.07 10:22 AM]

Not that this is the most interesting point here, but I think it'd be great to know what algorithm was used to make these visualizations.

At first glance anyway, it seems a bit misleading to compare the sum of all text in Project Gutenberg with writing on a specific technical topic, and use this as evidence that "Law is more structured than Text". It seems sort of analogous to comparing the average clothing colors of the audiences attending games in the Big 12 conference with those attending games at Ohio State and coming to the conclusion that "Ohio fans prefer to wear red, while fans in the central states wear a sort of muddy grey brown color".

I think it would be interesting to see what this algorithm does with more specific genres of text. For instance, writing for very young readers (think Dr. Seuss) uses a lot of repeated symbols and constructions. And how does technical writing in other fields such as chemistry or biology compare to writing in law?

Brian Schmidt   [08.06.07 10:29 AM]

Does the fact the US Code is numerically "codified" have anything to do with the visual patterns? Maybe we're just seeing the patterns in the numeric codification, not the content of the law itself.

ABK   [08.06.07 10:40 AM]

Is this really a revelation? Or worth wasting electrons in sharing? Was it worth the analysis? Legalese is structured? Law is code? You needed a visual representation to understand that? What work does this fairly asinine analysis do? This post, and the comments hereto, feel exactly like the worst moments of law school: useless, non-instrumental arguments, that serve no purpose other than to shine the writer's intellectual apple. This sort of pseudo-intellectual analysis exists only for its own sake, and doesn't actually aid in furthering any sort of understanding. It is an utter waste of time.

Ben Wisdom   [08.06.07 12:32 PM]

If the U.S. Code is like a computer operating system, does that mean that the U.S. Constitution is like a computer's BIOS? Or is there a better analogy than that?

Carrying this metaphor a little further, does that make corrupt politicians and ambulance-chasing lawyers the equivalent of computer viruses?

This article is confirmation of a theory that I've had since the beginning of the year: the next great governing document (i.e. The Magna Carta, the Mayflower Compact, the U.S. Constitution) in world history that will advance the prosperity of humanity will be written by a bunch of hackers.

Dan Kaminsky   [08.06.07 02:33 PM]

So I'm the author of the images in question. Here's the mechanism used:

Take the data -- Gutenberg, DLL, or US Code -- and separate into 32 byte chunks. Lay these chunks out horizontally and vertically. Now, set the brightness of each pixel as the similarity between the bytes at x vs. the bytes at y.

The actual metric, btw, is the Levenshtein string distance, with some normalization.
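For readers who want to experiment, here is a minimal sketch of the mechanism as described above. It is not Dan's code: the function names (`levenshtein`, `chunks`, `similarity`, `dotplot`) are hypothetical, and the normalization (one minus the edit distance divided by the longer chunk length, scaled to 0-255) is an assumption, since the comment only says "some normalization".

```python
def levenshtein(a: bytes, b: bytes) -> int:
    """Classic dynamic-programming edit distance between two byte strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, byte_a in enumerate(a, start=1):
        current = [i]
        for j, byte_b in enumerate(b, start=1):
            cost = 0 if byte_a == byte_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def chunks(data: bytes, size: int = 32) -> list:
    """Split the data into consecutive chunks of `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def similarity(a: bytes, b: bytes) -> float:
    """Assumed normalization: 1.0 for identical chunks, 0.0 for maximally different."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def dotplot(data: bytes, size: int = 32) -> list:
    """Return a square grid of brightness values (0-255).

    The pixel at row y, column x is the similarity between chunk x and chunk y.
    """
    parts = chunks(data, size)
    return [[round(255 * similarity(x, y)) for x in parts] for y in parts]


if __name__ == "__main__":
    sample = b"A" * 32 + b"B" * 32 + b"A" * 32
    for row in dotplot(sample):
        print(row)
```

Every chunk is compared with every other chunk, so the work grows with the square of the input size; a sketch like this is only practical on small samples, not on all of Project Gutenberg.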

Code for doing this can be found here:

It really needs to be packaged up better though. You're looking for hardcorr.

I'm working on porting this stuff to WinAMP btw. Here's a preview:

Glad you guys are enjoying!

Chris Vail   [08.06.07 06:29 PM]

I've heard that half the lawyers in the world are in the US. Perhaps that explains the perspective that law is a thing in itself, rather than a means to an end.

For example, there is an English Common Law tradition, much suppressed in the 20th Century US, of the "fully informed jury", who may decide in a particular court case to set aside the law (and thus acquit an otherwise guilty defendant). If one person sets aside the law, he is a criminal, but if 12 additional peers also set aside the same law, then what you have is justice, not law.

In the same way, code is logic, but not intelligence (and it never will be intelligence).

Tyrants are bad because they are unjust, not because they don't tell you the laws.

ralph   [08.07.07 10:09 AM]

why aren't these visualizations in 3-d with sound?

Supreet Sethi   [08.08.07 06:49 AM]

Sounds like a great visual play on search indexes.

John   [08.08.07 10:29 AM]

Anything that contains structure exhibits this behavior. There is nothing surprising about it.

Chester Shiu   [08.08.07 04:08 PM]

It seems very likely that the pattern observed in the US Code simply reflects the fact that legalese does not tolerate synonyms (gets you in trouble during litigation...) Hence, a legal document like the US Code will simply repeat the same phrase over and over again, horrifying any high school English teacher. Given 32 byte chunks, it's unsurprising that close bytes (i.e., same chapter of the Code) will often have high similarity, since they describe the same concept, and the words used would be (nearly) identical.

As for procedural logic, I have often thought this myself, being a former bioinformatician. However, the problem is that most interesting cases require a human judgment that is not readily machine computable. I can only imagine trying to instantiate the concept of a reasonable person of ordinary sensibilities into an ontology. However, even without that, I suspect that procedural logic would do much to frame the debate and, potentially, lead to more readable judicial opinions. That's probably just wishful thinking...

Rafael de F. Ferreira   [08.08.07 06:33 PM]

I'm far from being an expert in the subject, but FWIW, there is ongoing research in formally codifying legislation. Check out Deontic logic.

Dan Kaminsky   [08.13.07 05:52 PM]


The point is that legalese contains far more structure than normal speech, and bears more structural similarities to compiled code than literature.


Yup, that's precisely what's going on. However, you see very much the same pattern in compiled processor code.
