Jul 13

Tim O'Reilly

Why Congress Needs a Version Control System

I've been thinking lately about just how much software developers take the existence of version control for granted, and how, with the notable exception of Wikipedia, Web 2.0 applications don't offer much in the way of version control functionality. (I don't really count the kind of change-tracking that is offered in online office programs, especially ones that offer "accept all changes" in order to create a clean slate.)

In addition to preventing members of a collaborating group from overwriting each other's changes, version control allows a group to revert a document to any past state, and to see who made a particular change. These are the capabilities that make Wikipedia possible, an innovation even more radical than open source software: anyone is trusted to make changes, not just a core group of committers, and the core team moves to a moderation role that is post-commit rather than pre-commit.
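
As a rough illustration of those two capabilities, here is a minimal Python sketch. It is not how Subversion or Wikipedia actually store history; the `History` class, its method names, and the sample authors are hypothetical. It keeps every saved version of a document along with its author, can return any past state, and can report who last touched each line.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Revision:
    author: str
    lines: list[str]


@dataclass
class History:
    """Toy append-only history of one document (hypothetical, for illustration)."""
    revisions: list[Revision] = field(default_factory=list)

    def commit(self, author: str, text: str) -> int:
        """Store a full snapshot and return its revision number (starting at 1)."""
        self.revisions.append(Revision(author, text.splitlines()))
        return len(self.revisions)

    def checkout(self, number: int) -> str:
        """Return the document exactly as it stood at the given revision."""
        return "\n".join(self.revisions[number - 1].lines)

    def blame(self) -> list[tuple[str, str]]:
        """Pair each line of the latest revision with the author who last changed it."""
        owners: list[str] = []
        previous: list[str] = []
        for rev in self.revisions:
            matcher = difflib.SequenceMatcher(a=previous, b=rev.lines, autojunk=False)
            new_owners: list[str] = []
            for tag, i1, i2, j1, j2 in matcher.get_opcodes():
                if tag == "equal":
                    # Unchanged lines keep whoever owned them before.
                    new_owners.extend(owners[i1:i2])
                else:
                    # Inserted or replaced lines belong to this revision's author.
                    new_owners.extend([rev.author] * (j2 - j1))
            owners, previous = new_owners, rev.lines
        return list(zip(owners, previous))


history = History()
history.commit("alice", "Section 1\nSection 2")
history.commit("bob", "Section 1\nRider\nSection 2")
for author, line in history.blame():
    print(f"{author:>6}  {line}")
```

Storing whole snapshots keeps the sketch short; real systems differ widely in how they store and compare revisions.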

I invited Karl Fogel, one of the core developers of Subversion, now the most widely-used open source version control system, to join me at the O'Reilly Radar Executive Briefing on Open Source July 24 in Portland. I wrote:

Seems to me that Subversion (and the whole idea of version control) is a critical part of virtually everything interesting happening these days, from OSS to wikipedia. I'm wondering if you'd be up for an on-stage interview about the state of subversion...

Karl replied that he'd like to talk more about "version control generally than about Subversion specifically," with the following list of topics as a starting point:

  • Why it's important to be able to search the change history for Wikipedia entries (and why it's incredibly cumbersome using today's interface);

  • Why the U.S. Congress needs real version control tools (and why it's our own fault for not providing them);

  • Why good version tracking is important in a world where more and more creativity consists of mixing existing things together.

    These are really thought-provoking suggestions. I was particularly struck by Karl's suggestion of a version control system for Congress. They say you don't want to see either laws or sausages being made, but I think they are wrong. Imagine how much more transparency and accountability our government would have if it were possible to see what changes were made by whom, who inserted extraneous riders into various bills, and generally to track the influence of various interests by the new visibility into their actual control over the knobs and levers of government!

    I also found Karl's comments about the problems with Wikipedia's version control and version control for a remixable world stimulating. I'd love to hear what he has to say about these topics.

    Karl also offered to talk about "insider politics", so to speak:

    feel free to ask about ... the recent Slashdot article pointing to a blog post in which Linus Torvalds dumps on Subversion pretty harshly, and explains why the style of version control he implemented in GIT is better.

    Although the presentation itself was rude almost beyond belief, some of his technical points were valid, and the Subversion developer community has been good about separating the style from the substance.

    There definitely is some substance to Linus' comments. He talked about the importance of support for branching and merging of distinct version lines, the possibility of commits per line of code (rather than per file), and the viability of distributed repositories. These are definitely forward-looking ideas -- and as I noted in my recent entry on the GPL and Software as a Service, we need to be thinking hard about the future of software, and basing our tools, our licenses, and our development practices on that future.

    I often end my talks with a stirring quote from Ray Kurzweil (which I wrote down at one of his talks at the Foresight conference about five years ago):

    I'm an inventor, and I became interested in long term trends because an invention needs to make sense in the world in which it is finished, not the world in which it was started.

    Returning to the topic of version control and our legal system... despite the many successes of our form of government, it's definitely creaking at the seams. The founders, for all their foresight, didn't plan for a nation of 300 million people, most of whom don't care to vote, nor did they foresee the extent to which the bureaucracy would become a fourth seat of power. I don't have any great prescriptions for politics, but I do have prescriptions for technology. We all need to think hard about how the future will not be like the past, focusing our efforts on that future and being willing to change course when faced with discontinuities that render our past thinking obsolete.

    The future we face is one of massive collaborative systems. How we design those systems, and how we build critical freedoms into them, shaping their architecture to support either participation or centralized control, is one of the great challenges facing the technology community today.

    Comments: 28

      Swashbuckler [07.13.07 12:36 PM]

    While it would be great for our democracy, Congress doesn't want a version control system. That would introduce accountability into the process of creating legislation that Congress and lobbyists do not want because they could no longer profit from hiding in the shadows of the current system.

      Brian Aker [07.13.07 12:48 PM]


    I saw your blog post come through my RSS reader and clicked on it to read the entire story... though before I even clicked on it I was thinking "I wonder if Tim will catch on to how much of a dead end central repository systems are for the problems we are facing".

    In a central system each group would have to check out a version and submit their changes back to the central authority. Each group would then rewrite a section that another group has written. Not very friendly, and it always leaves people wondering "is the next guy just going to overwrite my work?".

    In a distributed system, like Bitkeeper, GIT, BZ, etc... each group can work on their own copies and then merge and trade changesets with other groups. Greater collaboration. The final revision that will be stored in the master repository will have already been read and merged by the group (which means overall less friction). Subgroups of groups can even work on portions of the group's work, and merge ideas before sharing their results with other groups.

    In LAMP, both the L and the M require distributed systems because of the number of developers involved (and my understanding is that one of the P's is moving this way as well).

    The centralized model that we inherited in CVS really does not scale to large collaborative groups.


      Luigi Montanez [07.13.07 01:06 PM]

    @Brian: But Tim (and Karl Fogel) are advocating the idea of version control in general, not the specific systems of CVS or SVN. As stated in the post, Fogel wants to talk about "version control generally than about Subversion specifically".

    Obviously Subversion won't work for Congress for a variety of reasons. I think they're talking about version control in the most general sense, and of course a specific implementation for Congress would need to have a much different design than CVS/SVN.

      FP [07.13.07 01:28 PM]

    A friend of mine at NYU ITP started as his thesis project. Not quite version control like SVN, but something to place laws into common speak for the masses. Probably would be a great addition to the Library of Congress' Thomas website.

      Paul B [07.13.07 02:10 PM]

    Re Congress: Perhaps Mercurial would be a better fit.

    (Sorry, couldn't resist.)

      Karl Fogel [07.13.07 02:30 PM]

    Luigi already said most of what I would have said. I'd just like to add a few thoughts.

    Actually, Subversion (with some very specialized user interfaces) could be a pretty good version control system for Congress. For one thing, locking (the ultimate centralized feature) can be a good thing sometimes; that's why it's so popular.

    Swashbuckler is right: many in Congress wouldn't want version control. But the point is to make it a cultural norm, by making it a technical norm. It's a lot harder to object to something when to object means to deviate from the expected behavior.

    Right now, most people don't get outraged when they hear that a clause has been inserted into a bill but that no one can say who inserted it. Those of us who use version control systems on a daily basis get outraged, though, because we think: why don't they just check the commit logs? Or better yet (to use Subversion as an example), why not just run 'svn blame' on the bill?

    If that sense of entitlement to the change history were part of everyone's standard set of expectations, like disclosing campaign contributions or not lying on your resumé, Congress would have to go along.

    I think Brian Aker's characterization of the different styles of development that the different tools encourage is a bit exaggerated (at least in my experience; he has more experience as a user of distributed systems, though). The distinction between "centralized" and "distributed" revision control systems is real, but not binary.

    I'm completely in favor of Subversion getting distributed features, by the way. Subversion's design is not inherently centralized. Rather, it's inherently tree-based: Subversion versions trees of files and directories. It could do that in a decentralized way (SVK is an existence proof of this), and my impression is that the Subversion development community pretty much agrees we should move in that direction... to the degree that users need features that can only be supplied by moving in that direction.

      Carl Malamud [07.13.07 02:32 PM]

    To broaden this even further: version control in Congress is about transparency. That also means making video from *every* public hearing available. If all you see are the different versions of the text without seeing the give-and-take in the hearing that resulted in the new language, you are missing the most important part, which is intent.

    It is very common for judges to resolve ambiguous language in laws by going back to the hearings to try and divine intent. For judges to properly divine this intent (and for the rest of us to be able to participate in the democratic process if we don't happen to work inside the beltway), that means that video from every public hearing should be available on the Internet.

    Version control and a complete public video record of the proceedings are both essential elements of transparency.

      Karl Fogel [07.13.07 02:36 PM]

    By the way, regarding this: "... the possibility of commits per line of code (rather than per file)":

    Every version control system in existence today, including CVS, can support this without major changes. It's purely a client-side interface issue. The fact that most interfaces don't bother to support it may indicate that it's perhaps not as important to most users as we like to think. (What I mean by "client-side interface issue" is that it's possible, for example, to implement this for Subversion without changing a line of code in the Subversion core libraries, the libraries on which things like TortoiseSVN are built.)

    What's harder is tracking merging of portions of a file as a unit, as in: "What other files has this function been ported to, and has it been changed along the way?" Subversion doesn't have any features for that right now. I think Mercurial doesn't either; don't know about GIT.

      Thomas Lord [07.13.07 06:14 PM]

    Version control is an interesting way to frame the congressional record question but the version control "concept" is itself problematic for that purpose (and many others).

    When Fogel et al. and I had back and forths during the halcyon days of "Arch vs. Subversion" we basically agreed that at the "0th order" revision control's task is to make historic snapshots of datums accessible. The "N+1th order" is to add superstructure concepts like lines of development, branches, and merging. (I'm flagrantly putting words in his mouth so feel free to see if he would disagree, please).

    As a technical implementation matter, if you are optimizing for things like rate of retrieval, rate of storage, and so forth, then you get into debates that could be titled "Store snapshots? or store deltas?" and a spectrum of middle grounds between those. Linus productively embarrassed both Subversion and Arch by saying, from our perspective, "OMFG, that's just *so* not the question that matters -- brute force wins hands down on those issues. Haven't you idiots been following the price of disk space and bandwidth and correlating them with actual use cases?" Linus has a bit of a talent when it comes to slinging around phrases like "you idiots".

    So, in a more Linus-ian spirit, turning to the question of congressional records: it's the work flow, dummy.

    I don't want my legislators working like Linus does, blindly merging changes so long as they come from trusted lieutenants. I just would enjoy, as you suggest, an accurate historic record of a bill. Snapshots matter a lot more, in this domain, than lines of development, branching and merging, and so forth. Given snapshots, and their factual meta-data (date of creation, etc.) all version control system features can be implemented, multiple ways, as convenience dictates. That's pie in the sky, though: the present need is simply for a better paper trail.

    Workflow matters because, from what I can tell, a huge amount of the current process is either ephemerally verbal or using pen and ink on paper. That is to say that the accountable legislators *tend* not to trade (in any meaningful way) in text files and patches to text files -- they trade in attention, conversation, and by-hand marked-up pieces of paper.

    Unlike Karl, I'm not anxious to interfere with the current workflow. Sure, let's please have GAO make new legislation available in convenient XML form but as for tracking history: we can't pepper that steak with a little technology and get back a historic record of decision making; it will all still come down to ephemeral chat and trade in discarded paper with mark-ups; all we can possibly do by imposing IT there is create new games and gambling opportunities and conversational obstacles for people to get around. We won't get a history of each piece of legislation: we'll get a history of random bureaucratic procedures that the players are working around.

    Karl, to put more words in his mouth, would probably agree that "archiving significant documents and making them accessible is good, including in cases of congress".

    That wouldn't be version control. It would be a form of photography.
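
The claim in the comment above, that version control features can be layered on top of plain snapshots and their metadata, can be illustrated with a short Python sketch. It assumes nothing more than dated full-text copies of a bill and derives a diff between any two of them on demand with the standard library's `difflib`. The `Snapshot` type, the sample text, and the dates are made up for illustration.

```python
import difflib
from datetime import date
from typing import NamedTuple


class Snapshot(NamedTuple):
    """A full copy of a document plus its factual metadata (hypothetical type)."""
    created: date
    text: str


def diff_snapshots(old: Snapshot, new: Snapshot) -> list[str]:
    """Derive a unified diff from two stored snapshots; no deltas are stored."""
    return list(
        difflib.unified_diff(
            old.text.splitlines(),
            new.text.splitlines(),
            fromfile=f"bill@{old.created.isoformat()}",
            tofile=f"bill@{new.created.isoformat()}",
            lineterm="",
        )
    )


# Made-up sample data: two snapshots of the same bill taken a day apart.
first = Snapshot(date(2007, 7, 12), "Sec. 1. Short title.\nSec. 2. Funding.")
second = Snapshot(
    date(2007, 7, 13),
    "Sec. 1. Short title.\nSec. 2. Funding.\nSec. 3. Rider.",
)

for line in diff_snapshots(first, second):
    print(line)
```

This only shows that the "what changed between two dates" question needs no more than the snapshots themselves; who made a change would still have to be recorded as metadata when each snapshot is taken.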


      Michal Migurski [07.13.07 06:21 PM]

    The Sunshine Foundation's Open House Project released a recommendation that touches on legislative versioning, along with a collection of other suggestions for improved on-line availability and archiving of congressional information. Read it!

      Thomas Lord [07.13.07 06:37 PM]

    Transparency is good but trying to impose transparency by slapping on IT handcuffs is INSANE. It will never accomplish what it purports to be aimed at so it will just add entropy to the system.


      Grzegorz Daniluk [07.14.07 01:04 AM]


    In my opinion you touched on a very important field. Just as printing and the press (a one-way information highway) changed our world, the Internet (a two-way information highway) will change political systems. We are in fact just at the beginning of the Internet revolution.

    In my country, Poland, a few years ago we had a huge political scandal. It is called "Rywin gate". Some guys tried to change parts of a bill. Of course they wanted a huge bribe for this. In the end a parliamentary investigative commission found these people. But there was a big problem with proving that TWO WORDS which changed the meaning of the bill were added by these people. Then I was thinking "damn, with even a simple CVS system for bill texts, finding the guilty would be almost automatic".

    In the US, Silicon Valley's position is much stronger than that of the tech industry in other countries. Therefore you have a real possibility to change history.

      Kolin [07.14.07 03:31 AM]

    Tim, how I would wish the system like that for my own country, Ukraine, the country is being deviled by dirty politicians into pieces...

      Pete [07.14.07 09:55 AM]

    We also need an electronic voting system that can authenticate election votes. Think seadragon, with homes/apartment blocs displaying their votes, manual selections of any area can then tally them

      Max [07.14.07 03:40 PM]

    Something like NewsSniffer but on a bigger scale?

      EAS [07.14.07 07:04 PM]

    I've wanted version control for legislation for over a decade.

    Thomas Lord wrote:

    I don't want my legislators working like Linus does, blindly merging changes so long as they come from trusted lieutenants.

    This already happens now. Some of the more odious parts of the DMCA, for example.

      Ross Stapleton-Gray [07.14.07 10:01 PM]

    Tim said:
    > they didn't foresee the extent to which the bureaucracy would become a fourth seat of power

    This seems like something of a non sequitur; which bureaucracy? "Government bureaucracy" is generally assumed to be the executive branch agencies... as a former federal "bureaucrat," I think things have worked pretty well, with a certain amount of expertise and responsibility in the various agencies. There's a lot of wrecking going on now, with a rogue Executive bringing in ideological hit men, but it's an unbalanced and out of control Executive Branch, not bureaucracy.

    But back to the main theme, absolutely, more transparency and accountability would be a vast improvement. I'd suggest at least two additional steps to take, beyond suggestions such as Carl has made:

    • First, require that all legislation that isn't governed by classification concerns (which ought to be a very small amount... if it isn't a small amount, we ought to hear why) be distributed electronically to the citizenry in advance of any voting, with some window X (which can depend on the scope, urgency, etc., of the measure... an omnibus budget bill gets a bigger window than a minor amendment)... have it pushed in some acceptable standard format, and then let anyone and everyone have at it. We'd see a million greps bloom, with various parties combing through every bill, scanning for their particular oxen being gored, or fattened. Lots and lots of eyes on the process.
    • Secondly, remove the constraint that legislators need to be present in the chamber for voting. It's the 21st century... how many of us are working virtually now, and get a helluva lot more done for not having to show up in a lot of committee meetings? It would also do a lot for security/continuity of government, making the Capitol a lot less inviting target. Do we really believe that ganging all these folks together physically in the House and Senate chambers is necessary to producing good legislation, knowing how much of what appears in the law is crafted far from Capitol Hill?

      Thomas Lord [07.15.07 10:02 AM]

    Ross's suggestions, immediately above, are very radical:

    Public review periods for proposed legislation would, combined with the nature of electoral politics, bring us a lot closer to legislating through referendum rather than representation.

    Remote congressional voting would (a) reduce the amount of discussion sharing among members of congress; (b) be ripe for tampering.

    That doesn't make either an obviously good or bad idea. I'm just pointing out that it's very, very hard (mostly for good reasons) for reforms that are that radical to get traction.


      Ross Stapleton-Gray [07.15.07 12:25 PM]

    Remote congressional voting would (a) reduce the amount of discussion sharing among members of congress; (b) be ripe for tampering.

    I don't think (a) is the case; there's no requirement that legislators NOT be in D.C., just that they not be hostage to the room. We know that an awful lot of what goes into any bill is staff; freeing legislators up to actually spend more time with their constituents would likely enhance the amount of information from them that legislators would express through their staff's work in concocting legislation.

    And (b) seems awfully unlikely, given that all these votes are a matter of public record... just as anyone can sieve through pending legislation, they can ride herd on how the votes went, and flag anything anomalous. (And are you saying that electronic voting is inherently untrustworthy, even with a small, well-monitored population? Then we ought not to use it in the general elections.)

      Sachin [07.16.07 02:00 AM]

    Ross's suggestions are indeed great. This sounds like the Long Tail of democracy working for you. It has the potential to make the legislations and amendments more understandable for the common man. One potential risk is of legislators' staff getting inundated with comments most of which wouldn't be worth the time. However, as seen in many community websites with the right features in place, an order emerges from the apparent chaos which will place its own filters on what is worthy of going through and what isn't.

      Juan [07.18.07 12:39 PM]

    I have a very simple solution. Remove lobbying from the political process. Then in the same bill take away the rights of a corporation as an individual. When corporations were given the rights of individuals we gave away our voice to them. Lobbying is not a right guaranteed by the Constitution; it is the removal of the rights of the individual in favor of corporate interests. These might be radical ideas, but wasn't this what America was founded on?

      Joe Germuska [07.22.07 08:35 PM]

    This is a very interesting discussion. I just wanted to share an item on physical presence (see Ross Stapleton-Gray's post above), if people hadn't heard:

    House to governor: Stay in Springfield --

    "Lawmakers adopted a resolution calling on the governor to "reside in Springfield ready to negotiate" during an overtime session on the budget stalemate. "

    At least here in Illinois, they aren't ready for telecommuting politics.

      david [07.25.07 02:50 PM]

    You know, I've been thinking a simple fix for the problem of big money in Washington would be to pass a bill that required any contract written by the federal government to a private company to include a provision in which the private company agrees not to engage in lobbying or to fund campaigns or PACs for the term of the contract plus some small number of additional years.

      Ed M [08.04.07 07:25 AM]

    So it's the morning of Saturday, Aug 4, 2007 and this morning the US House of Representatives is debating HR3356. Although it may be surprising that the House is working on a Saturday morning there is good reason for this. HR3356 is the bill intended "to amend the Foreign Intelligence Surveillance Act of 1978 to establish a procedure for authorizing certain electronic surveillance". In addition to the serious nature of the bill Congress is about to go into summer recess. So there is an apparent last minute push to get this bill through Congress.

    Now I don't wish to debate the pros and cons of this bill but I do want to relate it to the topic at hand: a revision control system for Congress. As a somewhat good citizen, I wanted to read and understand what this bill actually says or does. Now there is currently a fair amount of press coverage but this, for me at least, is just adding to the confusion and opinionated viewpoints that come with a national debate. So I wanted to know what exactly is going to become law.

    Now if you do a search on HR3356 you will see this

    The text of H.R.3356 has not yet been received from GPO
    So if I wished to take part in our democratic process and be a good citizen I would have to wait till after the fact, after the vote on this bill. This is WRONG!

    I am going to assume the US Congress is not using typewriters and carbon paper to create the 435+ copies of this bill which each representative needs to read before she/he votes on such an important bill. I am thus going to believe that they either have paper printouts of an electronic copy or an electronic copy. Why can't I have access to that exact same electronic copy of the bill NOW!

      Ross Stapleton-Gray [08.15.07 04:44 PM]

    Here's something today from Barack Obama, who, at least at this point, is getting my vote:
    "To make the government more accountable, Obama said he would post all non-emergency bills online for five days before he signed them into law, allowing Americans a chance to weigh in on the legislation. In addition, he said he would post all meetings between lobbyists and government agencies online."

      Nita [04.03.08 12:20 PM]

    What techniques are used effectively by Congress, the President, and the Judicial branch to control the bureaucracy?

      Rich Morin [12.10.08 11:26 PM]

    I just posted the following question to "Will you support the application of version control technology (as used in software engineering, wikis, etc) to drafts of legislation and regulations, providing transparency to their creation process, authorship, etc?"
