Put change.gov Under Revision Control!

Last week, the New York Times wrote about Changes at change.gov:

The policy section of the transition site was removed without notice just days after Change.gov went live shortly after the election. At the time a spokesman for the Obama-Biden transition effort said they were “re-tooling” it.

There was an almost instantaneous outcry from bloggers and other advocates of transparency in government who noticed the disappearance. At least one site posted a complete archive of the old Agenda pages. (Increasing transparency, by the way, is a key feature of Mr. Obama’s government reform agenda, according to the site’s “Ethics” page.)

The changes, as it turned out, were mostly to tone down the partisan politics of the policy documents as published during the campaign. But the lesson remains: when public documents can be changed without notice, it’s essential for the public to be able to see what changed, and why.

There’s a profound and simple tool that the Obama administration can use to improve government transparency. It’s something that’s enabled worldwide collaboration among software developers, and whose relevance for content development has been definitively demonstrated by Wikipedia: revision control. Not only does revision control allow a community to work independently on a common project, it makes it possible to review the changes.

There’s a primitive form of revision control in word processing products like Microsoft Word, but we need more than that, especially for documents that bring together the work of multiple independent authors. For change.gov, the Wikipedia model might work: logging of every change, with only authorized participants allowed to make changes, but everyone (the public) able to review and comment on associated discussion pages.

The real holy grail, of course, would be to provide revision control on all government regulations, and eventually, on legislation. This would no doubt be fought tooth and nail by lobbyists who don’t want their fingerprints on the final result, but that’s precisely why it would be such a breakthrough. And that’s also why I suggest that the Obama team start with change.gov: demonstrate that the system works, that it has enormous benefits in transparency, and work from there.

Of course, there are major technical and workflow obstacles. Many of the documents in question are probably worked on independently as Microsoft Office files, with bulk merges that obscure the history. What we really need are distributed revision control tools. There’s a lot of good work happening in this area in the software development community; it would be fascinating to see it extended to collaborative document development. (Of course, shared editing a la Google Docs is coming to Microsoft Office as well, so perhaps the point of leverage is for Google to improve the revision control capabilities in Google Docs, starting an arms race with Microsoft. Once the tools are in place, the social pressure to use them has a point of leverage.)

Like so many things that go under the rubric of “change,” I’m sure that there would be many complications to this proposal, many problems I haven’t thought of. Many current assumptions and processes would need to be challenged, and some of the challenges would take us down dead ends. But that’s what change is all about. If it was just like the present, it wouldn’t be change.

I’d love your thoughts both on the general proposal and specific ideas for implementation.

P.S. I wrote on this same subject about a year and a half ago, in a post entitled Why Congress Needs a Version Control System. It was Karl Fogel who first put this idea into my head, and that post explains his thinking. There are also some comments there from 2007 that may provide more grist for the discussion.

  • Robert Banghart

    I think this is a great idea and that change.gov appears to be open to ideas that will aid transparency.

    That leaves the question of how to get the word to change.gov. Why not give them a link to your post or just cut and paste it into http://change.gov/page/content/contact/?

  • Tim,

    You do not need the people who run “change.gov” to change a thing. Here is a very simple technical plan for achieving almost all of your aims using the existing infrastructure. This plan would require a tiny amount of money up-front to build some very simple new code. It would require on-going hosting costs although those could be quite low and it should not be hard to make hosting pay for itself. Best of all, this would be a large step towards distributed revision control.


    1) Build a spider that frequently crawls “change.gov” and caches everything.

    2) Devise a naming system for those documents (e.g., their relative url under http://change.gov plus a timestamp).

    3) Build an authority list that lists the URIs of all the pages snapshotted plus a crypto-quality checksum.

    4) Create a “change alert” Atom or RSS or whatever you like feed that has a time-line of when URL contents change at “change.gov”.

    5) Build a web site serving up the snapshots and the feed.

    6) Suggest that people (a) mirror your efforts so there are cross-checkable, independently gathered records; (b) build data-mining and visualization tools for the resulting database.

    Build that and you’re most of the way there.
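
    Steps 2 and 3 of the plan above could be sketched roughly as follows. This is a minimal illustration, not a worked-out design: the function names, the `path@timestamp` naming scheme, and the choice of SHA-256 as the “crypto-quality checksum” are all assumptions made for the example.

```python
import hashlib
from datetime import datetime, timezone
from urllib.parse import urlsplit


def snapshot_name(url: str, fetched_at: datetime) -> str:
    """Step 2: name a snapshot by its path under the site plus a UTC timestamp.

    Hypothetical scheme. Stripping slashes means /agenda and /agenda/ share a
    name, which a real naming policy would have to decide on deliberately.
    """
    parts = urlsplit(url)
    path = parts.path.strip("/") or "index"
    if parts.query:
        path += "?" + parts.query
    stamp = fetched_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{parts.netloc}/{path}@{stamp}"


def authority_entry(url: str, body: bytes, fetched_at: datetime) -> dict:
    """Step 3: one authority-list record: URI, snapshot name, and checksum."""
    return {
        "uri": url,
        "snapshot": snapshot_name(url, fetched_at),
        "sha256": hashlib.sha256(body).hexdigest(),
    }
```

    Independent mirrors that fetch the same bytes would compute the same checksum, which is what makes the records cross-checkable as suggested in step 6.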

    Then you can build and propose flyweight new tools and procedural changes to “change.gov”; for example, give them a way to attach meta-data to a page update where they might store, if they so choose, the practical equivalent of a “commit log entry”.

    That’d work faster, cheaper, and better. I’d suggest being very careful and thoughtful about designing the naming policy for the snapshots and the checksum mechanism. If it gets that far, be very careful proposing new meta-data formats (e.g., for “commit log” foo). But, getting started is cheap and easy and should be effective to the extent people find the resulting data interesting.

    And, gee, it could be generalized so that (a) you can point the same set of tools at any site; (b) layered tools (data mining, visualization) would work equally well for whatever site you are tracking.

    (The “other part” of distributed revision control is how to handle “merging” and “history” in the face of multiple “branches” — but that stuff can be layered very cleanly over what I’ve just described.)


    p.s.: If you “get me” here, then you are starting to “get” GNU Arch.

  • Dave Land

    Excellent suggestion, Tim.

    It should be law, and the sponsors of the legislation should lead the way by placing the law, as it develops, under revision control. The attention that an “open-source policy” could garner may make it more difficult for lawmakers to justify their opposition.

  • It occurred to me some time ago that our legislators should use revision control & author logging.

    When you consider how ridiculous it is that things can be “slipped into” a bill, then approved as law; it is clear the system is dysfunctional. “How did that earmark get in there?” Give me a break!

    Another worrisome habit that may be prevented with a technological improvement is that of Senate/House staffers doing the actual penning of changes to a bill.

    Then again, one way to accomplish some of these improvements AND de-bloat legislation would be to require lawmakers to hand-write each version of a bill. Impractical as it may be, that would likely do more good for the process of creating law than anything else.

    I’m not convinced that the current level of complexity in bills these days is necessary or useful; or that our members of Congress actually read or write any of these laws.

  • Mike

    The question isn’t exactly who wrote or physically inserted a piece of text into proposed legislation.

    There are two questions, I think:

    The first, purely practical question is how to provide a mechanism that makes it easy to view “just the changes” between two drafts and to associate those with any meta-data that may be available regarding them. For example, a revised bill comes to the floor at midnight — how can my representative quickly derive the exact changes from the last version they or their staff reviewed? (Ditto for the constituents, of course.)

    The second, deeper question is not who wrote or physically inserted each change, but which official takes responsibility for it.

    You need more than revision control for the second question and it is generally a mistake to interpret revision control logs as if they answered the second question. The only way to handle that second question is with a “source code management policy”. For example, the bodies of the legislature could have new rules that every change between two drafts must be “signed” by a member of the legislature. Such rules might well run afoul of the first amendment but that’s what you’d need to accomplish what you are asking for.
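
    The first, practical question (viewing “just the changes” between two drafts) is something standard diff tooling already handles. A minimal sketch using Python’s standard `difflib` module; the function name and the draft labels are made up for illustration:

```python
import difflib


def changes_between(old_text: str, new_text: str,
                    old_label: str, new_label: str) -> str:
    """Return a unified diff showing only the lines that differ between drafts."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",
        n=1,  # one line of surrounding context per change
    )
    return "\n".join(diff)


if __name__ == "__main__":
    draft_1 = "Sec. 1. Short title.\nSec. 2. Funding is $10 million.\nSec. 3. Effective date."
    draft_2 = "Sec. 1. Short title.\nSec. 2. Funding is $12 million.\nSec. 3. Effective date."
    print(changes_between(draft_1, draft_2, "draft-1", "draft-2"))
```

    Note that this only answers the first question. Nothing in a diff says which official takes responsibility for a change.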


  • There is some amount of versioning on THOMAS where different versions of legislation in the making are stored (back and forth between House and Senate). However, the Library of Congress doesn’t understand that URIs are supposed to be permanent (quite ironic for what’s supposed to be THE reference library). Most of their URLs expire within minutes, making the site unusable and almost impossible to deep link to, let alone build features such as differential highlighting on top of it.

  • bowerbird

    thomas lord has it right. build this system external to change.gov,
    and prove that it works. otherwise, it’s just another one of those
    government boondoggles that costs zillions of dollars and ends up
    not working very well, and sometimes not even working _at_all_…

    it has to be simple enough that the average person understands it.

    no revision control system i’ve seen thus far even comes close…


  • Tom –

    Great idea. I wonder if we could get Sunlight Foundation to fund it. How much do you think it would cost you to do it?

  • I love the idea of more transparency, and revision control is a topic dear to my heart.

    Archiving the versions of a website is straightforward and relatively inexpensive. How much functionality and indexing you want to expose to interact with the archive is where things get more interesting (and cost more to build/maintain).

    And while change.gov is a nice place to start, I’m more interested in the impact on the legislative process. The ideas in the post remind me of something that I originally thought of as an evolution of collaborative software development, but now I believe could be applied to any similar collaborative processes (thanks to Tim).


    ‘Code’ and ‘Law’ share similar properties, almost by definition. While some of the metrics I discuss in passing probably don’t make sense for law, I think the idea of a minimap of the document and real time indication of who is changing something would not only provide for greater transparency, but could also improve the efficiency of the process. Lawmakers could see what other people are working on and more quickly respond in favor or against, and if you really want to get crazy, open up the process as read only to everyone.

    Imagine an iPhone mashup that lets you mark up pieces of legislation you care about and get alerts if it changes…

    I don’t expect to see anything like this soon. I envision resistance from a group who would not be comfortable with this for many reasons, the least of which is probably the technical barriers.

    But maybe, just maybe, by the time today’s freshmen senators are the ones with the longest tenure…

  • ‘Versionista’ did roughly the Tom Lord suggestion (regular external sampling) during the campaigns:


    But, while such sampling is doable without administration support, it will always be second-best to the ideal arrangement, where the official source uses an open-to-the-public revision control system.

  • Tim, regarding your question:

    Great idea. I wonder if we could get Sunlight
    Foundation to fund it. How much do you think it would cost you to do it?

    I have trouble putting a price on it because all we have here so
    far is a sketch of technological underpinnings of the idea. Certain
    kinds of informality are a boon, not a burden, but more formality is
    needed to set a price. Call it a “scope of work” or something.

    Towards that end and trying to cut through bs quickly and
    efficiently, here are some overarching thoughts:

    1. Success or failure depends only in part on the software.
    Additionally, a constituency must be built: the implementors of the
    software, the users, the builders of layered tools, change.gov
    folks and other content providers, etc.

    2. A constituency (worth aiming for) is unified at the very least
    by a common understanding of a situation and by the means to
    communicate efficiently among themselves about that common
    understanding. That is, constituencies are built out of shared
    ideas and a common language for talking about those ideas.

    3. Commercially relevant constituencies almost invariably form
    around a specific set of public documents: statements, standards,
    definitions, narratives, etc. In particular, the shared ideas and
    the shared language of a constituency (see (2), above) are almost
    invariably first apparent as a set of public documents which convey
    those shared ideas. The main reason such documents are almost
    always at the core of things is because the documents simplify entry
    into the constituency (“Here, read this.”)

    I would suggest, Tim, work-products / plan-of-work (aka milestones)
    along the lines of:

    1a) You cut and paste and edit from your two blog posts on the topic
    and create a “motivation” document.

    1b) In parallel (let’s exchange a couple of drafts) I’ll write an
    architectural “master plan” elaborating the sketch I gave in the
    earlier comment. The “master plan” will give a clear overview of
    exactly what software is to be implemented but, equally importantly
    from the constituency-building perspective, will give an overview of
    how each component is envisioned to be implemented plus a rationale
    as to design and envisioned implementation. For example, I
    mentioned the need to come up with a systematic naming system for
    snapshots of change.gov and I somewhat cryptically said care was
    needed in the design of the namespace but I didn’t elaborate. The
    master plan would include a draft syntax for the namespace, an
    explanation of how to map names to URIs/URLs, and an overview of the
    rationale for the particular choices in namespace design. The
    resulting “master plan” will contain many straw-man aspects but
    should be sufficiently strong to justify continuing work.

    2) We each solicit some initial feedback on the documents from (1a)
    and (1b) at which point we can repeat step 1, or move on to step 3,
    or “drop dead” on this particular effort.

    3) I puzzle over the resulting “master plan” (ideally with ad hoc
    ability to ask questions of your IT staff) and come up with a work
    schedule and a set of requisitions. The schedule is a calendar,
    the reqs are a mix of coding tasks, integration tasks, testing
    tasks, admin tasks, etc.

    4) We again review and iterate, continue, or drop dead.

    5) We together write a work order which assigns the various items
    in the work schedule to specific parties. We can draw on everything
    from me, to your IT staff, to Craigslist bidders.

    6) We again review and iterate, continue, or drop dead.

    7) By now documents are hopefully receiving wider circulation and
    work is underway on the initial software. We collaborate on a
    document about presenting the results of this R&D to the larger
    free software community, to the change.gov folks, etc.

    We can talk about my fees via email or by phone if you wish but I
    can at least say you needn’t fear sticker-shock.

    Just to be clear: I’m not your goto guy if you need some
    quick coding but not much else. Between your IT staff (or
    Sunlight’s or whatever) and craigslist, if we need a quick 1-5k
    lines of code to make simple web magic happen, well, I’m perfectly
    competent to do it but all those other folks can do it faster and
    with less stress. My low-level coding passion lies elsewhere. On
    stuff like this, please let me concentrate on the cohesiveness and
    potency and technological sanity of the overall effort, rather than
    on getting the details of a particular piece of wget scripting


  • I think a basic technical framework for the problem would take about 15 minutes to produce:

    • Spider the site with wget
    • Save the changes into a version control system (e.g. git)
    • Publish (e.g. with github)

    Repeat as needed.
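
    As a rough sketch of that three-step loop, here is one way to drive wget and git from Python. It assumes both tools are installed and that the target directory is already a git clone with a remote configured; the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def mirror_command(site_url: str, dest: Path) -> list[str]:
    """Build the wget invocation that mirrors the site into dest."""
    return ["wget", "--mirror", "--no-parent", "--quiet",
            "--directory-prefix", str(dest), site_url]


def commit_commands(repo: Path, message: str) -> list[list[str]]:
    """Build the git invocations that record and publish whatever changed."""
    git = ["git", "-C", str(repo)]
    return [
        git + ["add", "--all"],
        git + ["commit", "--quiet", "--message", message],
        git + ["push", "--quiet"],
    ]


def snapshot(site_url: str, repo: Path, message: str) -> None:
    """Crawl, commit, publish. Assumes repo is an existing clone with a remote."""
    # wget exits non-zero if any single page fails, so do not abort on that.
    subprocess.run(mirror_command(site_url, repo), check=False)
    for cmd in commit_commands(repo, message):
        # "git commit" fails when nothing changed; in that case skip the push.
        if subprocess.run(cmd).returncode != 0:
            break
```

    “Repeat as needed” would then amount to calling `snapshot(...)` from a scheduled job such as cron.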

  • This is totally a “me too” comment, but: great idea. And not just for change.gov, but for all accessible government data. Such a system, if it had an interface good enough for non-techie journalists and others to use, would do a lot to raise the “expectation bar” for government transparency. Once people get used to a versioned view of some of the data, the onus would start to be on the data producers to explain why they’re not making *all* data available in a way that permits third-party versioning.

  • http://lotv.blip.tv/#645082

    Step one to Transparent Federal Budget…

    Put legislation into a CVS that has paragraph-level permalinks so that we can crowdsource the documentation on legislation ourselves.

    Let’s find out when a diff occurs so that we can update the documentation WE the citizens have created.

    Thanks for this post Tim!

  • Yup. Deducing the paragraph-level changes, etc, is something third-parties can do as well. The core functionality we need to supply first is just to notice the changes (the updates), save the old and new versions, and give every version a stable label. Obviously, more can be done after that, but supplying the base infrastructure removes the bottleneck — once you have that, then anyone can start layering stuff on top of it, just like a maps mashup.

    By the way, “CVS” is just a particular version control system. It’s probably better to use the generic terms “version control” or “versioning”. Think of it as saying “Ford” to mean “car” or something.

    (Also, this system would almost certainly not use CVS anyway, for various technical reasons.)
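
    The core functionality described above (notice the updates, keep old and new versions, give every version a stable label) can be sketched without any particular version control tool. In this illustration a crawl is modeled as a plain dict of URL to page bytes, and the label format is an assumption made up for the example:

```python
import hashlib


def stable_label(url: str, body: bytes) -> str:
    """Label a version by URL plus a content hash, so a label always
    refers to exactly the same bytes (hypothetical format)."""
    return f"{url}#sha256={hashlib.sha256(body).hexdigest()[:16]}"


def detect_updates(previous: dict[str, bytes],
                   current: dict[str, bytes]) -> list[dict]:
    """Compare two crawls (URL -> page bytes) and report each changed page."""
    updates = []
    for url in sorted(previous.keys() | current.keys()):
        old, new = previous.get(url), current.get(url)
        if old == new:
            continue  # unchanged page: nothing to record
        updates.append({
            "url": url,
            # None marks a page that did not exist in that crawl.
            "old": stable_label(url, old) if old is not None else None,
            "new": stable_label(url, new) if new is not None else None,
        })
    return updates
```

    Anything layered on top, such as paragraph-level diffs or alerts, only needs the list of updates and the two labelled versions.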

  • Hadley, you wrote:

    I think a basic technical framework for the problem would take about 15 minutes to produce:

    • Spider the site with wget
    • Save the changes into a version control system (e.g. git)
    • Publish (e.g. with github)

    I agree, mostly. That’s why I was suggesting to Tim that (a) it’d be inefficient for me to whip something like that up; (b) once the system is designed it is probably easiest for whatever IT staff will babysit it to assemble it.

    And, certainly, if you have some spare cycles and hosting yourself, and like the political cause, set up what you suggest and “let’s see” what the result is. Best case is a collective realization that that’s all we really need. Worst case, we learn something.

    But, I think there’s more work to achieving “success” here than just that. There are some elements I see missing from that first cut at a plan:

    1. I’m a bit of a fetishist (for principled reasons) about wanting to make sure that the data set gathered has a canonical format which is independent of any one revision control tool (such as git) and that this format be of “archival quality”. By “archival quality” I mean that the data set has to be verifiable and well organized with a library-class naming system for the digital resources. I’m imagining, for example, a university library that wants to use tools like we’re talking about in order to add to its locally kept archives of digital resources: the naming system for the data set should be suitable for use in a catalog and there should be extra “bibliographic” meta-data.
    2. A low-level, RESTful API is desirable and that API should closely reflect the inherent structure of the data.
    3. Most importantly, and especially if we imagine applying the tool to more than one site / set of data, we really need those “constituency building” documents. Those can help stakeholders from all angles share a common grokking of what the tool is and what is its intended use.
    4. There is the pesky problem of computing and displaying meaningful diffs for fairly unconstrained HTML. We can’t do this perfectly and there will (one hopes) be multiple tools for this but this area needs some thought up-front both to preserve options and pick some good first targets.
    5. In the medium term I think we want to promote the idea that sites (and content developers) can “tie in” to this kind of archival system in an active rather than a passive role by adding meta-data to their pages. Think “robots.txt” and “favicon” on steroids: can it be made easy for sites to designate a set of page updates as an “atomic” update that affects multiple pages at once? Can a publishing site be the first to compute a definitive checksum? Can a publishing site include “commit comments” and sign off on changes, etc? Can it do all of those things within and with little disturbance to existing content management approaches? I think this is the biggest area of promise for the idea: that it initiates a dialectic between data archivers / miners and data publishers and that dialectic can cause both sides to tunnel from opposite sides of the mountain and complete, in essence, a global distributed, decentralized, transactional file system (and won’t that be fun :-).
    6. I don’t imagine a sponsoring agency taking on all work on the system now and in perpetuity. It seems intrinsic to the idea that if the data is really as interesting as we think that third parties should be encouraged to build open source tools for mining and visualizing it. This is a second reason to have the constituency-building documents and is an aspect of the project that affects all other aspects (like the naming system design, the API, etc.).

    It’s all of that other “squishy” stuff that I think makes it harder than just setting up a git browser or subversion site and running wget in a cron script.
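
    To make point 4 concrete, one possible first target is to diff only the text a reader would see, so that markup-only edits do not show up as changes. The sketch below uses Python’s standard `html.parser` and `difflib`; the class and function names are invented for the example, and it is deliberately crude (each text run becomes its own line).

```python
import difflib
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect the text a reader would see, skipping script and style content."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if text and not self._skip_depth:
            self.lines.append(text)


def visible_lines(html: str) -> list[str]:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.lines


def text_diff(old_html: str, new_html: str) -> list[str]:
    """Diff two pages on visible text only; markup-only edits yield no output."""
    return list(difflib.unified_diff(
        visible_lines(old_html), visible_lines(new_html), lineterm="", n=0))
```

    Because the raw snapshots are kept regardless, other tools remain free to compute different kinds of diffs over the same data.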


  • Karl,

    Would you like to try collaborating on a more formal proposal?


  • bowerbird

    i like the 15-minute approach very much as a starting point…
    and i hope _simplicity_ will be a high-priority goal throughout.

    there’s something else that needs to be considered, though,
    right at the outset, and not just when it becomes apparent…

    in many cases where revision control is used, there’s unity on
    the need for everyone to understand exactly what’s going on.

    when legislation is being made, however, there are factions
    who dearly want to hide and disguise all their machinations.
    (recall the famous saying comparing it to sausage-making.)

    these factions will try to game your system, even thwart it…
    they will try to overwhelm, and to complicate and obfuscate,
    and persist in the attempts because they have much to gain.

    so you must actively incorporate plans to foil their efforts,
    with mechanisms that will continue guarding against them.


  • I have been talking with so many NPOs, governmental entities, and government employees on this very topic for the past 4 yrs because I consider it the CORE building block of doing the TransparentFederalBudget.com

    Many of them want this but need a way to sell it to their bosses.

    If we create something that is more of a standard that they can understand and “sell,” we can get them on board. We can do a petition and create a movement. That is what gets things done politically – we need to use it more often!

    Tim, if people like you and Tom Lord and Karl Fogel and everyone else (you know who you are because I have talked to you about this before!) created a document, you would give me something powerful to sell to them. I will be in SF Dec 3-15; let’s meet and talk? A dinner with all of us there?

    Instead of just being geeks – arguing over the best tech. (I say this because I am the worst perpetrator.) Let’s give them something simple they can do…

    Cause I can sell it. I know top people that are hungry for it. They just need a clear path and clear support.

    thank you!

  • Karl, Tom, etc –

    Please let me know if there is anything I can do to help out – I think this is an excellent idea and have both been involved with government transparency advocacy in the past and I currently work at GitHub.


  • This is pretty close to archive.org’s core offering. Might they be persuaded to put sites of peculiar public interest under closer watch, so they’re less likely to miss an update?

  • @trek-geeks,

    This happening of us all (virtually) standing around and realizing that history is rolling past and then evaporating on change.gov when we could be recording it reminds me of Spock standing in front of the Guardian of Forever as it fast forwards through Earth history, inactive tricorder in his hands, saying “I am a fool. My tricorder is capable of recording events at this speed…”


  • Why not use Versionista?

  • Jesse,

    People should stay away from versionista.

    Look no further than their end-user license agreement which contains a non-compete clause. That is to say: Anyone who has had a versionista within the past year must not work on the project we’ve been discussing.

    Aside from that problem, it’s a proprietary program, it hoards the data it collects, it contains no good mechanism for fighting repudiation, there’s no indication it has a good API or is otherwise extensible to others, it is said to be rather expensive, and none of us are free to fix bugs in it or add features.

    Other than that, I don’t see any problems with it. :-)


  • Well, actually I think Tom’s articulated the idea pretty succinctly — enough for any technically-oriented interested party to fill in the rest and Make It Happen. Whether that’s Sunlight or someone else remains to be seen.

    I’d love to take a few days and whip something up right now, but unfortunately am booked solid for the next couple of days. I’ll see if I can at least help with a doc after that… But as we all know, running code would speak louder than any document could.

    Anyone here got time to just write a prototype crawler? I can offer server space, if you need that (though probably that’s the easiest part of the problem to solve).

  • Karl,

    But as we all know, running code would speak louder than any document could.

    Just remember that I called your shot in the earlier post about this: the outcome most likely is one off the shelves of the alpha-influencers.

    Design matters, if you ask me. It’s the inter-op standards we succeed at or fail at promoting that are the potential lasting contribution here.

    The “code” that ought to be doing the “walking” here would comprise something like 3 different, separately hosted, separately developed, economically separate, inter-operable services that are unified around some proposed standards. That is, the “walking code” here should be proving the success of the documents — giving people a reason to read them.

    Tim was waxing on Twitter and mumbled something about the unix “do one thing well” philosophy of tools. It’s really “do one thing well” and “have a simple means of composing tools freely”.

    Looking back at unix history, one of the disciplines the early guys had to apply that philosophy was documentation. The best way to make sure your tool “does one thing well” is to first make sure you can clearly articulate, in detail, what that “one thing” is, and why.


  • I have a question for all the folks suggesting we just wget the documents regularly and put them into our own external version control systems: if this is so great, why do you use version control yourselves rather than just letting someone else do this to/for you?

    Clearly there are benefits for those using the version control. Tools change behavior.

    An external system might help us track changes. But it wouldn’t create a culture of transparency. More likely would just encourage people who hadn’t bought in to try to block such efforts.

    Still, as a demonstration project, with the goal of getting the actual developers of policy documents/regulations/legislation to adopt it for themselves, I’d support it. But don’t mistake tracking someone else’s behavior for helping them to change it.

  • So I saw that you needed someone to step up to this and I have taken my first pass at it. I realize you probably want a few more features than I have now but in the interest of getting something up:


    Implements snapshots, very basic diffs (through git) and md5 checksumming.

    Also I would love to get access to some hosting/financing for this but for now I am willing to shoulder it.

  • Oren

    Obviously the goal would eventually be to display the diffs in a meaningful way so that normal users could easily see the changes and comment on them.

    I find this idea promising and would like to help make it a reality.

  • A thought: versionista.com at least sets a baseline, minimum standard for useability and user-friendliness here. Those are the real challenges, more so than the mere recording of the data.

    Imagine a system that a Senator of the opposing party would actually *want* to use to track change.gov. That’s the intended audience, in a way.

  • @Tim, you asked:

    I have a question for all the folks suggesting we just wget the documents regularly and put them into our own external version control systems: if this is so great, why do you use version control yourselves rather than just letting someone else do this to/for you?

    With a spider/archiver/change-notifier system the people at (for example) change.gov would not initially enjoy all of the benefits of a conventional revision control system.

    Examples: they would not be able to explicitly “commit” a group of changes as one big change. They would not have tools for branching and merging. They would not have a reliable way to ensure that a specific version of their tree got archived — it would depend on the timing of the spider.

    People who use conventional revctl systems generally want and use those extra features. The features fit in to a specific workflow.

    With the spider/archiver/change-notifier system we can get many of the benefits of revctl without changing the workflow at places like change.gov. We can get change notifications, snapshots of history, and visualizations of changes made between versions. That’s “initially”.

    The significance of saying “initially” is that the spider/archiver/notifier system can grow into a full revision control system. To take full advantage of the additional features people at places like change.gov would have to change their workflow (e.g., to explicitly make “commits”). They would have the opportunity to use branching and merging, etc., if they change their workflow. If they did those things, they would themselves be users of a “conventional revision control system”.

    The spider/archiver/notifier approach differs (in the long run) from contemporary revision control systems in at least two big ways: The s./a./n. system does not require workflow changes to be useful. The s./a./n. system (done right) manifests as a pure, web API managing nothing but content from the web. In contrast: Contemporary revctl systems require workflow changes if they are to be used at all. Contemporary systems are concerned with managing content on file systems, not content from the web.

    Won’t it be nice, though, when the dichotomy disappears? Programming, for example: think of a wiki-like source code editor with some extra features so that people can do branching, merging, complex commits, etc. And picture a day when GCC is able to directly read source files from such a wiki (via an HTTP API) and write back (the same way) its output, whence perhaps that output is dynamically loaded and run by an “Apache appliance” or some such. At that point, local operating systems (GNU/Linux, Windows, etc) will be buried — the “web operating system” will be a complete, self-hosting environment. The web will be its own development platform, just as GNU/Linux is the development platform for GNU/Linux.

    The spider/archiver/notifier system (done right) is an interesting step in that direction. It’s a cornerstone of the “file system” for a web operating system.

    An external system might help us track changes. But it wouldn’t create a culture of transparency. More likely it would just encourage people who hadn’t bought in to try to block such efforts.

    Think of George Orwell’s 1984 — Winston Smith at his day job towards the beginning of the book, busy “correcting” history.

    s./a./n. is not optional.

    Still, as a demonstration project, with the goal of getting the actual developers of policy documents/regulations/legislation to adopt it for themselves, I’d support it. But don’t mistake tracking someone else’s behavior for helping them to change it.

    “Nothing in the world
    is as soft and yielding as water.
    Yet for dissolving the hard and inflexible,
    nothing can surpass it.

    “The soft overcomes the hard;
    the gentle overcomes the rigid.
    Everyone knows this is true,
    but few can put it into practice.”


  • @Karl, you wrote:

    A thought: versionista.com at least sets a baseline, minimum standard for useability and user-friendliness here. Those are the real challenges, more so than the mere recording of the data.

    That is all the more reason to nail “the mere recording of the data” properly and give it a proper web API, for then three benefits befall us:

    1) More than one project can work on a user interface at once, independently of one another, without any need for them to duplicate work on the low level code and semantic model.

    2) Users can combine separately developed interfaces over a single data set.

    3) The data set to which an interface is wanted will be defined with crystal clarity rather than being the ad hoc whatever a UI implementor decides to implement themselves.

    The UI problem is so much the “real challenge” that, at this stage, the most useful thing we can do towards that challenge is to mostly decline to take it on directly and to work instead on simplifying the problem.


  • jeff

    I would like to see us treat legislation more as project requirements. As I am sure everyone reading this site knows very well, we need to understand what a given project is attempting to accomplish, any high level needs, how the resulting requirements map to the needs, and supporting information.

    A rough list of what I would like to see in a structure around legislation:

    * The stated goal (“spirit”) of the law.

    * Use Cases – Demonstrate how the law should impact certain people/situations, including negative use cases. Many laws seem to be written without thinking about the law of unintended consequences (now or in the future) and without an understanding of the holistic impact of the bill (including interactions with other legislation).

    * Version Control – As others have mentioned, we need to know who changed it and why. I would like the explanation to detail how the change works toward the spirit of the law. Hopefully it would be easy to flag the useless additions and have them removed.

    * Also, as mentioned earlier in the post, discussion that can be captured around each item in the law.

    * Public interaction. Allow the public to read, add use cases and participate in the discussions. We would need some level of control, as Tim mentioned, but it would need to remain as open as possible and still remain productive.

  • johan


  • Well put. The public has the right to know when and why changes occur in the president elect’s policy. Only a programmer’s mind could produce such a simple solution.

  • Benjamin Abbitt

    This is a spectacular idea, glad to see version control concepts being brought out of the realm of programming and into other fields.

    An external solution is a solid idea (wayback machine-esque), but the real trick would be to make a solution transparent enough that legislators (or their staff) use it, claim ownership of their changes/work, and allow the citizens to view and review what their elected officials are doing.

    Great idea, and wonderful job bringing that idea to the forefront of (at least some of our) minds.

  • David Lang

    Hmm, I hope the author/some readers have shared this via the change.gov idea submission system…


  • @David –

    I did make this suggestion on change.gov, and posted a link to this blog post. Will see if it turns up. Thanks for the reminder. It’s good for all of us to remember not just to have these discussions in our own spaces, but to take them to “official channels” to make sure that they have at least the opportunity to be heard.

  • Dave

    Excellent writeup. Many of the revision control systems can easily bring in public opinion too, (as in, imagine controlled legislation, with a commit/main branch for the revisions/changes that actually make their way into government?). Then imagine adding more branches for public input on legislation?

    I’m a programmer, and I think that your idea is a wondrous idea.

    Having such a thing on government pages would really make things better (i.e. think about what that would do to the whitehouse’s website!)

  • ___ Legislation should be functionally open source

    I am wondering about the workflow of a legislator in Congress. We first need to understand how law is created from a document viewpoint.

    How do legislators *actually* collaborate on a particular bill (electronically and on paper)? Is there a centralized repository which keeps a document’s revision history (like Subversion)? Or is there a distributed system (like Bazaar) to which sub-groups can submit their “patches” collectively (now, where are these repositories in practice)?

    It would be wonderful if a branch of a bill could be checked out and modified by a citizen, then re-introduced into the tree by a legislator’s endorsement. This relates to the question of who has authority to commit certain versions.

    Since the law is public, all citizens should in theory have at least read-level access to the main trunk of development. Public participation would give legislators critical feedback on their work.

    ___ Case history :: tag it under “arrow”

    How did the two-page letter from Henry Paulson transform into an American commitment for a $700 billion bailout? The process went on a fast track, and I do not think the public had full access to the drafts and interim proposals by the legislators involved (including the Presidential candidates).

    The United States was in a national panic, and somehow somebody managed to sneak in an exemption from a 39-cent excise tax for children’s wooden practice arrows. How did this happen, and who was responsible? That sharp detail probably worked out fine for the arrow manufacturer, but for the bigger picture, it would be instructive to see the trail of ambiguous language which has allowed Paulson to spend the funds contrary to his public declarations leading up to Congressional ratification.

    One could easily track this if an open version control system was mandated for the legislation of public law. It could bring efficiency into a legislator’s workflow. In any case, we ought to have (non-proprietary) transparency in government and a right to (speedy) discovery.

  • All of the above are excellent suggestions. But I question the validity of future transparency when the current status of change.gov is anything but transparent.

    I’ve written about this several times; the most recent iteration is at the HTML Times: http://htmltimes.com/cluen-privacy-obama-transition.php

  • I think the people who are saying that the spidering approach is only a first (good) step are correct. While it can be very successful, it’s not enough. As Tim said, you have to create cultural change within.

    The simple reason is that the spider could only pull documents that are accessible. The change.gov people have to make them accessible. If they’re not on board with having the revision history of their docs (or a particular doc) made public by the spidering retrieval/storage system, they’re quite likely to do more editing offline, put docs up less frequently (if at all), strip author information from them, etc. Transparency remains optional.

    To really work, the change has to happen on the inside. A spidering solution could help to demonstrate why this is something the change.gov people might want (especially if they lose their files in a hard disk crash… not that they’d likely then make that public).

    A solution that put raw docs into an Amazon S3 bucket with a simple naming system (based on retrieval date+URL) would be easy to whip up.
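    One possible shape for that naming system, as a small Python sketch. The function name `snapshot_key` and the exact key layout are assumptions for illustration; the actual upload to a bucket is left out.

```python
from datetime import datetime, timezone
from urllib.parse import quote


def snapshot_key(url, retrieved_at):
    """Build a storage key from the retrieval time (UTC) and the escaped URL."""
    stamp = retrieved_at.astimezone(timezone.utc).strftime("%Y/%m/%d/%H%M%SZ")
    # safe='' escapes '/' and ':' too, so the URL stays a single path segment.
    return f"{stamp}/{quote(url, safe='')}"
```

    Putting the date first means listing by prefix gives everything retrieved on a given day; putting the URL first instead would make it easier to list the full history of one page.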


  • I’m glad this conversation is taking place, and that so many of you have the skills to offer concrete solutions that may lead to the best solution at some point (if there’s a will on the part of Obama and the change.gov team). I can’t contribute technically, but I can express my gratitude to you all. Thanks!

  • Can’t contribute technically, but I can express my gratitude to you all. This is an important discussion.

    By the way, has anyone gotten any sense from Obama’s Change.gov team on whether they’re interested in “revision control?”

  • Adriano,

    Procedurally, the initial writing of bills is neither a constitutional matter nor a matter of the rules of either house. Bills can come from all kinds of different work-flows. A lobbyist group might do the bulk of the work, or the staffs of various legislators, etc. Sometimes you hear about a legislator who “personally penned” some piece of legislation and usually the report includes mention of how unusual that is. Sometimes you’ll see scanned images of a draft bill with hand-written mark-ups by a legislator or some other stake-holder.

    Once introduced in the usual way and assigned to a committee, the introduction of changes is more formalized: each change has a committee vote associated with it, for example. At this stage, if it happens, you have in the Congressional Record a series of documents that record the details of each change (but still don’t tell you who physically wrote each change and how they came to do so; that’s still “off stage”).

    My impression, admittedly subjective but evidence-based at least, is that for a normal bill you should think of chaos that resembles the drafting of a complex, multi-stake-holder document within the executive of a large firm. Picture some office (staffers for one or more sponsors) with a draft on their hard drive. As the draft evolves it gets shown around and the technicians start vote counting and horse trading. Email, phone calls, hand-offs of printed out text, etc. all mixed with conversations like “we can get the votes of X and Y if we add/delete/revise such and such a provision”. The ghost-writers of the actual bill take as input a big jumble of such inputs and produce as output a kind of soup. It’s all squishy and human.

    Exceptions happen. In an emergency (so deemed by the legislative bodies) the committee process can be bypassed even to the point where votes are called for before most members have a chance to read and analyze the text (e.g., so-called “martial law” in the House). Or, for a really historic piece of legislation, perhaps a legislator will pen a first draft, get some help from his friends who are constitutional scholars, and there you go. Heck, in theory, you and I could draft a bill and persuade our legislators to sponsor it.

    Point is: (a) it’s all talk until it’s introduced and given a number; (b) it’s often fast-paced, horse-trading, chaotic “soup making” before that. And it’s quite opaque, coming out at most, most often, in somebody’s memoirs years later.

    Personally, I suspect that’s good for democracy. The chaotic privacy of how the draft is formed preserves a lot of freedom of expression and association among legislators and also spares courts from having to examine histories of soap operas when contemplating legislative intent. Basically, if you tried to shine too much sunlight on the drafting process you would paralyze it and as well you’d make jurisprudence a lot more difficult and less predictable.

    For those reasons, change-tracking and history archiving should focus on “official actions” or “official statements” or “official documents”. If change.gov puts an “agenda” page up for 24 hours (as opposed to 5min, followed by an “oops”) then, that’s pretty official. And if two weeks later the policy proposal expressed in the agenda item is completely reversed — well, that’s very interesting and it matters and it is public record. So, by all means, track that.

    But conversely, if your aim to apply version control implies, for example, an explicit, public-record delineation of who and who does not have “commit rights” to a draft piece of legislation then I think you will only complicate and make worse the kind of wheeling dealing, back-room, disguising-of-motives stuff that is intrinsic to the process.

    The United States was in a national panic, and somehow somebody managed to sneak in an exemption from a 39-cent excise tax for children’s wooden practice arrows. How did this happen, and who was responsible? [….]

    One could easily track this if an open version control system was mandated for the legislation of public law. It could bring efficiency into a legislator’s workflow. In any case, we ought to have (non-proprietary) transparency in government and a right to (speedy) discovery.

    On the contrary, it would kill efficiency in a legislator’s workflow most likely as follows:

    The revision control records you propose would be public records. They would create new, broad-sweeping liabilities for the individuals involved and for the federal government.

    The bureaucrats would respond — quite rightly, in my view — by evading the process entirely. That is, they would continue to draft bills just as they do now but taking considerable pains to do it all “off-line” and “off-record”. They would then take the extra effort to “make up a just-so story” about how to introduce each change into the public record system.

    The net result would be law-drafting about the same as you have it now, but with an expensive-to-maintain, public-record lie built on top of it.

    We have, built into the constitution and to the rules of the House and Senate a very long standing, perfectly useful mechanism for that form of lying: The act of officially introducing a bill attaches responsibility to its sponsors. The act of modifying a bill in committee attaches responsibility to the committee (according to the votes). In both cases, there are minutes from the floor or from the committee meeting that are the official record of intent. Period. It’s hard to improve on and it strikes me as naive to think that big improvements can be made by strong-arming staffers into adopting Git, or Subversion, or GNU Arch for that matter. As I said, even if we could force that, it would only paralyze them and cause a deeper obfuscation.

    I really like, Adriano, that you asked that question. Our alpha-influencers here seem to think that if they have some vague, abstract idea about how the concepts of version control apply to a more perfected government that they ought to push for change to the behavior of our government such as getting them to adopt certain open source software products and alleged open source process practices. These influencers disregarded their prospective users by not first asking the questions you did (i.e.: what is the existing workflow? and why?). They’re selling something. (So am I, but I’m selling something that’s less trouble-making!)


  • Great ideas everyone! I work on a federal government. You all are light years ahead of us.

    Trust me.

    I think that some agencies, and likely the Obama administration, would see the value in version tracking. I agree that legislation drafting is a great use-case for such a tool.

    Are there any corporate or nonprofit Websites that use version tracking well right now? Wikipedia’s history pages are not simple enough to understand. It needs to be REALLY easy to use and understand for us government managers to get it.

    Several agencies are starting to fuss around with using wikis internally–frankly, that’s a radical transformation even on our intranet. There’s hope that Obama may accelerate this philosophy so that public-facing version control may come to fruition.

    There are many hurdles though.

    One course of action is for people like you to do this for Change.gov or Senate.gov or any Website to show how it would work. Once this kind of tool makes headlines for revealing some unscrupulous edit, we’ll learn about it.

    Beware that this approach may backfire and cause more bureaucracy for Web managers like me to make simple edits.

    Tim’s initial idea of having an ‘article’ page managed by admins and a ‘discussion’ page for the public, is fantastic and encourages the kind of transparency and democracy that we should aim to achieve in the next four years.

    But please do not count on the feds to be the bleeding technological edge. If it’s possible to show version control in a simple form, why aren’t NGOs like Sunlight Foundation using it?

    Don’t expect feds to create these tools. We would waste way too much money and create a tool that no one would like or trust. Show us how it would work. Make a name for yourself in the process.

    Matt Coyler’s example above may work functionally, but I can’t understand how to interpret it. It needs to be really simple to understand and use. Give us a solid simple example that we can show to clueless managers who control purse-strings. Design it for ‘flashing 12s’ if you want managers to get it.

    If one federal agency started using a tool like this, it could spread more easily to Senate.gov and elsewhere in government.

    It’s a great idea. Now (11/08) is a great time to make an impact and have your voice heard.


  • Envirogovy,

    I’m against a “discussion” page.

    It would be necessary for officials to censor such a page to remove inappropriate comments. That’s too big a can of worms. A firewall is desirable between government and public comment forums other than in the case of formal public hearings.


  • I worked as a legislative aide to US Rep Sam Coppersmith in the early 1990s. We were one of the first offices on the Hill to provide staff with internet access, post docs on Gopher and prided ourselves on our use of technology to serve Arizona’s first congressional district.

    I think the challenge to overcome is that you’re really asking for folks to change the culture, process and tools they use every day and are comfortable with–all in the name of transparency, which is something of an abstraction.

    One could argue that much of Congress’s business already has systems to make information public. Everything discussed on the floor of the House and Senate is posted in the Congressional Record. Committee reports are posted on thomas.loc.gov. Obviously, these processes don’t capture offline discussions or how compromises get made in legislation.

    I also agree that they’re inadequate and that today’s tools allow for unprecedented collaboration and ultimately better government, “for the people, by the people.”

    I wonder if there’s a way to pick a lawmaker for a prototype project who would be willing to champion a “hyper-transparent” approach to legislating. The idea would be to make a law, or policy with version control–I’m not following Congress closely to know who would be a good candidate for a prototype project, but I’m thinking a more junior member of Congress (though perhaps not a first-term member–they’ll be busy trying to figure things out.) I also think it might be easier to start an effort on the Senate side–the 6 year election cycles invite more tinkering–though I wouldn’t exclude House members–here are links to the relevant House and Senate Committees:



    Creating a culture of use around collaborative tools would be really compelling. I wonder if any of the offices are using Google Sites/Google Docs instead of MS Office as tools–you’d have a good start there. To most folks, my guess is that you’re going to get blank stares when you talk about “version control.” Pitching it as a smarter, easier way to work might help.

    Eager to continue the conversation.

  • Henry

    Transparency indeed! A CM tool for the budget is bigger than my little brain.

    However, in the interest of an interim solution at change.gov (my train of thought could be off, so some review would be nice) …

    I checked on change.gov’s site. Some quick research shows it’s running on an Apache server hosted by bluestatedigital.com. Blue State Digital, in their hosting packages, has the CMS Expression Engine. Looking at some of the HTML produced on the site suggests that change.gov may be using that software. I’m unfamiliar with that CMS package, but apparently it has the capability for version controlling by turning on a feature called Entry Revisioning. So, anyways, I dropped them a submit asking them to turn that option ON as a first step. It appears there’s some basic modules in the CMS for doing RSS and since it’s PHP based, perhaps some modules already exist for publishing approved versions out … not sure on that part.

  • Sunlight Foundation folks and friends have been discussing version control in the context of the Open House and Open Senate projects online:

    and Josh Tauberer of GovTrack put up a preliminary cut at change tracking for the September/October 2008 bailout bill at


  • bowerbird

    good work, henry!


  • @Henry, you wrote:

    the CMS Expression Engine. Looking at some of the HTML produced on the site suggests that change.gov may be using that software. I’m unfamiliar with that CMS package, but apparently it has the capability for version controlling by turning on a feature called Entry Revisioning. So, anyways, I dropped them a submit asking them to turn that option ON as a first step.

    Apparently you mean this feature but, no, that isn’t what we’re talking about.


  • This is a really interesting suggestion and it’s definitely the kind of thinking and action Sunlight is trying to support. We’re in the middle of considering a lot of different options, in light of the opportunities presented by the new administration’s stated commitments to improve govt transparency etc.

    We love that so many people are engaged by trying to think this through. But we’re not sure this is the most effective use of our resources — but if someone wants to pitch us either for a mini-grant or a big grant, the door is always open.

    Clay Johnson
    Director of Sunlight Labs
    Sunlight Foundation

  • James

    many of our legislators are lawyers and lawyers have been using change-tracking tools and version techniques for years… this is not only worthwhile, but doable.

  • Tim and others have been talking about this for quite some time. But as Adriano and Thomas noted, one needs to look at the existing legislative system. The fact is that Congress already does version control. Legislators can only revise bills using amendments, or at the committee level by approval of the committee. Amendments are generally quite clear in what they do (like patches), and are tied to particular legislators. There’s nothing stopping anyone from diff’ing two versions of a bill, as I’ve implemented on GovTrack (and this will get much cooler in a few weeks).
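    For readers who have not diff’ed two texts before, here is a rough Python sketch of the idea using only the standard library. This is not how GovTrack does it; `describe_changes` is a made-up name, and it compares line by line, which is cruder than what a real bill tracker would want.

```python
import difflib


def describe_changes(old_text, new_text):
    """Return (operation, old_lines, new_lines) for every changed region."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only inserts, deletes and replacements
            changes.append((tag, old_lines[i1:i2], new_lines[j1:j2]))
    return changes
```

    Each tuple reads a bit like an amendment: what was struck, and what was put in its place.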

    The bottom line is this: Congress already has the tools at its disposal. They have the procedures in place. The problem (as some see it) is that they are doing too much off the radar.

    To compare it to the world of FOSS: it’s like your favorite project using Subversion, but the authors are making large commits and are proxying the patches sent in by corporate interests. Telling this project to use Bitkeeper isn’t going to solve the problem! The problem is that you want more to be on the record, more to be tracked. This is a social problem, not a technical problem.

    Not that Congress is on the leading edge of technology or openness. The procedures in place may exist, but Congress doesn’t do a great job of actually providing the amendments to the public to view. For an analogy again: it’s like an OSS project that uses Subversion but doesn’t give the public access to the logs, or doesn’t provide an RSS feed. The Sunlight Foundation’s Open House/Senate Projects (disclaimer: which I’m heavily involved in) make some recommendations in this area. As it relates to this, we want Congress to first make use of technology to supplement their existing parliamentary system, before anyone goes overboard and asks them to overhaul their centuries-old rules.

  • Ray

    I think it is a great idea that needs to be further examined…Joshua, isn’t Congress supposed to be a proxy of the people, not the lobbyists? How then should we define when the legislative process should begin? Don’t the lobbyists usually compose the drafts? How do we take this from idea to something closer to a political movement…If Obama wants to be about change then this would certainly be a step in the right direction…I can’t wait until the day when we can just “digg it” to vote/craft legislation and then rule of law would really be in the people’s hands…Tim, it’s time to start a revolution…

  • Ray: I’m as opposed to lobbyists as the next guy (to be imprecise about the problem). All I’m saying is that Congress already goes through the motions of revision control. Forcing on them a particular technological implementation doesn’t mean they’re going to use it any more than they use their existing low-tech, parliamentary system. The problem is elsewhere.

  • @Clay,

    Nice to know. I think I’ll take you up on that shortly.


  • Envirogovy

    Joshua Tauberer- your Economic Stimulus Bill Text Tracker comparisons are really good: http://www.govtrack.us/special/econstimbill/changes.xpd?id=1. Nice work.

    Ray, I like the Digg idea too. I haven’t seen any government sites using thumbs up/down yet. There may be some simple ways to test it out as a test case on .

    One possible issue with it is that government Web sites can’t use persistent cookies. This may make it hard to stop those wishing to game the system by multivoting.

    Is there a tool/extension that we could add to try it out on EPA’s blog (WordPress)?

    Ideas? Contact me on Twitter @kolpeterson

  • How timely! I have been working on a paper proposing a Congressional Versioning System as part of Hal Abelson’s MIT class “Law and Ethics on the Electronic Frontier.”

    I think the central problem is not so much that we don’t have revision, but that there is no source control to clearly delineate authorship. I’d like to see each line in each bill have an author tag in the XML files that are published on thomas.loc.gov. According to a conversation I had with a senior lawyer at the House Office of Legislative Counsel, the current collaborative writing system is an ad-hoc combination of emailing PDF files back and forth and occasionally using a shared drive. Transitioning from this to an authenticated source control system shouldn’t be technically difficult (they already use a customized XML editor); the main obstacles are political and legal.
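    As a rough illustration of what a per-line author tag would make possible, here is a short Python sketch. The `<line>` element and its `author` attribute are invented for this example and are not the schema of the XML actually published; `lines_by_author` is likewise a hypothetical name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def lines_by_author(bill_xml):
    """Group the text of each <line> element by its 'author' attribute."""
    root = ET.fromstring(bill_xml)
    grouped = defaultdict(list)
    for line in root.iter("line"):
        # Lines without the (hypothetical) attribute are filed under 'unknown'.
        grouped[line.get("author", "unknown")].append((line.text or "").strip())
    return dict(grouped)
```

    With tags like these in place, answering “who wrote this provision?” becomes a lookup rather than an investigation.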

    The Offices of Legislative Counsel are the ones who actually write much of the legislation, and their opinions to legislators are protected by attorney-client privilege. They can’t even share with other congressmen working on similar things. And as Ted Bongiovanni said, it’s going to be hard to find a champion of abstract transparency, particularly when it threatens the horse-trading where the work really gets done. I think that’s the interesting part, but I can also see why it’s likely to remain hidden, or just shift somewhere else away from scrutiny.

    The arrow excise tax exemption was originally part of previous efforts to extend the Alternative Minimum Tax. It was also separately introduced in the Senate by the delegation from Oregon as S3055. The point is, though, that this sort of careful tracking should be done automatically to reveal the author and intended beneficiaries of a particular earmark.

    I’ll post the paper on my website when it’s done, for all who are interested. It’s due on the 10th, and I’ll be citing this conversation.

  • James

    You may be aware of some of the following, if so, please excuse my wasting your time…

    The version control software for lawyers has more functionality than simply that…

    what I have found for version control is the following.

    Mystic Management Systems
    Amicus Attorney
    Surround SCM (for software, adaptable?)
    master control
    cama software
    Version Control Pro 4.7 by upRedSun (freebie, robust?)

    there are also automated version control tools in the software development world (multiple developers in multiple locations) as well; are they adaptable?

    I hope this helps, thanx for your effort on a difficult problem.

  • Dean

    Hats off to the guy who archived it, but even that archive is out of date. Specifically, I saw the “service” section was much shorter and included words about mandatory community service for middle schoolers of 50 hours a year, and 100 hours a year for college students. After a day, that changed, and “mandatory” was dropped, to be replaced by “with the goal of” for the middle schoolers, and an added $4000 for the college students. I saw that changed from one day to the next. I don’t know what it says currently.

    I’m glad to see that first thing changed (I mean, “mandatory”, for middle school kids?), and I like the $4000 for the students (as long as colleges don’t view this as an instant $4000 tuition hike, which some will)…

    Having said that, I’m a little disturbed that someone thought putting middle school kids to mandatory community service was a good idea.

    I agree with the revision idea of Tim’s but that’ll bring some accountability to the equation, and the politicos won’t stand for that; they’re too busy doing the CYA dance (witness the committees who wouldn’t listen about Freddie and Fannie more than a year ago, and getting off scot-free from the responsibility now).

  • Brian Cartwright

    The old way of governing is to come up with policies behind closed doors then tailor them to be fed through the narrow channel of the media for the proper spin. Now if the doors are opened and the interested public can get all the changes in policy as they happen, obviously that’s smarter and better for us out here, but I suggest we’re only looking at half of the picture.

    From the perspective of the new President trying to do things a new way, how is the broad channel useful? It’s not only an outflow pipe informing the world of how decisions are being made, it is hopefully also a tool to search and find truly new solutions for complex and interrelated problems.

    A possible pitfall is that resistance will come as paranoia that the government is being run by a blogocracy, and if change.gov were handled wrong I think such paranoia would be justified. So what is the mechanism to handle this electronic suggestion box?

  • Well…

    http://leagueoftechnicalvoters.org wants to give you your xmas present early Tim…

    Check out http://change.wikia.com We are scraping change.gov daily (news twice daily) and putting it in a version control system on wikia.com using mediawiki

    http://change.wikia.com/index.php?title=Learn_obama_biden_transition_agency_review_teams&diff=4499&oldid=3229 – an easy way to see additions to the team.

    I also hope that you will ask the community to step up and help us document it more hence our decision to put it in mediawiki and host on wikia.com

    and with the permalinks – bloggers had a point of reference that doesn’t change!

  • Great idea! As the wiki model works its way through the world – from the original encyclopedia to entire university courses – there is no better place to implement it than in our own government.

    I especially like your idea regarding being able to visually see what members of Congress actually do; knowing that Congressman X pushed a change to Bill Y would be great. Not only would it help great congressmen get elected (check out what I did!), but it would help keep the bad ones out (Look at what he did!)!

  • Dale

    Change.gov, the website of US president-elect Barack Obama’s transition team, has undergone some important and exciting changes. Among them is the site’s new copyright notice, which expresses that the bulk of Change.gov is published under the most permissive of Creative Commons copyright licenses.
