Stop fishing and start feasting: How citable public documents will change your life

Putting government documents and data online is a great step towards making our government process more transparent to the people it serves, but in many ways simply making the material available is like serving someone dinner by giving them a pond full of fish. The pond is huge and the poor dinner guest doesn’t have any tools. Worse, they’re only looking for one particular bass, and every time someone sends them to where they last saw the fish it’s long gone.

Gov 2.0 Expo 2010
The recent healthcare bill was more than 1,000 pages long. The budget can often be half again that big. Commenting on these types of documents as they are currently implemented is extremely challenging. Pointing a finger at that big pond and telling someone that you swear you saw a fish isn’t very effective. It’s even worse when someone swears they saw a fish that isn’t really there and it is effective because no one is willing to refute them. No one has time to wade around themselves and so they take it on faith. The recent “killing grandma” scare is an excellent example.

Citations, first, are a way of pointing at the fish. A simple paragraph level of granularity for references should be enough. This promotes ease of implementation and use and provides a tight enough zoom to bring someone right to the material being discussed.

The next problem is that fish move. If you’re trying to point out a moving fish, and show it to someone later, you need to have a photograph with a timestamp. That line in the budget about forcing our children to manufacture chemical weapons might have moved to page three the next day, or a wily senator may have changed the wording and put it under a different heading. Proper citability requires an archived snapshot of the online material that maintains the integrity of any reference links.
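To make the "photograph with a timestamp" idea concrete, here is a minimal sketch in Python. It is not the League's implementation. The class name `SnapshotArchive`, its methods, and the rule that paragraphs are blocks separated by blank lines are all assumptions made for illustration. It only shows how a citation made of a paragraph number plus a time can keep resolving to the wording that was actually published at that moment.

```python
from bisect import bisect_right
from datetime import datetime, timezone


class SnapshotArchive:
    """Hypothetical archive: keeps every captured version of one document."""

    def __init__(self):
        self._times = []     # capture times, kept in sorted order
        self._versions = []  # the list of paragraphs for each capture

    def capture(self, text, when):
        # Assumption: a paragraph is any block of text separated by a blank line.
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        index = bisect_right(self._times, when)
        self._times.insert(index, when)
        self._versions.insert(index, paragraphs)

    def cite(self, paragraph_number, when):
        """Return paragraph `paragraph_number` (1-based) as it read at time `when`."""
        index = bisect_right(self._times, when) - 1
        if index < 0:
            raise LookupError("no snapshot exists at or before that time")
        return self._versions[index][paragraph_number - 1]


# Invented sample text: the same document captured on two consecutive days.
archive = SnapshotArchive()
archive.capture("Section 1.\n\nFunds go to schools.",
                datetime(2010, 1, 4, tzinfo=timezone.utc))
archive.capture("Section 1.\n\nFunds go to roads.",
                datetime(2010, 1, 5, tzinfo=timezone.utc))

# A citation made at noon on January 4 still points at the wording from that day.
print(archive.cite(2, datetime(2010, 1, 4, 12, 0, tzinfo=timezone.utc)))
```

Because every capture is kept rather than overwritten, a later edit to the document cannot break or silently change an earlier citation.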

Lastly, for someone to believe you about this fish, you need to have a way of pointing out where you saw it at the specified time. They’ll want to know it was the same pond.

Making it possible to create timestamped permalinks at a paragraph level of granularity would be a huge leap forward in increasing government transparency through its online documents. The same principles apply when producing citable government data. When recovery.org decided to display visual representations of the data coming in about recovery money around the nation, it quickly became clear that some amount of data was erroneous. When the errors were reported and the data was later modified, there wasn’t any way to go back and compare the two versions to see what changes had taken place. A blogger, reporter, statistician or scientist should be able to run a query against any specific collection of government data, as it was published, for a given version or moment in time.
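Once both versions of a dataset are kept, comparing them is straightforward. The sketch below is only an illustration: the function name `compare_versions`, the award identifiers, and the figures are all invented, and a real dataset would have many more fields per record.

```python
def compare_versions(old, new):
    """Report which records were added, removed, or changed between two publications."""
    added = {key: new[key] for key in new.keys() - old.keys()}
    removed = {key: old[key] for key in old.keys() - new.keys()}
    changed = {
        key: (old[key], new[key])
        for key in old.keys() & new.keys()
        if old[key] != new[key]
    }
    return added, removed, changed


# Invented figures: award id -> number of jobs reported.
published_monday = {"award-1": 30, "award-2": 4500, "award-3": 12}
published_friday = {"award-1": 30, "award-2": 45, "award-4": 8}

added, removed, changed = compare_versions(published_monday, published_friday)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

With the earlier version still available, a reader can see exactly which figures were corrected instead of having to trust that the current numbers are the ones originally reported.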

WHAT WE’RE DOING

The nonprofit, nonpartisan League of Technical Voters has proposed a simple, easy to build and implement citability solution. Open source software development is underway and a wide range of government institutions are already on board. If you would like to help with this effort, consider being part of our upcoming codeathon or create your own codeathon.

  • Greg

    I love this idea. You may want to check out what has been done at djangobook.com with line item comments using the Yahoo UI. It would also be awesome to have this available on the state and local levels if you are building the infrastructure/framework.

    My two cents…

    Greg

  • Silona

    Hi Greg!

    The architecture is so simple… it can be used for Federal, state, city and international. I even have some Australians working remotely.

    Seems likely there is a way to feed in the archive server data to djangobook web app…

    The basic structure is here http://www.gliffy.com/publish/2036392/

    Some code is here http://launchpad.net/citability

    and of course the wikis will be updated all weekend as we work on the various implementations!
    http://dccodeathon.pbworks.com

    cheers,
    Silona

  • chris mccraw

    What’s being proposed here is a lot of work for a lot MORE value. Silona’s examples are hard to trump in clarity and obviousness, and it’s good to hear that some government agencies are on board.

    But to make this a reality, we need code and we need it to be open source so that as @Greg said, it can trickle down to state and local levels. The codeathons are definitely right up that alley–get together with like minded geeks and build the system. Patriotism at its highest level. And, as an attendee of a prior codeathon, I call ‘em a helluva lot of fun and a great place to learn and network, too!

    …because a democracy is run by those who show up. And this non-partisan project is for the benefit of the governed, which we’re all gonna be for the rest of our lives.

  • Thomas Bjelkeman-Pettersson

    Whenever I see some of our code committed to Github, I think of legislation presented the same way. Clearly marked as to what was changed and by whom and why. It isn’t like we have to invent something really new. It is just a matter of implementing what we already know how to do in a different context.

    Keep up the good work Silona.

  • Mike Gifford

    This is a great concept. I ran into this WP tool recently and think that this type of commenting is very useful for understanding policy:

    http://digress.it

  • Joe Carmel

    About 6 months ago, I started creating a prototype/demonstration of sub-document linking for legislation at http://LegisLink.org.

    This idea differs in several respects from Citability.org goals but attempts to fill a couple of vacuums that currently exist.

    (1) Legislation tends to reside at multiple Internet domains even within one jurisdiction. Usually there’s a site for the lower chamber, another for the upper chamber, and a separate site for consolidated law, not to mention voting records, reports, committee work, etc. A legend is needed for each jurisdiction especially for the casual user.

    (2) It’s often not easy to link to a section or paragraph in a bill for most legislatures because (a) the legislature has chosen not to include sub-document anchors or (b) sub-document anchors are only exposed under the covers.

    LegisLink.org demonstrates redirection to meet these needs even when the jurisdiction hasn’t provided the capability to do so.

    For example: http://legislink.org/us/HR-1-IH-1221 provides a direct link to section 1221 of HR 1 from the current Congress. Add “-pdf” to the end of the above URL and you’re redirected to the page where section 1221 exists in the PDF file.

    LegisLink is an open source collaborative effort among software developers and jurisdictional experts. Jurisdictional experts help define useful LegisLink formats for their use and software developers can help build the redirection mechanism. By consolidating the effort, functions (such as parsing the PDF file to find section 1221) can be reused for multiple jurisdictions.

    Please join the wiki at http://legislink.wikispaces.com if you’re interested in helping out. Thanks,

    Joe

  • Torsten Houwaart

    @ Thomas Bjelkeman-Pettersson:
    YES YES YES !
    I thought the same thing a few days ago. Legislation in a Version Control System would be the best^2. Not only would it make citing easier, it would make things so much more transparent overall.
    I honestly think this must be done.
    Also, this could lead to an easier and better legislation process in the long run.

  • Silona

    Yep @Thomas

    the BACKBONE for the citability concept is distributed versioning! The current implementation is in Bazaar – just because there were easier to use tools already in place for the govt folks.

    But I have one crew that is talking of doing a GIT version at the http://dccodeathon.com

    My idea is that FIRST we add citations on the back end: basically capture what is created and put it into a Distributed Versioning system with a pretty user interface.

    Second, get them to publish via PubSubHubbub into the Distributed Versioning System.

    And third, create systems with Distributed Versioning baked in that can face the public.

    There are some privacy issues as to why the entire system can’t be front facing, though, and some serious fear issues. Some of those fears are justified. I don’t want all my govt documents available to the whole world. I did get cyberstalked in the mid 90’s and had him show up at my door.

    But there are no real issues against Citations! We can append it without any changes to their workflow.

  • Jim Harper

    I like the fish-in-a-pond metaphor, Silona.

    If I may, I’d like to alert readers here to a project working to get a lot of minnows onto individual hooks:

    Earmarkdata.org is a campaign asking Congress to publish data about earmarks in a usable form. We’re generating signatures on a petition and perfecting the data schema at http://www.earmarkdata.org

    With that data, we’ll be able to do a better job of public oversight, mapping and tracking where the money goes, like this: http://www.washingtonwatch.com/bills/earmarks/

    See you this weekend at the codeathon, Silona!

  • Pittsburgher

    Check out: MyGov365.com

  • Andrew

    It’s great to see some of these ideas being stated in a concrete way; maybe it’ll mean some actual progress on this front. A few days ago a friend and I were talking about a related idea: something like a source-control tree showing the source when a state or local government adopts something like the UCC or a model code.

  • Jamie Darlon

    i think you mean recovery.GOV has errors – the recovery.ORG site has been widely lauded.

  • Chris Messina

    It’s great to see this work continuing and to have someone so passionate about it continue to drive it.

    My immediate reaction, with both an “open tech” and Google hat on, is that this effort might really benefit from some strong simplification.

    That is, when I visit citability.org, if you provided some actual HTML markup — to *show* me what you want me to do — that might go a long way to demonstrating what you want people to do. While I know that you do support more than just web markup, it seems like it might be a good idea to start there — or to provide a kind of “cite your own adventure” where you ask a visitor to specify a filetype and then walk them through the conversion.

    I understand the picture that you’re trying to paint and think it’s a critical one; at the same time, helping people to join in that story and get a clear picture of what the before/after looks like might really help motivate people to take action when they see how simple it can be.

    Like: where’s your hello world example? Can you put it on your homepage?

    I took a look at your Gliffy doc and was pretty overwhelmed — it’s a grand vision, but as the saying goes, any progress starts with the first step!

  • Bob Gourley

    Thanks much Silona for putting this important work in a very easy to understand context. I have no doubt in my mind that you are changing the world here and millions will benefit because of you and your persistent, focused efforts.

    Cheers,
    Bob

  • Olivier Travers

    Opencongress.com has granular permalinks which, alone, make it much better than the Library of Congress’ own Thomas. Links to the latter, last time I checked, had little intra-document linkability and, much worse, expired within a few minutes of creating them.

  • Stuart Weibel

    There is a logical inconsistency between the notion of a time-stamp on one hand and a ‘permalink’ on the other. Without truly persistent identifiers, a timestamp is useful, yes… critical even.

    But documents that are intended to have persistence deserve identifiers that are also persistent, and the underlying infrastructure to support them. A persistent identifier should be both location independent and time-invariant. Globally scoped, they are thus easily searchable as well.

    The Government Printing Office has attempted to address this need by using Persistent URLs (PURLs) for identity. The PURL technology, available from PURL.ORG for free, is one of the longest continuously available pieces of web technology. It has recently been re-engineered to accommodate all current HTTP headers.

    Perhaps other agencies might take a page from GPO’s book and establish the necessary policies to assure that the products of government activity are identified (and addressable) in a consistent and persistent manner.

    Stuart Weibel
    Senior Research Scientist
    OCLC Research

  • Silona

    @stuart
    Attaching a location with a date-time stamp is not inconsistent for a citation. A citation means that at this place, this is what was said. The web is not like a printed document and changes frequently. PURLs assume you are doing content management and in some way frontfacing those changes. Citability does not make that assumption, so it is cheaper and easier. Workflow integration combined with redaction is complex. We do not seek to solve that problem. We are simply seeking the ability to cite.

    Also, your method does not work for live datasets, video citation, etc. We wanted to address the problem of how to cite a customizable, interactive, animated widget.

    We also wanted a way for others to clone and use the data, documents, video and add additional information in an authoritative manner. The distributed version system enables this.

    @chrismessina
    Citability is not a standard. It is an interoperability framework. This is why we are able to have parsers for all those other preexisting standards like Dublin Core, URN:LEX etc. I hope you can check out the video presentations from the codeathon as that will make it clearer.

    We did working demos of
    Citable datasets
    Citable video (and annotation)
    IE citability plugin
    Citable HTML

    The closest Citability comes to a standard is asking that the frontfacing URLs have location, time and granularity. But how that is formatted is up to the implementation. We just ask that it be user friendly.

    @Olivier
    Yes opencongress.org is awesome. Spoke with David and he agreed this would help them because by cloning off the distributed archive/versioning server they could also prove the data they have scraped is accurate.

    Honestly, because it is put into a distributed versioning system, it is citable to an extremely granular level. We just believe that, for usability’s sake, those URLs need to be presented in an easier to understand format. In fact the only REAL cost to implementing this solution for a governmental entity is deciding what that URL structure should look like. Instead of reworking their entire system for millions, a new skin on BZR/GIT would cost tens of thousands.