U.S. House makes legislative data more open to the people in XML

Opening data in Congress is a marathon, not a sprint. The 113th Congress is making notable, incremental progress on open government.

It was a good week for open government data in the United States Congress. On Tuesday, the Clerk of the House made House floor summaries available in bulk XML format. Yesterday, the House of Representatives announced that it will make all of its legislation available for bulk download in a machine-readable format, XML, in cooperation with the U.S. Government Printing Office. As Nick Judd observes at TechPresident, such data is catnip for developers. While full bulk data from THOMAS.gov is still not available, this incremental progress deserves mention.


This change has been a long time coming, although more needs to be done to fully open the People’s House to the People. In April 2011, Speaker of the House John Boehner and Majority Leader Eric Cantor sent a letter to the House Clerk regarding legislative data release. In September 2011, a live XML feed for the House floor went online. In September 2012, Congress launched a beta version of Congress.gov but failed to open the data.

“Thanks to GPO, all House bills for this Congress will be available in one XML file that can be downloaded by anyone,” said Speaker Boehner, in a joint statement with Majority Leader Cantor at Speaker.gov. “This is a win for every American who believes in open government. Making legislative data easier to use for third parties, developers, and anyone interested in how Congress is tackling current challenges is a priority for House leaders. We’re going to keep working to make the legislative process more transparent and to better connect lawmakers with the people we serve.”

In a post on Tuesday at Speaker.gov, Don Seymour, digital communications director for the Speaker of the House, detailed the progress made during the 112th Congress:

This project is the first of several to be rolled out in the 113th Congress that were coordinated or initiated by the Legislative Branch Bulk Data Task Force. The task force was created to expedite the process of providing bulk access to legislative information and to increase transparency for the American people. It includes the House Clerk, legislative branch agencies such as the Government Printing Office and Library of Congress, representatives from House leadership and key committees, and the House Chief Administrative Officer.

Open government is and has been a priority for House leaders. In fact, the Clerk began offering real-time updates on House floor proceedings in XML back in 2011. The feed of real-time information complemented HouseLive.gov, a new video streaming feature they set up for desktop and mobile devices. The House also began utilizing new low-cost video conferencing tools, streaming committee hearings online, working with developers and transparency advocates, and more.

As Speaker Boehner said, this is good news for every American. Despite the abysmal public perception of Congress, genuine institutional changes in the House of Representatives driven by the GOP embracing innovation and transparency have been happening over the last three years. Open government in the House also enjoys a rare status in Washington these days: bipartisan comity. These improvements built upon bipartisan progress made while Representative Nancy Pelosi held the Speaker’s gavel, including putting committee hearings online, putting expenditures and lobbying disclosure online, and changing the franking rules to allow for the use of new media, like YouTube, Facebook and Twitter.

Democratic Whip Steny Hoyer praised the GPO and House Clerk for providing bulk access to House bills in XML:

“The actions this week by the GPO and the House Clerk are significant steps towards making the legislative branch more open and transparent,” said Whip Hoyer. “Congress has a duty to share information about legislation being developed and deliberated, and this new effort will allow the public to follow and engage with Congress in innovative new ways.  I commend GPO and the House Clerk for their actions, and hope that other legislative branch entities like the Library of Congress and the Senate will follow suit by including additional legislative information that is already publicly available, yet not accessible, on Thomas.gov.”

A more open road ahead?

As Tim O’Reilly observed in 2011, the current leadership of the House do seem to be doing a better job on transparency and open data than their predecessors. Jim Harper’s analysis of the government’s data publication process substantiates that progress. Writing at the Cato Institute, where he is the director of information policy studies, Harper praised the House for this step forward:

I believe the public has an Internet-fueled expectation that they should understand what happens in Congress. It’s one explanation for rock-bottom esteem for government in opinion polls. Access to good data help produce better public understanding of what goes on in Washington and also, I believe, more felicitous policy outcomes — not only reduced demand for government, but better administered government in the areas the public wants it.

Harper also offered some constructive criticism for improvement:

That I’ve been able to find, the XML is not well documented. What each of the technical codes means is understood by several people in Washington’s transparency community, but the idea is to make it available very broadly, so the documentation should be very strong. The information at xml.house.gov should be updated, tightened up, and made easily available to the people gathering bill data on FDsys.

The XML data structures put in bills are limited in terms of what they convey. There is rudimentary information about who introduced and cosponsored bills, what committees they were referred to, and other procedural information. That’s good. But the effects of bills—on agencies, existing law, programs, places—this is not available in machine-readable code. That would be great.
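To make Harper's point concrete, here is a minimal sketch in Python of what reading that "rudimentary information" out of a bill file might look like. The sample document is hand-written for illustration and is not a real bill; the element names (`legis-num`, `sponsor`, `cosponsor`, `committee-name`) are assumptions about the bill markup and should be checked against the documentation at xml.house.gov before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample that imitates the general shape of a bill
# file. The element names are assumptions, not a verified schema.
SAMPLE_BILL = """\
<bill>
  <form>
    <legis-num>H. R. 0000</legis-num>
    <action>
      <action-desc>
        <sponsor name-id="X000001">Ms. Example</sponsor> (for herself and
        <cosponsor name-id="Y000002">Mr. Sample</cosponsor>) introduced the
        following bill; which was referred to the
        <committee-name committee-id="HZZ00">Committee on Illustrations</committee-name>
      </action-desc>
    </action>
    <official-title>To illustrate parsing.</official-title>
  </form>
</bill>
"""


def summarize_bill(xml_text):
    """Return the bill number, sponsors, cosponsors and committees found."""
    root = ET.fromstring(xml_text)

    def texts(tag):
        # Collect the text of every matching element, collapsing whitespace.
        return [" ".join("".join(el.itertext()).split()) for el in root.iter(tag)]

    numbers = texts("legis-num")
    return {
        "number": numbers[0] if numbers else None,
        "sponsors": texts("sponsor"),
        "cosponsors": texts("cosponsor"),
        "committees": texts("committee-name"),
    }


if __name__ == "__main__":
    print(summarize_bill(SAMPLE_BILL))
```

Everything this sketch can extract is procedural: who introduced a bill and where it was referred. Nothing in such markup would say which agencies, programs or existing laws a bill affects, which is the gap Harper describes.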

Josh Tauberer, the author of Open Government Data, added some caveats on the House’s move to bulk bill XML. Tauberer is the civic hacker behind GovTrack.us, which has been scraping legislative data and making it more open for years. He also contributed a chapter to Open Government in 2010.

In his comments, excerpted below, he notes that “there’s no new data here, and thus not the data that the bulk legislative data advocates have been asking for.” In other words, this is evolutionary change, not revolutionary change.

What we’re seeing with the bills bulk data project is how the wave of culture change is moving through government. Over the last two years the House Republican leadership has embraced open government in many ways (my 112th Congress recap | the new House floor feed). With this bills XML project, we’re seeing more legislative support agencies being involved in how the House does open government.

This isn’t a technical feat by any means, but it is a cultural feat. The House and GPO worked together to institutionalize a new way for the House to publish bulk data.

Because of the way Data.gov is managed in the executive branch, we’ve become accustomed to big announcements. The bills bulk data project and the other recent projects show that the House is taking a different approach, an incremental approach, to open government data: publish early and often, gather feedback, then go on to bigger projects. This is something open government advocates have been asking for.

Daniel Schuman, the legislative counsel of the Sunlight Foundation, echoed Harper and Tauberer’s balanced praise for improved access, including a recent history of progress and suggestions for what remains to do:

“Ultimately, this path should take us past the point where all legislative information published on THOMAS (or its successor Congress.gov) is available online, in real time, as structured data that is capable of being downloaded in bulk. The most requested data is legislative status information, which is held by the Library of Congress and still is not available today as structured data, bulk or otherwise. That includes when the bill was introduced, who co-sponsored it, a summary of the legislation, and so on.

Status information is prepared by the Library of Congress, which has been historically recalcitrant to make this information available to the public in any other formats besides a series of web pages. But we know based on a March 2008 memo that the hurdle here is political will, not technology. That’s why this new announcement is encouraging. The task force is starting to crack open the vault. Let’s hope that the Senate and the Library of Congress are coming to share the House’s enthusiasm for transparency.”

As we head further into 2013, here’s hoping that the entire Congress takes more steps to make the content and status of proposed laws more accessible to the hundreds of millions of people its members represent around the country. As more progress is made toward freeing the data, it will enable the nation to track the progress of legislation in real time through the media, and civic entrepreneurs to build better interfaces for understanding the proposals before Congress.

This post is part of our series investigating open data. An earlier version of this post appeared on the O’Reilly Radar Tumblr.