Bulk Access to Government Printing Office Data

Carl Malamud of public.resource.org just wrote to let me know that he has begun harvesting all the data currently provided by the Government Printing Office, including the Congressional Record, various presidential papers (up until 2004, when they stopped being made public), the Federal Register, and other government documents. Until now these have been available from the GPO only one document at a time; Carl is making them available for bulk download, which is useful for anyone who wants to do text analysis across the whole collection. Carl wrote:

With help from the Institute of WGET and the fine
folks at Ibiblio, we now have bulk interfaces to the
Government Printing Office up and running. Here’s
the Harvest Report.

As you may know, people have been complaining for years
that these databases haven’t been available in bulk for
free download. Most people thought there were two
solutions available to this problem:

1. Ask for GPO’s help to make the data available in bulk via,
e.g., a library.

2. Ask GPO to provide the service themselves.

We called up the Government Printing Office and presented
them with a third alternative: we were planning on harvesting
their data straight through the user interface. This cost
zero resources for the GPO, so their answer was “knock yourself
out.” They even gave us tech contacts in case we had problems.

Turns out sometimes all you have to do is ask …