People following the issue of open sourcing the U.S. Patent Database might have been
surprised to read an announcement in the official business opportunities web site of
the U.S. Government:
Synopsis for Public Data Dissemination Sole Source Contract to Google, Inc.
While the first reaction of many might be “OMG, WTF, how could they,” this is actually good news,
with an unlikely cast of characters working together including Google, Intellectual Ventures, and the Internet Archive.
In September, the Patent Office announced a rather strange “Request for Information” (RFI).
Under this proposed scheme, the Patent Office would receive a substantial (upwards of $10 million!) donation of equipment from a vendor.
In return, the vendor would get to be the official distributor of the patent database to the public,
and would get to sell “value-added products.” Among other things, the vendor would get access to
the patents before the public does, allowing them to mine the database, and would be allowed to sell
a variety of bulk products.
While the RFI makes a nod to public access, like all these Zero-Dollar deals
the government cuts, there would be a lot of limits on what is “public” data as the vendor tries
to recoup their investment by selling the so-called “value-added” products. Readers may remember
a similar fiasco with the Government Accountability Office, where the Federal Legislative Histories
were given away to Thomson West,
and now even the U.S. Congress has to pay to access this material.
The patent database is no ordinary database. This is the only database specifically called out in the
U.S. Constitution as being the responsibility of the U.S. Executive Branch to run! A lot of people
think this Zero-Dollar deal the Patent Office is contemplating kind of stinks, and I’m really pleased to
report that a broad coalition has come together to make this data more broadly available immediately:
- Intellectual Ventures, the IP group founded by Nathan Myhrvold, is donating several terabytes of the back file to Public.Resource.Org,
the Internet Archive, and a variety of other groups to make available to everybody.
- Google asked for permission to crawl the public application system (known as “PAIR”). The
announcement by the Patent Office of a “sole source contract to Google” was the government’s way
of saying we have permission to crawl their system and bypass the CAPTCHAs. This is good news, because
the PAIR system contains the “binders,” which are all the material that supplements the basic applications.
- The Internet Archive has set aside a boatload of disk drives to serve this data. In addition,
Public.Resource.Org will provide the usual rsync and FTP, and we expect a variety of other groups
to provide mirrors both for bulk access and end-user systems.
It goes without saying that Google, the Internet Archive, and Intellectual Ventures are 3 groups that don’t often work together, and I think this
illustrates the compelling public interest in making the patent database more broadly available.
We announced this Section 8 Task Force in a letter to Congressman Mike Honda. And, we also sent in
a FOIA request to the Patent Office, putting them on notice that we expect any responses to their
RFI $0 boondoggle to be made available to the public, as required by law.
In the long term, the Patent Office just needs to fix their system instead of resorting to silly $0 deals.
They have 600 staff in Information Technology and spend hundreds of millions of dollars.
Surely, they can find a way to serve the public as part of that? Putting a lien on the Patent database in return for $10 million in hardware instead of fixing their ’70s-era mainframes just doesn’t make sense.
In the meantime, we should have the first 8 terabytes of data up pretty soon.
Those interested in learning more about the issue are urged to
consult the paper trail on our PTO page which
includes letters to and from Congress, and pointers to the Patent Office procurement docs.