Unlikely Group Working Happily Together To Solve Patent Problem
People following the issue of open sourcing the U.S. Patent Database might have been surprised to read an announcement on the official business opportunities web site of the U.S. Government: "Synopsis for Public Data Dissemination Sole Source Contract to Google, Inc." While the first reaction of many might be "OMG, WTF, how could they," this is actually good news, with an unlikely cast of characters working together, including Google, Intellectual Ventures, and the Internet Archive.
In September, the Patent Office announced a rather strange "Request for Information" (RFI). Under this proposed scheme, the Patent Office would receive a substantial (upwards of $10 million!) donation of equipment from a vendor. In return, the vendor would get to be the official distributor of the patent database to the public, and would get to sell "value-added products." Among other things, the vendor would get access to the patents before the public does, allowing them to mine the database, and would be allowed to sell a variety of bulk products.
While the RFI makes a nod to public access, like all these Zero-Dollar deals the government cuts, there would be a lot of limits on what is "public" data as the vendor tries to recoup their investment by selling the so-called "value-added" products. Readers may remember a similar fiasco with the Government Accountability Office, where the Federal Legislative Histories were given away to Thomson West and now even the U.S. Congress has to pay to access this material.
The patent database is no ordinary database. This is the only database specifically called out in the U.S. Constitution as being the responsibility of the U.S. Executive Branch to run! A lot of people think this Zero-Dollar deal the Patent Office is contemplating kind of stinks, and I'm really pleased to announce that a broad coalition has come together to make this data more broadly available immediately:
It goes without saying that Google, the Internet Archive, and Intellectual Ventures are three groups that don't often work together, and I think this illustrates the compelling public interest in making the patent database more broadly available.
We announced this Section 8 Task Force in a letter to Congressman Mike Honda. We also sent in a FOIA request to the Patent Office, putting them on notice that we expect any responses to their RFI $0 boondoggle to be made available to the public, as required by law.
In the long term, the Patent Office just needs to fix their system instead of resorting to silly $0 deals. They have 600 staff in Information Technology and spend hundreds of millions of dollars. Surely, they can find a way to serve the public as part of that? Putting a lien on the patent database in return for $10 million in hardware instead of fixing their 1970s-era mainframes just doesn't make sense.
In the meantime, we should have the first 8 terabytes of data up pretty soon. Those interested in learning more about the issue are urged to consult the paper trail on our PTO page, which includes letters to and from Congress, and pointers to the Patent Office procurement docs.
Comments: 4
Luigi Montanez [ 8 November 2009 09:17 AM]
I'm a bit confused as to how exactly Intellectual Ventures is involved in this. Have they been privately compiling patent data? If so, how is it different than the data from PAIR?
Carl Malamud [ 8 November 2009 09:22 AM]
Hi Luigi. Intellectual Ventures bought all the commercial data feeds from USPTO which come on DVDs. That includes page images, applications, grants. They're one of a dozen vendors to have bought these roughly 1,000 DVDs of data.
Intellectual Ventures is simply putting the 1,000 DVDs on a disk drive, and making it available to people like the Internet Archive (Brewster got his disk last night) and Public.Resource.Org (we're expecting ours this week).
In addition to the data on the commercial products, there is additional information available online inside the PAIR system. People have tried for a while to crawl PAIR, but the PTO infrastructure is so poor that it was quickly overloaded and they put in a CAPTCHA system. So, individuals have been able to access this additional info on a one-off basis, but bulk providers have been unable to incorporate it into their systems.
Our goal is to *not* be in the crawling PAIR business, but it is a decent stopgap, particularly when coupled with the bulk DVD data, and hopefully will motivate the PTO to take more positive steps to provide their own database directly to the public.
Luigi Montanez [ 8 November 2009 10:21 AM]
Thanks Carl, that clears things up. Looking forward to seeing the data out there.
Ben Hoyle [ 9 November 2009 11:04 AM]
Hi,
This looks to be excellent news. The PAIR system is notorious for being out of action outside of US business hours and so any help to fix this, and get around the immensely irritating CAPTCHA system, would be great.
I can foresee the usual suspects objecting to this but in my mind any third party who is going to make patent data more accessible deserves to be encouraged. Maybe one day I could monitor my US cases via a Google web app, getting reminders in my Google calendar of due dates?
Ben