Aug 2

Tim O'Reilly

UC in Discussions to Join Google Library Project

According to the LA Times, the UC library system is in discussions to join the Google Library project:

Google is keen to have access to UC's 34 million volumes from 100 libraries on 10 campuses, which is described as collectively the largest academic research library in the world. UC wants to delve more deeply into the Internet revolution with a deep-pockets partner like Google paying the costs of scanning books.

As I've argued in a number of previous posts, publishers and authors should be delighted to have Google bootstrap them into the digital era, but unfortunately, the big NY publishers and the Authors Guild don't see it that way. (See Google Library vs. Publishers, Author's Guild Suit, and Google's Response, NY Times Op Ed on Author's Guild Suit Against Google, and Only 4% of Titles are Being Commercially Exploited for more background.)

While there are many concerns about the kind of market power Google could acquire over publishing as a result of this project, Google's initiative is innovative, useful, and a real boost to an industry that has yet to make significant headway with electronic books. What I think actually motivates the concern of publishers is the same as the concern that the music industry had: "Oh sh*t, someone is doing what we should have done years ago, and now we're going to have to play catchup." Sorry, the future doesn't wait.

The arguments marshalled by the publishers display either cynicism or shocking ignorance about how search engines work. If it's not fair use to make a copy of a book in order to make a search index, it's not fair use to make a copy of a web page either. Google doesn't actually show the pages from a book to a user unless the book is in the public domain or Google has permission from the publisher.

I had a debate a couple of weeks ago with a well-known west coast literary agent at the Stanford Publishing Course. She was fulminating about how what Google is doing is nothing but theft. "Do you know how a search engine works?" I asked. "No." "Well then, have you ever used Google?" "No."

Even more telling was a conversation overheard later in the course, in which a publisher of significant size remarked, in a display of bafflement worthy of Senator Ted Stevens: "What bothers me about the Google project is that I've heard they are scanning two copies of the book. What I want to know is: what are they doing with the second copy?!" What I want to know is whether I should laugh or cry.

Back to the LA Times article. It marshals some more good arguments for the Google Library project:

Daniel Greenstein, UC's associate vice provost for scholarly information, said that joining the Google Books Library Project — with its ability to search for terms inside texts, not only in catalog listings — would help "create access like we've never had before to our cultural heritage and scholarly memory. It's a whole new paradigm."

In an interview Tuesday, Greenstein said that such digitizing offers protection for writings that might be lost in natural disasters like Hurricane Katrina and earthquakes. "It's the kind of stewardship that is absolutely vital to us and the community in general," said Greenstein, who oversees digital projects for UC libraries.

A UC deal with Google could be announced within a month, officials said. However, the arrangement first faces close scrutiny from the UC regents and the publishing world for potential copyright issues and concerns that UC might lose out on future revenue.

Comments: 3

  Joseph Hunkins [08.02.06 03:12 PM]

Right on as usual Mr. Tim. Nice to see a respected publisher make the case for online distribution.

  Thomas Lord [08.02.06 08:17 PM]

"If it's not fair use to make a copy of a book in order to make a search index, it's not fair use to make a copy of a web page either."

Nonsense. The publisher of a web page containing copyrighted content has made a deliberate decision about when and how to make that content available in electronic form in a public space. Caching and indexing are customary in that space. In contrast, scanning a print-only book to stash a copy in the corporate archives of a for-profit deprives the copyright owner of that economically consequential choice and achieves the end of acquiring a copy that should not exist under copyright law.

If Google wanted to play within the law they could scan books for the libraries only and for archival and research purposes (there are explicit exceptions in the law that permit this). They could donate indexing and excerpting hardware and software to the libraries and, while Google would not then have a monopoly on the results, public access would be just as well off. Where Google, and the libraries, frankly, have broken the law is in giving Google copies of the scanned documents.


  Thomas Lord [08.03.06 10:36 AM]

The comment about Google's "second copy" may have been quite astute, not ridiculous as you portray it.

Is it not the case that Google scans the books, gives the library a copy of the scanned text, and "takes home" a complete copy? And is it not the case that the purpose of that second copy, the one Google takes home, is to have a distribution from the library's collection for the purpose of a direct or indirect commercial advantage to Google?

Precisely that scenario is explicitly forbidden by 17 USC §108 ("Limitations on exclusive rights: Reproduction by libraries and archives") and you'll find little in 17 USC §107 ("[...] fair use") on which to hang Google's hat. Indeed, most rhetoric I've seen from them seems to appeal to §107(4), in which fair use factors should include "the effect of the use upon the potential market for or value of the copyrighted work". Google has argued "Well, if anything, we'll probably drive the value up, on average, so what's the problem?" The problem is that that wasn't Google's decision to make.

I believe Google's behavior -- and even more the behavior of the libraries -- to be shockingly egregious violations of the law. And yes, I have a pretty good idea how search engines work.


Predicted outcome: Google will settle with major publishers for some $X per title scanned. Smaller publishers will wind up with a choice of opting out of Google's indexes or taking the same $X. Other search services will reach similar deals. The major publishers will make full-text of many of these works available for paid download and on-demand printing. So, ultimately, if my prediction comes true, Google gets some high marks for solving a social problem that nobody has been able to solve for 20+ years and we are, indeed, all better off. But let's not gloss over the fact that the route from point A to point B involved breaking the law or that Google, in fact, owes a decent chunk of their pocket change to a lot of copyright holders. (And, the price per copy Google will have to settle for isn't necessarily nothing. In court, the libraries could be fined, Google ordered to destroy all copies, legal fees and damages ordered to be paid. The libraries would get to keep their copies and the major publishers would still wind up authorizing single sales to indexing companies for $(X+1) per copy.) Why spend much to lobby when you can just steamroller over the law, if you have enough money? :-)
