Google Library vs. Publishers

I’ve been having an interesting debate with Lauren Weinstein over on Dave Farber’s IP mailing list about the controversy over Google Library and the ethics of Google’s position that scanning library book collections in order to create a search index is fair use. Google offers to let publishers opt out; publishers still cry foul, saying the program ought to be opt in. Weinstein thinks Google is out of line; I defend Google’s approach, arguing that this is another case where old line publishers are being dragged kicking and screaming towards a future that is actually going to be good for them.

 

Since this is a debate that’s worth having, and shouldn’t be limited to a single mailing list, I’m reproducing my postings here. Lauren asked me not to reproduce his postings in full, but I think it’s fair use to repeat the bits from Lauren that I quoted in my replies to him on the mailing list.

In the event that you want to read the full text of the original postings on IP:

Lauren: Google Suspends Scanning Copyrighted Works — For Now

Tim: More on Google Suspends Scanning Copyrighted Works — For Now

Lauren: Google Print and Ethics

Tim: More on Google Print and Ethics

Marty Lyons then posted on the ethics of building a personal digital collection of books that one has bought. Karl Auerbach wrote in with some thoughts that search engines ought to be thinking through a provision to compensate authors directly for any money made off their writings — not just books, but anything written on the web. (Note to Karl — I think they’ve already done that. It’s called Google AdSense.)

Meanwhile, on his blog, John Battelle independently weighs in on Google’s side of the debate: “All I can say is – let’s work this out, folks. This ain’t Napster. I know the book industry has issues with this, and they are significant, but man, they are completely shooting themselves in the foot if they don’t figure out how to leverage Google and search in general to sell down the long tail. Sheesh.”

Here’s the full text of my two long postings to IP, with a few formatting cleanups for web rendering (my first quotation from Lauren’s original message is linked to separately; the shorter quotes from his second posting are interspersed with my reply):

Lauren Weinstein wrote:

http://www.nytimes.com/aponline/technology/AP-Google-Library-Copyrights.html

However, demonstrating that Google still doesn’t really “get it,” the
article notes that:

Google wants publishers to notify the company which copyrighted
books they don’t want scanned, effectively requiring the industry
to opt out of the program instead of opting in. … ”Google’s
procedure shifts the responsibility for preventing infringement
to the copyright owner rather than the user, turning every
principle of copyright law on its ear.” …

I’m all in favor of reasonable copyright laws that don’t extend
copyrights so far into the future that important works are kept out
of the public domain seemingly forever, but Google’s project, as
relates to copyrighted works, definitely has been beyond the pale.

I replied:

Dave, I am on Google’s publisher advisory board for Google Print, and while the conversations in the room at the last advisory board meeting, where these changes were discussed, were confidential, I think it’s OK to report my own feelings on the matter (and that I found myself quite at odds with most of the other publishers on this issue).

 

It seems to me that Google’s position, that scanning the documents in order to provide a service that allows potential readers to find which books contain the information they are seeking is indeed fair use, is a defensible position. The fact that such a service has huge potential value to Google is beside the point. Google is creating, at considerable expense, a collective work that enables users to search books in new ways. The information they provide in the form of snippets, analogous to the snippets they show in search results for web pages, would certainly be considered fair use, if, for example, I were to create and circulate a reading list of my favorite books, including suggestive snippets. The fact that they are creating it algorithmically and on demand doesn’t change that dynamic, in my mind.

Nor are they obtaining the books that they scan in an unauthorized way. The libraries have bought and paid for those books. They would be within their rights to scan the books and make an internal copy. Google is doing this for them, but again, I don’t see this as an unfair use. The same people who think it’s illegitimate would also argue that it’s unfair use for a user to rip a copy of a CD to his or her hard disk.

Let me take this out of the realm of copyright law for a moment, and ask about which side in this debate is going to provide benefit to both authors and readers. Is it Google, or is it the publishers?

Even if I’m wrong about the legal issue (because, after all, I’m not a lawyer), I believe that Google (along with Amazon with their Search Inside, as well as more specialized services like O’Reilly’s own Safari Books Online service) is exploring new business models for publishing online. I will lay pretty strong odds that those publishers who are whining now about the illegitimacy of what Google is doing will be desperately trying to play catch up once new models become established.

Publishers have been stalling for years in getting their content online. Now someone may have a model that will take us in new directions, and they want to stop it till they can figure out how they will be the ones to profit from it.

It’s clear that we’re entering a brave new world when it comes to digital versions of books. But what we should have learned from the music industry brouhaha is that punishing the pioneers (even if, to quote Shakespeare, they let “the hot blood leap over the cold decree”) is simply a recipe for delay, and typically transfers value from the first mover to the second (think Napster to iTunes), while the complaining, delaying parties are still too late to the party to profit as much as they would if they got on board.

I’m excited about the potential of Google Print to drive both print sales and pay per view access to online content. Google is out there trying to build publishers a new business model. Once the service is in place and fully deployed, there will be huge opportunities for publishers.

Lauren then replied with a long posting about what he considered to be the ethics of the situation. I replied:

Gosh, I’m a publisher, and I see the ethics very differently. Here’s the way publishing actually works:

  • Author labors for a year or years to produce a work, often in the hope that he or she will “win the lottery” and have a bestseller. Publisher effectively gets its product for less than the cost of production (except in the case of the bestselling authors at the top of the heap, who get overpaid for their efforts, like most superstars.) Other authors do it for the reputation, or the readership, but whatever the reason, publishers don’t really pay very much for the IP that they “own.”

  • Publisher throws the product into the market and sees what sticks. Most books are never promoted, never reprinted. Author didn’t win the lottery. (Many years ago, the Science Fiction Writers of America audit committee did an amazing writeup in the SFWA bulletin about the economics of science fiction publishing. Boiled down to a nutshell, what they discovered was this: that publishers calculated their advances to authors on a first print run and an expected return rate (50% in the case of mass market science fiction). If the book did as expected, the author is out of luck, because the publisher only would continue to support the book if the return rate was less than expected. In short, it’s a “house always wins, player almost always loses, but enough people win big to keep the suckers coming back” kind of business.) (I note that not all publishers operate like this — including O’Reilly! — but there’s enough truth to it as an industry pattern that it begs the ethics question.)

  • Publishers do pay for the cost of printing and the risk of returns, and a lot of operational cost, so this business model is the result of a lot of economic realities. This is not typically a “rich” industry, and there are a lot of publishers who, like authors, do it as a labor of love. Nonetheless, once the costs have been sunk, and the experiment run, the “long tail” of publishing is left to trail away on its own, without a lot of continued promotion and attention.

  • Along comes a player who says “I have a way to promote those books that the publishers have thrown away, creating an opportunity for them to find readers, and eventually, sales.” The publishers complain, because they are worried that someone else is going to make money from their slag heap, or more likely, because they are worried that there’s some downside risk to their top sellers, even if there’s a lot of benefit to the bottom and mid-list books. This is the same situation I wrote about back in 2001 in my essay Piracy is Progressive Taxation.

I find the argument to ethics on the other side unconvincing. You can cast it how you want. Google isn’t “borrowing” the books from libraries. They are partnering with libraries to do something that is very much in line with the mission of libraries, which is to store and share human knowledge. As to whether the reaction would be the same if Microsoft, and not Google, had done it: I suspect it would indeed be the same. Publishers would be complaining, and I would be applauding. You say:

Google made essentially a “sweetheart” deal with libraries that benefits Google vastly and also benefits the libraries, but pays not a dime to the copyright holders.

Yes, and what’s wrong with that? If the libraries had done it themselves, would the copyright holders have any grounds to complain? If they then shared their scanned copies for the limited purposes of making a super card catalog (which is what Google’s Library service provides), would the copyright holders have the right to complain? That’s essentially what’s happening, except that Google is facilitating the effort.

 

If Google were offering the full Google Print style service, where they were actually showing full pages from the copyrighted work, I’d completely agree with you. But they are scanning the books in order to provide search, and showing only snippets that would indeed be completely fair use if the catalog were created manually. It’s less than is quoted in any book review.

You say:

This is all yet another example of an extremely worrisome sensibility
in some segments of the Internet world — that somehow the virtual
world of the Internet exists (or should exist) outside and apart
from the rules of law and concepts of ethics that have long guided
us in the physical world. It’s obvious that laws must change and
evolve faster to keep pace with the rapid rate of technological
change — many of today’s technology-related problems are the result
of just such a lag. But basic ethics should *not* be degraded in the
Internet world, simply by virtue of the facts that servers in
data centers and billionaire-based “coolness” are involved.

I couldn’t disagree more. Law is always dynamic, and the way that it catches up with reality is through people pushing the boundaries. (See Lessig’s Code and Other Laws of Cyberspace for some great accounts on this front.) And the reality that it tends to catch up with is the prevailing ethics of a society. And the ethics of copyright, to me, are to benefit the author and the reader, and to incentivize investment in the “progress of science and the useful arts.” (I know I’m borrowing from patent language here, but the same principle applies.)