Google Library vs. Publishers

Sat, Aug 13, 2005

I've been having an interesting debate with Lauren Weinstein over on Dave Farber's IP mailing list about the controversy over Google Library and the ethics of Google's position that scanning library book collections in order to create a search index is fair use. Google offers to let publishers opt out; publishers still cry foul, saying the program ought to be opt in. Weinstein thinks Google is out of line; I defend Google's approach, arguing that this is another case where old line publishers are being dragged kicking and screaming towards a future that is actually going to be good for them.

Since this is a debate that's worth having, and shouldn't be limited to a single mailing list, I'm reproducing my postings here. Lauren asked me not to reproduce his postings in full, but I think it's fair use to repeat the bits from Lauren that I quoted in my replies to him on the mailing list.

In the event that you want to read the full text of the original postings on IP:

Lauren: Google Suspends Scanning Copyrighted Works -- For Now
Tim: More on Google Suspends Scanning Copyrighted Works -- For Now
Lauren: Google Print and Ethics
Tim: More on Google Print and Ethics

Marty Lyons then posted on the ethics of building a personal digital collection of books that one has bought. Karl Auerbach wrote in with some thoughts that search engines ought to be thinking through a provision to compensate authors directly for any money made off their writings -- not just books, but anything written on the web. (Note to Karl -- I think they've already done that. It's called Google AdSense.)

Meanwhile, on his blog, John Battelle independently weighs in on Google's side of the debate: "All I can say is - let's work this out, folks. This ain't Napster. I know the book industry has issues with this, and they are significant, but man, they are completely shooting themselves in the foot if they don't figure out how to leverage Google and search in general to sell down the long tail. Sheesh."

Here's the full text of my two long postings to IP, with a few formatting cleanups for web rendering (my first quotation from Lauren's original message is linked to separately; the shorter quotes from his second posting are interspersed with my reply):

Lauren Weinstein wrote:

http://www.nytimes.com/aponline/technology/AP-Google-Library-Copyrights.html
However, demonstrating that Google still doesn't really "get it," the article notes that:
Google wants publishers to notify the company which copyrighted books they don't want scanned, effectively requiring the industry to opt out of the program instead of opting in. ... ''Google's procedure shifts the responsibility for preventing infringement to the copyright owner rather than the user, turning every principle of copyright law on its ear.'' ...
I'm all in favor of reasonable copyright laws that don't extend copyrights so far into the future that important works are kept out of the public domain seemingly forever, but Google's project, as relates to copyrighted works, definitely has been beyond the pale.

I replied:

Dave, I am on Google's publisher advisory board for Google Print, and while the conversations in the room at the last advisory board meeting, where these changes were discussed, were confidential, I think it's OK to report my own feelings on the matter (and that I found myself quite at odds with most of the other publishers on this issue.)

It seems to me that Google's position, that scanning the documents in order to provide a service that allows potential readers to find which books contain the information they are seeking is indeed fair use, is a defensible position. The fact that such a service has huge potential value to Google is beside the point. Google is creating, at considerable expense, a collective work that enables users to search books in new ways. The information they provide in the form of snippets, analogous to the snippets they show in search results for web pages, would certainly be considered fair use, if, for example, I were to create and circulate a reading list of my favorite books, including suggestive snippets. The fact that they are creating it algorithmically and on demand doesn't change that dynamic, in my mind.
Nor are they obtaining the books that they scan in an unauthorized way. The libraries have bought and paid for those books. They would be within their rights to scan the books and make an internal copy. Google is doing this for them, but again, I don't see this as an unfair use. The same people who think it's illegitimate would also argue that it's unfair use for a user to rip a copy of a CD to his or her hard disk.
Let me take this out of the realm of copyright law for a moment, and ask about which side in this debate is going to provide benefit to both authors and readers. Is it Google, or is it the publishers?
Even if I'm wrong about the legal issue (because, after all, I'm not a lawyer), I believe that Google (along with Amazon with their Search Inside, as well as more specialized services like O'Reilly's own Safari Books Online service) are exploring new business models for publishing online. I will lay pretty strong odds that those publishers who are whining now about the illegitimacy of what Google is doing will be desperately trying to play catch up once new models become established.
Publishers have been stalling for years in getting their content online. Now someone may have a model that will take us in new directions, and they want to stop it till they can figure out how they will be the ones to profit from it.
It's clear that we're entering a brave new world when it comes to digital versions of books. But what we should have learned from the music industry brouhaha is that punishing the pioneers (even if, to quote Shakespeare, they let "the hot blood leap over the cold decree") is simply a recipe for delay, and typically transfers value from the first mover to the second (think Napster to iTunes), while the complaining, delaying parties are still too late to the party to profit as much as they would if they got on board.
I'm excited about the potential of Google Print to drive both print sales and pay per view access to online content. Google is out there trying to build publishers a new business model. Once the service is in place and fully deployed, there will be huge opportunities for publishers.

Lauren then replied with a long posting about what he considered to be the ethics of the situation. I replied:

Gosh, I'm a publisher, and I see the ethics very differently. Here's the way publishing actually works:

Author labors for a year or years to produce a work, often in the hope that he or she will "win the lottery" and have a bestseller. Publisher effectively gets its product for less than the cost of production (except in the case of the bestselling authors at the top of the heap, who get overpaid for their efforts, like most superstars.) Other authors do it for the reputation, or the readership, but whatever the reason, publishers don't really pay very much for the IP that they "own."
Publisher throws the product into the market and sees what sticks. Most books are never promoted, never reprinted. Author didn't win the lottery. (Many years ago, the Science Fiction Writers of America audit committee did an amazing writeup in the SFWA bulletin about the economics of science fiction publishing. Boiled down to a nutshell, what they discovered was this: that publishers calculated their advances to authors on a first print run and an expected return rate (50% in the case of mass market science fiction). If the book did as expected, the author is out of luck, because the publisher only would continue to support the book if the return rate was less than expected. In short, it's a "house always wins, player almost always loses, but enough people win big to keep the suckers coming back" kind of business.) (I note that not all publishers operate like this -- including O'Reilly! -- but there's enough truth to it as an industry pattern that it begs the ethics question.)
Publishers do pay for the cost of printing and the risk of returns, and a lot of operational cost, so this business model is the result of a lot of economic realities. This is not typically a "rich" industry, and there are a lot of publishers who, like authors, do it as a labor of love. Nonetheless, once the costs have been sunk, and the experiment run, the "long tail" of publishing is left to trail away on its own, without a lot of continued promotion and attention.
Along comes a player who says "I have a way to promote those books that the publishers have thrown away, creating an opportunity for them to find readers, and eventually, sales." The publishers complain, because they are worried that someone else is going to make money from their slag heap, or more likely, because they are worried that there's some downside risk to their top sellers, even if there's a lot of benefit to the bottom and mid-list books. This is the same situation I wrote about back in 2001 in my essay Piracy is Progressive Taxation.
I find the argument to ethics on the other side unconvincing. You can cast it how you want. Google isn't "borrowing" the books from libraries. They are partnering with libraries to do something that is very much in line with the mission of libraries, which is to store and share human knowledge. As to whether the reaction would be the same if Microsoft, and not Google, had done it: I suspect it would indeed be the same. Publishers would be complaining, and I would be applauding. You say:
Google made essentially a "sweetheart" deal with libraries that benefits Google vastly and also benefits the libraries, but pays not a dime to the copyright holders.
Yes, and what's wrong with that? If the libraries had done it themselves, would the copyright holders have any grounds to complain? If they then shared their scanned copies for the limited purposes of making a super card catalog (which is what Google's Library service provides), would the copyright holders have the right to complain? That's essentially what's happening, except that Google is facilitating the effort.

If Google were offering the full Google Print style service, where they were actually showing full pages from the copyrighted work, I'd completely agree with you. But they are scanning the books in order to provide search, and showing only snippets that would indeed be completely fair use if the catalog were created manually. It's less than is quoted in any book review.
You say:
This is all yet another example of an extremely worrisome sensibility in some segments of the Internet world -- that somehow the virtual world of the Internet exists (or should exist) outside and apart from the rules of law and concepts of ethics that have long guided us in the physical world. It's obvious that laws must change and evolve faster to keep pace with the rapid rate of technological change -- many of today's technology-related problems are the result of just such a lag. But basic ethics should *not* be degraded in the Internet world, simply by virtue of the facts that servers in data centers and billionaire-based "coolness" are involved.
I couldn't disagree more. Law is always dynamic, and the way that it catches up with reality is through people pushing the boundaries. (See Lessig's Code and Other Laws of Cyberspace for some great accounts on this front.) And the reality that it tends to catch up with is the prevailing ethics of a society. And the ethics of copyright, to me, are to benefit the author and the reader, and to incentivize investment in the "progress of science and the useful arts." (I know I'm borrowing from patent language here, but the same principle applies.)

Comments: 34

Ross Stapleton-Gray [08.13.05 04:31 PM]

But I think you're as likely to see future controversies arise, as "Google Print hacks" emerge, just as we've seen issues percolate up (as were kicked around at your recent Where 2.0 conference) when the map services find all sorts of developers downstream wrangling their data around into complex balloon animals unenvisioned when they first started licensing data in batch to major sites. What happens when developers extract value from manipulated books in ways that never need to lead all the way to a request for the full text, much less the actual purchase of a book?

Another thought: Google may be doing a fine thing, but it may in part be rushing to create that critical mass of searchable content, via Google alone, to pre-empt creation of more universal "inside the book" search standards. What if there were a universally-recognizable means for publishers to deliver to any search service a "book object," with all the necessary hooks for indexing, information hiding, etc.? (Not that I really think that "the publishing industry," per se, was on the verge of any such thing, but it's an interesting prospective alternative development path.)

Hashim [08.13.05 06:49 PM]

"Obscurity is a far greater threat to authors and creative artists than piracy"

[Note from Tim: since you didn't say where this came from, I thought I'd add that it's a quote from my "Piracy is Progressive Taxation" piece referred to above.]

Mike Perry [08.13.05 07:33 PM]

Just yesterday, I opted in, placing all of Inkling Books titles into Google Print. Since the main problem small publishers face is getting visibility, I like the idea, particularly since my printer takes care of providing the PDF files to Google, which means there's zero labor and hassle at my end. But I do have problems with what Google is doing.

First, this isn't an index to a book, which probably would be fair use. It's a full copy of the book made available to anyone without the copyright holder's permission. And this isn't one owned copy being loaned out by a library to one person at a time. Given the size of Google's server farm, a million people could be reading that "one copy" at the same time. Without the copyright holder's permission, that most emphatically is a copyright violation. That is perhaps why, despite their deep pockets for lawyers, Google is backing down. Google should only include books when the copyright holder opts in. In the long run, that'll be simpler and cheaper for everyone.

Second, the mailing I received said, "Prospects can browse through a few pages, just as they could flip through the pages of a book in a bookstore." Yet according to an obscure remark on Google's own web pages, those few pages are actually 20-100% of the book (set by the publisher, though I believe the default is 100%). For a 200 page book, that means the very least an author/publisher can limit a single viewer to is 40 pages. Sorry, but that isn't a "few pages."

An author/publisher needs to be able to set the limit as low as 5% in addition to the Front matter and TOC (with separately settable minimum and maximum numbers). That's still 10 pages, more than most people would read in a bookstore and more than enough to get a feel for the book's contents. And in that I assume, as do all reasonable people, that Google isn't going to be able to keep people from grabbing the text by some technique.

Third, arguments that this is best for authors and publishers are beside the point. That's for them to decide, not Google and not anyone else. A copyright means precisely that, it is the right to control the copying. Even if not joining Google is in every situation stupid, an unlikely situation, people have a right to be stupid.

Fourth, I agree that publishers often shaft authors. That's why I got into one-Mac publishing and now have 26 titles in print. I wanted to publish the books I wanted without the hassle of dealing with someone having different interests from my own. But if authors want or don't want to be listed in Google Print, they need to work that into their contract with their publisher. Google can't simply decide that because some authors want to be included in spite of their publishers, all authors and publishers must be included whatever the wishes of either. That's not their call to make.

Fifth, Google is forgetting that rights to publish often have geographic limits. Willing or not, a U.S. publisher with only North American rights can't agree to have Google release the book to the entire world. That's precisely why Apple is having to limit sales and negotiate rights for each country in which they have a store. It may be messy, but it's the law and Google is no more above the law than is Apple.

Finally, if Google would like to have more authors and publishers "opt in," why don't they offer them something in exchange? Publishers need a one-stop location to post information about their books that can be taken up by anyone--online bookstores, libraries etc. Passing that sort of information to Amazon, to BarnesandNoble.com, to Buy.com, etc., etc. etc. is a real hassle. It'd be great if Google offered one place where any publisher with an ISBN could post details about a book that anyone could download and use. Libraries, for instance, need to make their on-line catalogs as descriptive as those on Amazon. And a part of posting that information would include a field where they could authorize Google Print to include the book.

In short, I like what Google's doing and particularly the chance they're offering for search hits to send some money in my direction. But they need to step back and respect the rights of copyright holders and they need to offer more options to authors and publishers.

--Mike Perry, Inkling Books, Seattle, Author Untangling Tolkien

Ross Stapleton-Gray [08.13.05 07:48 PM]

Your comment that, "It'd be great if Google offered one place where any publisher with an ISBN could post details about a book that anyone could download and use," really grabs me, because it's a problem that's theoretically solved (but currently, only theoretically) by RFID, of all things: in order to make RFID work, EPCglobal, the consortium that owns the RFID and Electronic Product Code (EPC) standards, has created something called the Object Name Service (ONS), exactly analogous to the Domain Name Service (DNS). And through it, one could take any ISBN and (if the ISBN owners wanted to make it so) get a pointer to a destination at which the data you describe could be found.

In the above scheme, Inkling Books would inform the ONS, "information on our books is found at [URL]." And then Inkling, or whoever it assigned to handle the job, would make the information available in a Web services approach.
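The lookup described above (take an ISBN, get back a pointer to wherever the publisher keeps the book's details) can be illustrated with a toy sketch. This is not the real ONS protocol: the `ToyNameService` class, the `normalize_isbn` helper, the placeholder ISBN and the example.com URL are all hypothetical, and a plain in-memory dictionary stands in for the DNS-style resolution the comment refers to.

```python
from typing import Dict, Optional


def normalize_isbn(isbn: str) -> str:
    """Strip hyphens and spaces so different spellings of one ISBN match."""
    return isbn.replace("-", "").replace(" ", "").upper()


class ToyNameService:
    """Hypothetical stand-in for a name service: maps an ISBN to a metadata URL.

    A real service would resolve identifiers over the network; this class
    only keeps a dictionary, to show the "pointer, not the data" idea.
    """

    def __init__(self) -> None:
        self._pointers = {}  # type: Dict[str, str]

    def register(self, isbn: str, metadata_url: str) -> None:
        # The rights holder decides where information about the book lives.
        self._pointers[normalize_isbn(isbn)] = metadata_url

    def resolve(self, isbn: str) -> Optional[str]:
        # Returns the pointer, or None if the ISBN owner never registered one.
        return self._pointers.get(normalize_isbn(isbn))


if __name__ == "__main__":
    service = ToyNameService()
    service.register("0-00-000000-0", "http://example.com/books/0000000000")
    print(service.resolve("0000000000"))
    print(service.resolve("9-99-999999-9"))
```

The point of the design is that the registry holds only pointers; the descriptive data stays with the publisher (or whoever it assigns), which is what makes the scheme disintermediated.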

Right now, Amazon.com is something of a proxy for "tell me about this book." Google could move to do that. But even better, I think, would be to disintermediate the whole thing.

Discussion of some of these issues in a white paper I wrote for CommerceNet last year, on what the ONS means as a fulcrum for Internet commerce: http://www.stapleton-gray.com/papers/CN-TR-04-06.pdf

Tim O'Reilly [08.13.05 08:07 PM]

Mike, regarding the amount of content that is displayed, I think you're confusing the standard Google Print program with the Google library program. Google Print (where the publishers opt in) provides page images of a percentage of the book (which is in fact agreed to by the publisher.) In the Google library program, where no permission has been obtained, I believe that they just show the snippets, and you can't click through to pages, unless the book is actually out of copyright, in which case, they'll show it all.

If I'm wrong about this, someone from Google should please correct me, because then I might feel differently about the fair use issue.

Robert Nagle [08.13.05 08:52 PM]

This may be a side issue to the main issue, but I frequently try to research old titles that might be public domain, or might not.

One terribly confusing thing is that publishers frequently release "new editions" with trivial post-1922 changes/additions to pre-1922 works. It is often a bear trying to figure out on the basis of WorldCat entries and commercial services like Amazon.com what the actual copyright registration date is for many works at the 1922 border. It often is time-consuming to figure out which editions are "definitive" or "incomplete."

As an example of what I mean, try to figure out the registration date of William Gerhardie's Futility, a great 1920's work. There are "editions" from 1991 and 1974 of the work which was first published in 1922. Through interlibrary loan and supplemental research, I will probably be able to figure which edition is the authoritative edition. But it takes a lot of time. I hope that one ancillary benefit of a Google/library partnership is to make it easier to access such "metadata" or to compare editions of the same work. That would be a big win for consumers, Google and libraries and Project Gutenberg.

Erik Sherman [08.14.05 02:57 AM]

There are a few basic problems with your argument in my opinion. One is that you want to take the question out of the realm of copyright law, but you can't, because that is the international body of agreements and ground rules governing how one person can use another's work. Fair use, at least in the United States, is well defined as the ability to quote small parts of copyrighted written material to allow critical discussion in, say, an academic or journalistic setting. To say that fair use includes making a mammoth scanned searchable index is simply not something considered or allowed under the law. If people want to change the approach, then the proper way of doing that is through the legislative process, and not by trying to redefine the term "fair use" into something one wants it to be.

Maybe old line publishers are "being dragged kicking and screaming towards a future that is actually going to be good for them." But what gives Google, or any other company, the right to make that decision and force others to accept it? If the idea is so good, then offer it and let publishers opt in. Requiring them to opt out dismisses a wide variety of principles governing our society, from philosophical and religious concepts of free will to economic theories of capitalism and competition. Perhaps those who don't want to participate are making a big mistake, but that should be their mistake to make.

You write, "The libraries have bought and paid for those books. They would be within their rights to scan the books and make an internal copy. Google is doing this for them, but again, I don't see this as an unfair use. The same people who think it's illegitimate would also argue that it's unfair use for a user to rip a copy of a CD to his or her hard disk." But that isn't accurate. Google isn't making scans for the libraries; they're scanning the books for their own future business purposes. The use isn't for internal back-ups, but for creating global access. It's easy to say that someone should be able to rip a personal copy of a CD to a hard drive - that's perfectly legal. But the courts have held that when someone opens that collection to virtually anyone on the Web, it's illegal because it's effectively republishing the material.

Finally, this is an issue of what the writers want as much as what the publishers want - even more so. Generally the writers own the content and license it to the publisher. For the publisher to decide unilaterally to allow a radically different use of material from what has ever been discussed between the publisher and writer is to at least break the bond of trust that must be the foundation for a good business relationship, aside from the legal responsibilities.

Tim O'Reilly [08.14.05 08:56 AM]

Erik --

While it's true that the original intent of "fair use" was for critical discussion, I think that in common practice, the principle has extended far beyond that. For example, there are many catalogs intended to sell books that quote from said books for the purposes of selling them, not for critical discussion.

But more to the point, your arguments apply to the web itself. If what Google is doing with libraries, namely to build an index of other people's content, and to present a snippet of that content in the context of a set of search results, is not fair use, then neither is what they do on the web. If web crawling were also opt in, so much that we take for granted would be impossible. Instead, we've made it opt out (via robots.txt), and it's worked fabulously.

A further complication: you say that writers own the content and license to publishers. True in principle, but far from that in practice. Most rights are so tangled and obscure that any "opt in" policy is a recipe for stasis. Book publishers are in the same position as music publishers, where "clearing rights" is so complex and expensive that no one is going to bother until there's a compelling economic incentive. Therefore, Google's opt out is the only practical alternative.
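The opt-out convention for web crawling mentioned above (robots.txt) can be sketched with Python's standard `urllib.robotparser` module. The `may_index` helper, the `ExampleBot` user agent, and the example.com URLs are made up for illustration; the sketch only shows the default-allow behavior, where a page is indexable unless the site owner has said otherwise.

```python
from urllib.robotparser import RobotFileParser


def may_index(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True unless the site's robots.txt rules exclude this URL.

    This is the opt-out model: with no rules at all, everything is allowed.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical robots.txt in which the owner opts one directory out.
    rules = "User-agent: *\nDisallow: /private/\n"
    print(may_index("", "ExampleBot", "http://example.com/books/1"))
    print(may_index(rules, "ExampleBot", "http://example.com/books/1"))
    print(may_index(rules, "ExampleBot", "http://example.com/private/draft"))
```

An opt-in web would invert the default: nothing could be crawled until each owner published an explicit permission, which is the stasis the comment warns about.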

Tim O'Reilly [08.14.05 08:59 AM]

I wanted to share some email from Andrew Bridges, from San Francisco legal firm Winston and Strawn, who wrote:

"Saw your post to IP. Have no fear: it's not just "patent" language. It's from the constitutional clause that applies to both patents and copyrights, so there's no need to argue from analogy. Great comment.

"(We met at your Nov. 2001 P2P conference back when I was defending MusicCity/Streamcast in the Grokster case. I have always cherished your question to Hilary Rosen, and her response. You asked: 'How do you account for the fact that 99% of recording artists fail from an economic standpoint?' She responded: 'There is just too much music and there are too many recording artists for the current methods of distribution.' (Paraphrase from memory.) She got it right, but I think she focused too much on the problem in the first half of the last sentence rather than the problem in the last half of the sentence!)"

Tim O'Reilly [08.14.05 09:16 AM]

Another really great post on IP, from John Levine, following up on a comment by Karl Auerbach:

Karl wrote: "But to change the subject slightly - I've become concerned with how search engine companies are making a buck off of web-based works without letting the authors share in the wealth."

John replied:

"Yup, that's how it's supposed to work.

"The point of copyright is not to guarantee authors a cut every time someone looks at a word they wrote. It's to get material into the hands of the public. The limited rights granted by copyright are the carrot to persuade authors to publish rather than selling individually licensed copies of their work. The leakage due to first sale and fair use is an integral part of copyright.

"People have been making directories of various sorts for centuries, long before there were computers. This isn't a new question, and the answer hasn't changed just because it's become more automated. (The only major change I can think of in recent years is the Feist decision which says that originality as well as effort are required to make something copyrightable.)

"I have seen occasional sabers rattled with claims like this, but none of them has gone anywhere."

Tim O'Reilly [08.14.05 09:18 AM]

Another email comment, this time from George Dyson, reposted by permission:

"Your post to IP about the digital library was great. A personal comment (nothing new, but worth repeating):

"As a Canadian author, one of the highlights of my year is the day in March when I receive a cheque from the PLR. The Public Lending Rights Commission, funded by the Government of Canada (CDN$9.449 million in 2004-2005) and administered by the Canada Council, was established in 1986 on the premise that authors are a national resource and should be compensated in some way for the fact that public libraries lend out copies of their books. The process works as follows (but given electronic records could be more fine-grained to reflect whether anyone actually checks out or downloads your work). Every year the PLR selects 10 Canadian libraries, at random. For every library that has one of your titles, you get 1 point. All the money is divided by all the points, and your cheque is in the mail. In 1986 each point was worth about $40.00. Last year it was $34.45. The average payment was $663, and 12,148 authors with a total of 45,655 titles shared $8,052,114. There is a 100-point maximum, so if you are Farley Mowat and all libraries have all your books, you get $3,445 and that's it. This system works, is administrated with a total staff of only 8 people, and to adapt this model to the Google library (with a larger endowment) would change everything, I think."
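The PLR arithmetic George describes can be written out as a small sketch. It follows one reading of his summary (one point per sampled library per title held, a 100-point cap, and the fund divided evenly across all points); the function names and the sample holdings are hypothetical, and the actual PLR rules may differ in details he doesn't mention.

```python
from decimal import Decimal
from typing import Iterable, List, Set

MAX_POINTS = 100  # the per-author cap quoted in the comment


def count_points(author_titles: Iterable[str], sampled_libraries: List[Set[str]]) -> int:
    """One point for each (sampled library, title) pair where the library holds the title."""
    titles = list(author_titles)
    return sum(1 for holdings in sampled_libraries for title in titles if title in holdings)


def point_value(fund: Decimal, total_points: int) -> Decimal:
    """All the money divided by all the points."""
    return fund / total_points


def payment(points: int, value_per_point: Decimal, max_points: int = MAX_POINTS) -> Decimal:
    """An author's cheque: capped points times the value of one point."""
    return min(points, max_points) * value_per_point


if __name__ == "__main__":
    # Hypothetical sample: three libraries, an author with two titles.
    libraries = [{"Title A", "Title B"}, {"Title A"}, set()]
    points = count_points(["Title A", "Title B"], libraries)
    print(points, payment(points, Decimal("34.45")))
    # The capped case from the comment: 100 points at $34.45 each.
    print(payment(100, Decimal("34.45")))
```

With the quoted point value of $34.45, the 100-point cap gives the $3,445 maximum mentioned in the comment.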

In his response to my query about reprinting, George added a little more detail:

"fine to quote me. Actually I was responding (in a different direction from Lauren Weinstein) to your question

'about which side in this debate is going to provide benefit to both authors and readers. Is it google, or is it the publishers?'

by pointing to PLR as an existing model of an alternate approach. Libraries (and probably Google) are providing access within fair use and don't legally owe anyone anything. PLR stepped in (just to make fair use among Canadians a little more fair) and started paying *authors* (not publishers) directly. If a consortium including Google subscribed to a similar model (since who knows when the US Gov't will get around to something like PLR) this would shift the field of the debate. Canada is well-known for having social medicine, less known for having social copyright to this limited extent."

K.G. Schneider [08.14.05 09:21 AM]

As a librarian, published author, and digital library manager, my concern about Google Print comes from a different direction. Moving books online is inevitable and in the long run highly salutary, and good cost models could emerge that will allow creators and users to benefit. I say "could emerge" because what concerns me is that Google, a private company, is leading this project. Yes, I know, they got there first, and libraries don't have the collective resources or apparently the digital wherewithal to take on such an ambitious project. But it bothers me that in so much of this discussion, Google's index is assumed to be unquestionably a public good managed in the public interest. But Google is a commercial enterprise, and through Google Print this company will have monopolist control of information in the online domain. It may be inevitable, but that doesn't make it right.

Tim O'Reilly [08.14.05 09:24 AM]

K.G. -- I agree that there are risks in having Google (or any private company) own such a resource, but they are also the ones making the investment to make it happen. What's more, given that there's competition from Amazon and others, and if this is a lucrative business, others could do the same thing, I don't see it as a monopoly dead-end.

Tim O'Reilly [08.14.05 09:28 AM]

Another email followup from George Dyson:

and in case you haven't blogged H.G. Wells recently, here is the gist of his address to the world congress of librarians in 1937, and his proposal for a global encyclopedia, that was later published in his 1938 book WORLD BRAIN:

"We want... a universal organization and clarification of knowledge and ideas... what I have here called a World Brain, operating by an enhanced educational system through the whole body of mankind... a widespread world intelligence conscious of itself...

"The phrase "Permanent World Encyclopaedia" conveys the gist of these ideas. As the core of such an institution would be a world synthesis of bibliography and documentation with the indexed archives of the world. A great number of workers would be engaged perpetually in perfecting this index of human knowledge and keeping it up to date...

"Few people as yet, outside the world of expert librarians and museum curators and so forth, know how manageable well-ordered facts can be made, however multitudinous, and how swiftly and completely even the rarest visions and the most recondite matters can be recalled, once they have been put in place in a well-ordered scheme of reference and reproduction... There is no practical obstacle whatever now to the creation of an efficient index to all human knowledge, ideas and achievements, to the creation, that is, of a complete planetary memory for all mankind....

"This... foreshadows a real intellectual unification of our race. The whole human memory can be, and probably in a short time will be, made accessible to every individual. And... this new all-human cerebrum need not be concentrated in any one single place. It need not be vulnerable as a human head or a human heart is vulnerable. It can be reproduced exactly and fully, in Peru, China, Iceland, Central Africa, or wherever else seems to afford an insurance against danger and interruption. It can have at once, the concentration of a craniate animal and the diffused vitality of an amoeba..."

Peter Brantley [08.14.05 11:30 AM]

Tim writes:

"If the libraries had done it themselves,
would the copyright holders have any grounds to complain?
If they then shared their scanned copies for the limited
purposes of making a super card catalog (which is what
Google's Library service provides), would the copyright
holders have the right to complain?"

I think the answer is actually, "Potentially, yes, there would be grounds for complaint." It is not clear at all that libraries would have the right to make a digital representation or copy of in-copyright works, much less to index them and enable consequent full-text search (as opposed to simply creating a dark archive of page-images of the content for preservation purposes, which would more conceivably fall into an acceptable interpretation of copyright and DMCA, in the digital realm). Certainly there is a potentially valid interpretation of fair-use through the display of strictly delimited snippets, but that delimitation is actually secondary to the essential accessibility of the material.

One of the things to keep in mind, I think, is that it is reasonable to suppose that if libraries were *not* involved in this, it would be more straightforward for publishers to work directly with Google on economic and business models which would be mutually rewarding. However, having another digital copy "floating around" in a library - not an institution known for persistently taking the POV of Commerce - is a significant risk, from their perspective.

The publishers' concern is not just the threat of losing a few sales to faculty and students, but rather unfettered distribution; the indemnification clauses in the Google - Univ. of Michigan contract - the only contract publicly accessible - are clearly interesting to the publishers.

To quote from a BusinessWeek article in June, "The Michigan contract, says Adler [a spokesman for the Association of American Publishers], is not clear on how both parties will make these copies and respect copyright laws at the same time. It also states that Google will indemnify the university in the case of any copyright lawsuits -- except if distribution of the university's digital copy violates copyrights."

The issues here are more complex than whether publishers are comfortable making copyrighted content available for searching on the web. They involve elemental issues of property ownership and control. It is no wonder that publishers are infuriated with an approach that says, "Books are just like websites. Just tell us which ones you don't want scanned and we'll either not scan them, or maybe just not make them accessible. Don't worry."

There is room for dialogue, but Google has to respect the perspective of publishers; the publishers in turn must be comfortable with restrictions on the flow of the digital material that results. Ideally, this dialogue would incorporate a new perspective on the value of disaggregated content, but like many major transformations in the economics of property, that may take some time to develop.

Jeroen Wenting [08.15.05 03:54 AM]

If I were to do what Google is doing (taking books, scanning them, and putting the entire text online) I'd be dragged off to court immediately for plagiarism and copyright infringement and rightly so.
If Google feels they're above the law (as they seem to think ever more often) it's high time they're stopped in their tracks.

I've nothing against putting the text of books online but ONLY with explicit permission from the publisher and author (or either one if the other can no longer react because he is dead or out of business) OR when no more copyright rests on the book.

I wonder how Tim would feel if tomorrow the entire text of his catalogue would appear wholesale on Google for free, complete with links to O'Reilly.com to download the samples and errata.
It would put an end not just to your (supposedly) lucrative Safari service but also put a serious dent in your printed book business.

Tim O'Reilly [08.15.05 09:08 AM]

Jeroen --

I don't think you understand the program. I've already put all my books into Google Print. Google limits the amount of text that people can see in a book, and gives detailed reports on usage, so if I believe that people are just reading the book online, I can pull out at any time. (That's why we're in Google Print, but not Amazon Search Inside -- Google's contract lets publishers withdraw books at any time, while Amazon's doesn't.)

Meanwhile, in the library program, they are scanning books that publishers haven't given permission for, but they aren't showing even the percentage that they show in normal Google Print, but rather are just showing snippets, just like the web search engine.

And they aren't asking for rights because the publishers wouldn't know how to give them. My travel publishing company, Travelers Tales, which publishes anthologies of travel writing, buys lots of rights, and it's extremely difficult and time consuming, and half the time, the publisher doesn't even know if they actually have the rights to resell the material. Trust me. If we believe that a search engine for book content is a good thing, and we like Google's mission of "access to all the world's information", then the position they are taking is the ONLY one that will work. If they have to ask for permission, the job will be impossible, and we'll all be poorer for it.

Given that Google allows anyone to opt out as soon as they think their rights are being damaged, I don't see the problem. Google is going to make a big investment to create a new marketplace for content, and new business models. Authors and publishers are going to benefit.

The real issue at debate here, to be honest, is that publishers are afraid of being disintermediated. And more specifically, they don't like that Google is returning to the libraries an electronic copy of everything they scan. They want to be able to sell those electronic copies.

But even there they are being short sighted. After all, Google is working with a handful of libraries. If I were the publishers who didn't have electronic copies of their older work (and most don't), I'd be delighted to have Google give copies back to the libraries that gave access to their collections, but I'd be asking for my own electronic copy that I could sell to other libraries, and enjoining Google from giving it to anyone but the original set.

Mike Perry [08.15.05 01:47 PM]

Since I was the third posting in this considerable list, I might add some comments now that I've completed the sign-up for the publisher program that Google Print is offering. Hopefully, all I'll be discussing here falls into the publicly known provision of the confidentiality agreement. I really hate this lawyerly obsession with secrecy.

First, the program defaults to allowing users 20%, the minimum value they offer, rather than the 100% I said earlier. Since 20% is the maximum I can live with, that is fine for me. But publishers who have very pricey reference works are likely to feel otherwise, particularly if a book gets included without their permission. There needs to be an option to set the view percentage at least as small as 5% and perhaps even lower. There is no technological reason why Google can't set it as low as the copyright holder wants.

A few years ago, I ghosted about half of a book on presidential scandals for a major political press that first sold for $150. Do a search for a president's name in that book via Google Print, and anyone with an Internet connection could grab all the scandals of any President (except perhaps Clinton) without buying the book. The next month the 20% reinstates and they could grab all the scandals of another President. Since the book was to be sold as a reference work, exposing it via Google Print virtually negates much of its value.

When people use a reference book, they don't want the entire book, just a small slice. The same is true of travel books and cookbooks. It's quite wrong to assume that those sorts of publishers must opt-out to avoid being "robbed." Copyright law is their protection and copyright case law recognizes that 'fair use' is highly dependent on context.

Most important of all, Google needs to offer something in exchange for what publishers, whether wisely or foolishly, are regarding as a taking of something they own. They're simply going to have to shift to opt-in or someone is likely to sue and, if they win, create a dreadful mess.

Google needs to offer an online database that would allow publishers to maintain a description of their books. This isn't a substitute for the webpages that I maintain for my books, as someone suggested. It's a portal into a database that would supply that information as formatted data, including updates, to Amazon, BarnesandNoble.com etc, so publishers don't have to do separate uploads to each in their peculiar format. It'd be easily edited by publishers. (I have two Tolkien books that I normally market to Tolkien fans, but around Christmas, it'd be great to be able to change that marketing to friends of Tolkien fans.) It would be easily downloaded (as data) by online stores and libraries and, most important of all, not copyrighted. The data would be available for free to anyone who'd want to set up their system to handle the format.

There's a need for precisely that. Bowker, which maintains the ISBNs, is hopelessly behind the technological curve and charges more than the smaller online stores and public libraries can afford to pay for their data. Wholesalers like Ingram also charge for their data and are linked to a particular supply chain. The Library of Congress database lists all copyrighted books, but gives almost no description of them.

Google could offer that database for free to any publisher with ISBNs. It could work out arrangements to supply the data publishers place there to anyone who's interested for free. Google might even permit ordinary Internet users to go there to view part of the content of a book and see links to places where they can buy the book. In return, Google would have a page where they could place ads and, when data is entered by publishers, they'd have a check box where the publisher could authorize a book's inclusion in Google Print. Heck, I wouldn't care if that check box defaulted to opt-in.

Unless they're fools, publishers would grab at this as a one-stop place to feed data about their book to virtually anyone who wants it. And in return Google would get an opt-in procedure so easy to use, it would get as high a sign-up rate as they're going to get by any means. And without sign up and opt-in Google is running the risk of losing in court and losing big, wrecking what is an excellent idea for almost everyone.

--Mike Perry, Inkling Books, Seattle

Mike Perry [08.15.05 02:36 PM]

Even better, since Google is OCRing the text with what I assume is sophisticated software, they could offer to supply this text to the original author/publisher, who could republish it either in facsimile or as a newly typeset text. Then the Google Print link would become a new source of income.

And this emphasizes my point--stated twice above--that Google needs to engage in a bit more horse trading and behave less like a technological steamroller. "Opt in and we'll give you an electronic copy you can republish as an ebook or on paper," and they'll get the attention and respect of publishers, particularly in the financially straitened academic press.

And yes, there is a problem with tracing down who now owns the rights to older books that would make that messy. For some time I've been trying to convince the circle around Lawrence Lessig that the real crime of our current copyright law is that we can be punished for violating copyrights held by someone whose ownership is often impossible to trace and thus whose permission cannot be obtained. To the extent that copyright is limited-term ownership, there ought to be a scheme that tracks the current owner much like local governments track land ownership and probate it when someone dies. If the government is going to punish copyright violation when a copyright is registered, then the government has a responsibility to track that ownership until the copyright expires. And if a copyright holder fails to maintain that who-is-the-owner data, there ought to be a procedure that would drop the work into the public domain.

That, I believe, is better than Lessig's scheme to require regular renewals for some token sum. Law professors don't find legal paperwork intimidating. Elderly widows do and often fail to act, even though they've lived at the same address for fifty years. Tracing the author is what matters, not token fees with confusing paperwork. The latter only makes retaining a copyright difficult for those who can't afford to keep a lawyer on retainer.

--Mike Perry, Inkling Books, Seattle

P.S. And the record keeping at publishers can be dreadful. I contacted one British publisher about a book that had been a bestseller in the UK circa 1940. They replied that the only way they knew they'd published it, was that they had a copy in their library. They'd lost all records and files. Since this book was published anonymously to conceal the author, who had friends in Nazi Germany, they could not even tell me who wrote it. Fortunately, I was able to discover the author through the book's Swiss-German publisher, where he was well-known as the editor of Karl Barth's works.

Kevin Farnham [08.15.05 06:47 PM]

Google troubles me lately. Their “genius” begins to remind me of the genius (including a Nobel Prize winner) of Long-Term Capital Management (LTCM), the hedge fund that nearly caused a global economic meltdown when they applied their “genius” (based on a mere 5 years of historical data) to international currency derivatives market trading…

The consequences of some Google “projects” don’t seem fully thought out. Nonetheless, awash with enormous amounts of cash, all potential projects within the company are seemingly very well funded.

People don’t want emails to be read by Google and transformed into reconstituted documents spliced with Google Ads — no matter how much Google can promise “we won’t let anyone else see the email contents”. The point is, to insert the ads, they have to “read” the emails. Google didn’t get this, somehow they didn’t understand the privacy issues.

Now, with Google Print, they seem somehow not to have understood the copyright issues; at minimum, they failed to anticipate the reasonable concerns people might have regarding Google Print and copyright.

Yes, there are data/math/algorithm geniuses at Google. But does the company have any awareness of the bigger picture? Or, like LTCM, are they content to just laugh off “mistakes”, since they are, after all, “geniuses”?

Tim O'Reilly [08.15.05 07:42 PM]

Kevin -- your comments about Google's "genius" starting to look like LTCM seem fairly unsubstantiated by the facts. As far as I can tell from the number of email messages I get from gmail addresses, the early flap by a few people has died down. The blog entry I wrote back then, The Fuss About Gmail and Privacy, seems to have been right on. I think we see a similar fuss here. In fact, I see Google very much doing the right thing (or at least a very interesting thing, which opens up a lot of future elbow room and opportunity), over people's knee-jerk objections.

Jeroen Wenting [08.15.05 11:07 PM]

Tim, I'd have no problems with an opt IN approach, it's the opt OUT part Google employs that's where they go wrong.

A publisher would basically have to keep a 24/7 watch on Google to see what they're doing with any of their works and write a letter for each and every book they see appearing online that they don't want.
Google is effectively stating that copyright law doesn't apply to them, and the opt out scheme of theirs is nothing but a veneer to make it seem legitimate.

Maybe today they're publishing only sections, but then why do they digitise the entire publication? Seems to me they're effectively preparing to put the entire book online or maybe they already do but only show a portion at a time.
As already hinted it won't be long before someone finds a way to use the system to pull the entire text of a book out of the system, and when that happens it's bye bye book.
As you failed to enforce copyright on it, it will now effectively be in the public domain.

Now I won't use such a service (especially in that way) because I've something of a dead tree fetish (iow, I hate reading from a screen and prefer a real book) but many others will use it like that.
Many will even print the entire thing themselves rather than buy the book from you or one of your retailers even if the cost of printing it themselves is higher (similar to people paying for CDs of illegally distributed freeware, something that's rife on eBay).

You seem to misunderstand the nature of people in general, and maybe so does Google (if indeed their intentions are honourable which I am far from convinced of).

People WILL steal copyrighted material even if it costs them a great deal of effort (either in time or money) to do so, and then distribute that material to others.
Google is only making it a lot easier for them to do so with books. No longer need they buy scanners, pirate OCR software, and spend long hours digitising the public library (or the bookstore, taking the book back for a refund after a few days). They can just go online and download it all from Google, then paste the chunks together.

Tim O'Reilly [08.16.05 07:55 AM]

Jeroen, the reason they (and Amazon, for Search Inside) create a digital copy of the entire book is for search purposes. The whole point is that they are able to search the contents of the book, not just the metadata about it. How else could they deliver search results that show what pages of a book contain the specified content?

As I've noted endlessly in these comments, opt in would make the whole exercise moot, since 90% or more of copyrighted works have no one who is in an effective position (with clear rights, paying attention etc.) to make the opt in decision.

The point is that opt out preserves the original intent of copyright (see creativecommons.org), and makes available to the public works that would otherwise be lost and unavailable. Opt out also preserves the rights of copyright holders, in that, if any holder believes the value of his or her work is being damaged, he or she can easily remove it. If the holder isn't paying attention (especially given the ease with which an author or publisher can find out whether his work is included in the search index), the work can't be very valuable.

I'll lay odds that within a year or two, there will be a lot more fuss about people crying out to Google and Amazon that their work has been omitted, than that it has been included. Hey, there are still people who complain that their web pages are indexed!

Tim O'Reilly [08.16.05 08:40 AM]

Before anyone else comments on this blog, please read Google's FAQ on the difference between the regular Google Print and the Google Library program:
http://print.google.com/googleprint/about.html

It would make for much more meaningful debate.

Kevin Farnham [08.16.05 07:34 PM]

Tim -- thanks for taking the time to read and respond here! I'm one of the guilty ones who wrote while being under-informed (I thought Google's email search and ad-insertions were proposed for non-Google/Gmail email sites, which apparently is not true).

This forum is very informative and educational for people like me who are interested in the future of computer and data technology, but who have full-time jobs that occupy most of their technology-focused attention. I hope the volume of uninformed responses (again, apologies for mine) doesn't make you guys decide to eliminate or limit open access for posting comments to Radar.Oreilly.com!

Ross Stapleton-Gray [08.16.05 09:06 PM]

> Google needs to offer an online database that
> would allow publishers to maintain a description
> of their books. This isn't a substitute for the
> webpages that I maintain for my books, as someone
> suggested. It's a portal into a database that
> would supply that information as formatted data,
> including updates, to Amazon, BarnesandNoble.com
> etc, so publishers don't have to do separate
> uploads to each in their peculiar format. It'd
> be easily edited by publishers.

Why does Google need to be the authority for this service? I think it could be based on existing infrastructure (see previous comments re the Object Name Service), and leave the decision in the hands of the publishers exactly where the information ought to reside. Certainly, if it were all "stuffed into Google," that puts Google on the hook to do QA, or at least to handshake with you whenever you forget the password you need to go update the data yourself.

Could any publishers here comment on ONIX, which I understand to be a standard for book metadata/structure defined by the publishing industry? Is it used?

Alex Boese [08.17.05 09:12 PM]

It's difficult for me to understand what rational reason publishers have for opposing Google library. The majority of objections voiced here seem to stem from a mistaken belief that Google is proposing making the full text of books accessible, instead of small snippets. Perhaps the publishers believe they might lose a sale if a reader can find a small sample of text from one of their books online. Whereas before they would have lost the sale simply because the reader never knew about the book. Evidently the publishers believe that ignorance is bliss.

bowerbird [08.18.05 02:04 AM]

google has every right, under fair-use,
to make their index. and indeed, they
should be applauded vigorously for
doing society a big favor. nonetheless,
given the flak google has suffered,
i think they should limit it at first to
the publishers who choose to opt in,
and then later charge an arm and a leg
to all the publishers who chose to opt out
initially because they were too plain stupid to
realize this will benefit them enormously.

if the v.c.r. companies would have treated
the movie industry in a similar manner
many years ago, the idiots wouldn't still be
running our content-based corporations...

gary price [09.01.05 04:23 PM]

Tim:
I'm a librarian, author, and news editor of Search Engine Watch. I'll agree with my friend and colleague KGS (comments above).

This post is a question.

You make your books full text searchable and VIEWABLE online via Safari (you own it) (http://www.safaribooksonline.com/) and I think NetLibrary and Books24x7.

Is this service successful? What is the future of these types of services?

What's sad is that many libraries (for example, the San Francisco Public Library) offer FREE remote access to the full text of Safari, and many people don't know about it.

gary price [09.01.05 04:29 PM]

Tim:
Apologies. I see you've already commented about Safari. However, if you could share more, I would appreciate it.

Tim O'Reilly [09.01.05 05:48 PM]

Gary --

Safari is very successful. It is now our fourth largest reseller, after Barnes & Noble, Borders, and Amazon. (We don't make books available through Books24x7, because their economic model provides only pennies on the dollar to publishers, and I believe we're no longer working with NetLibrary either.)

We offer two business models in Safari. For B2C customers, it's sold on a "bookshelf" model, a bit like NetFlix. That is, the customer has a bookshelf with a number of monthly slots, and can swap books in or out each month. I believe that a 15 slot bookshelf is most common. For B2B customers, we typically do a single price for some number of seats, with access to all the content.

In either model, royalties are paid to authors based on access. In the B2C model, putting a book on your shelf (using one of your slots) generates revenue for that book. In the B2B model, it's a percentage of total page access.

We're working on a study now that will compare online usage to print book sales. But on a preliminary basis, we've found that "long tail" books that no longer sell well in retail represent a disproportionate part of the usage. While the numbers on individual books are still small, I believe the revenues (and hence the royalties to authors) from Safari are greater than the royalties from print books on many of the older books. In short, we believe that online access will create incremental revenue for titles that are no longer generating revenue in print.

I'll be reporting more on what we've learned from Safari and Google Print in future postings.

Chase Venters [09.29.05 10:29 AM]

My opinion is probably one of the more extremist in this field. The idealist in me sees the Internet as an enabler. What it enables is the sharing of human knowledge and worldwide collaboration. I am deeply troubled that traditionalist industries (proprietary software and content providers) are achieving a high rate of success in 'holding back' the technical innovation of the Internet.

They want to own it all, so they can make money off of it. At the end of the day, it's very clear to me that the real problem here is that due to the Internet, their existing business models are now totally irrelevant. I believe the onus should be on them to correct their business models - the onus certainly shouldn't be on new technology to adhere to their old fashioned ways.

Lessig makes a good point when he refers in one of his books to the problem brought about by the airplane. The old concept of property provided that you owned your plot of land, plus all the sky above it. Some farmers sued a pilot on the grounds that flying over their land without their permission was trespassing. Just think what would have happened if the Supreme Court hadn't been wise enough to strike down the old ways - transcontinental flight would have been basically impossible.

I have no sympathy at all for the likes of Microsoft, the RIAA, the MPAA, or other traditional content outlets. If they can't figure out how to adapt, then they are being phased out by the natural evolution of technology (an extension of the evolution of nature itself - indeed, the Internet and all of our great technology can be traced back to the environmental conditions that sprung us forth on this planet in the first place). They, in this case, are either unlucky, or totally incompetent (generally the latter, since they have the option of choosing to adapt).

heh [02.08.06 08:08 AM]

I wonder if there is a moderator here...
If there is one, i urge him to delete my post and the one above.

Nice read - this thread. I must say i really enjoyed it all the way up to the last post :)

Tim O'Reilly [02.08.06 03:50 PM]

heh --

We do tend this garden -- every comment gets emailed or RSS-fed to the original poster -- and we prune most of the spam that doesn't get caught by the filters. But every once in a while something slips by. I hadn't revisited this page in a long time, so your comment was a great way to make sure I saw (and deleted) that spam. I left your comment in for any other readers to let them know that we are indeed reading comments, and that it's good to tell us about any issues you see.
