


Tim O'Reilly

Negative Reactions to Microsoft attack on Google at the AAP

I've been seeing lots of reactions to Microsoft's attack on Google at the Association of American Publishers, and it isn't pretty. (Here's a link to the full text of Microsoft Assistant General Counsel Tom Rubin's speech.)

For those of you who have been hiding under a rock and don't already know this, the AAP is suing Google for scanning books from libraries. Publishers insist on an opt-in approach, while Google insists that scanning and indexing are fair use, and that publishers can opt out, just as web sites do with robots.txt.

Rubin writes:

In my view, Google has chosen the wrong path for the longer term, because it systematically violates copyright and deprives authors and publishers of an important avenue for monetizing their works. In doing so, it undermines critical incentives to create.

Anyone who's read what I've previously written on this subject knows that I believe that this is 180 degrees opposite from the truth. The search engine economy has led to a great outpouring of creativity, and new incentives undreamed of by ink-on-paper publishers. Google is offering publishers and authors a jump start in joining that new economy. What's more, Google is using the very same copyright fair use exemption that it uses to create its web search engine (and that all other web search engines rely on as well).

Meanwhile, the truth is that publishers don't actually own the rights to most of the works that they are supposedly protecting, or at least no longer know who owns them, and don't have copies of the books to scan even if they were clear about the rights. Google's creative approach solves a very hard problem for publishers and will create enormous new opportunities for authors.

Put another way, Google is offering a $200 million handout to publishers and authors, not stealing from them! Once all books are searchable, we will discover which of the 32 million out-of-print books now only in libraries are valuable, and publishers will be incentivized to clear the rights to those books and bring them back to life. If the AAP prevails in its lawsuit, only a small fraction of the available works will ever make it online.

What's more, if Google is wrong that making a copy in order to create a search index is fair use, then the whole search engine economy comes tumbling down, since web search itself depends on the same fair use exemption.

As a publisher, I'm nervous and excited about the disruptive changes that Google Book Search could bring to the publishing ecosystem. It could be very good for my business, but as with the web itself, it will also bring new competition. And I'm sure that some publishers will fail to rise to the challenge, and will be outperformed by other publishers who are quicker to embrace the new book search economy.

But as someone who cares about the future of books, the future of knowledge dissemination, and the future of publishing, I am 100% clear that the opposition to Google Book Search by the AAP is nothing more than posturing by entrenched businesses afraid of disruption. Get a life, guys! If books don't become part of the online search economy, they are doomed to eventual irrelevance. Publishers must reinvent themselves, embrace the future. And Microsoft loses a huge amount of credibility by pandering to the publisher position.

But OK, let's assume that legal posturing is an acceptable part of business negotiation, that the publishers are merely doing good business to sue Google and see if they can get a favorable settlement, and that Microsoft legitimately sides with the publishers. But Rubin goes way beyond just siding with the publisher position, going into a full-on smear of YouTube, and Google by association:

Companies that create no content of their own, and make money solely on the backs of other people's content, are raking in billions through advertising revenue...

Google's track record of protecting copyrights in other parts of its business is weak at best. Anyone who visits YouTube, which Google purchased last year, will immediately recognize that it follows a similar cavalier approach to copyright.

Google also encouraged the use of keywords and advertising text referring to illegal copies of music and movies....These are not the actions of a company that has the interests of copyright owners as one of its priorities.

This is the kind of mudslinging that has turned Americans off politics, and it's beneath Microsoft to stoop to it. Anyone at all familiar with the issues knows how complex they are, and what a cheap shot it is to frame them this way. Microsoft is a great company facing great challenges. It should bring out the best in them, not the worst.

Danny Sullivan does a good job of analyzing each of the Microsoft arguments, including pointing to an amusing example of Microsoft running ads on Google searches for the very sites that Rubin calls out as pirates. Danny is persuaded by a few of Microsoft's points, but even he concludes:

Overall, I have to say it's disappointing seeing Microsoft come out on an attack stance rather than be positive about what it is doing. Google deserves slams, and I wish they'd change to an opt-in policy for copyrighted books. But for me, with perspective, Microsoft comes across as someone trying to play catch-up and willing to be negative to do it. I don't like that in political campaigns, and I guess I don't like it any more in the search wars. But most important, it's a dangerous game to play. The more Microsoft paints itself as some type of pure protector of copyright, the harder it will fall as people find examples where it fails to meet expectations.

Cynthia Brumfield is much harsher:

"Although Microsoft's attempt to exploit Google's YouTube problems is understandable, it's also slightly repulsive and reeks of desperation. The software titan is hoping to build itself up by tearing Google down, never a good long-term strategy for success."

Don Dodge gets the gutsy blogger award, though, since he works at Microsoft but still wasn't afraid to call a spade a spade! He wrote:

Oh boy, here we go. Microsoft attacks Google on copyright regarding their book scanning project, and then takes a swipe at YouTube as well. Really dumb move! What are these Microsoft lawyers thinking? Even if they are right, which is debatable, what reaction do they expect from the public at large? This strikes me as pandering to the Association of American Publishers where the Microsoft lawyer is speaking today....

The AAP filed suit against Google for copyright infringement 16 months ago, and it is still in the courts. What is to be gained by making these inflammatory comments? Be quiet and let the courts sort this out.

Public Relations and perceptions are affected by everything Microsoft says or does. It comes with the territory. Making inflammatory comments about a competitor is never a good idea. Right or wrong on the facts...the statements are bound to have unintended consequences.

There are always at least two sides to every legal argument. There are lots of scenarios where the law is not clear and that is why these things are argued in court. Case law clarifies the details and codifies the rules. Great, let the courts involved sort that out. Microsoft should stay focused on business and satisfying customers.

Amen. I hope Microsoft listens to Don Dodge, and that he doesn't get in trouble over his thoughtful and balanced comments.




Comments: 18

Adam Hodgkin   [03.06.07 09:48 PM]

It is obvious that Microsoft is taking advantage of apparent Google vulnerabilities. But there are vulnerabilities. Google does appear to be more accommodating in the way it is dealing with TV companies than with the book publishers -- the TV companies are not being told it's 'opt out' only. Google is offering to police YouTube submissions and develop systems for filtering out copyright material for which permission has not been granted. Why should book publishers be treated with a different rubric?

In the end Google is going to have to win over publishers to its method, and it is only going to succeed in this if it puts the emphasis on securing their agreement. Even if Google is allowed to use 'fair use' to scan and database books without asking anyone except a librarian for permission, it will need publisher agreement and rights holder agreement on what it is allowed to snippet (illustrations, poetry, music, formulae?). There are plenty of minutiae in the interstices of copyright documents where Google will not be allowed, and should not even try, to make up its own implementation and distribution rules.

As you point out, the publishers and Google should really be on the same side in this. Really encouraging to see that Springer and Google have just now announced that 29,000 Springer books (some of them out of copyright) are already in GBS. Quite an achievement. GBS is working! See my comments on this at:

nicolas   [03.07.07 02:42 AM]

I like the following criticism:
"Companies that create no content of their own, and make money solely on the backs of other people's content, are raking in billions through advertising revenue.."

It's typical of people just not getting it.
I will use an almost perfect parallel to explain how wrong this reasoning is.

A while ago France was mostly agricultural.
Every job dealing with money was seen as dirty and immoral.
The basic reasoning was that finance people did not produce any goods and yet were making money, hence they must be stealing from and impoverishing the good French peasants.

I'll let you imagine the number of opportunities that an illiquid financing environment costs companies, and the drag on innovation this creates.

Not being able to pledge one's assets on good terms to finance new developments, not being able to redirect short-term deposits into long-term investments.. all this was simply absent from the critics' minds.

airdrummer   [03.07.07 10:25 AM]

> it's beneath Microsoft to stoop to it.

no it isn't;-)

Anonymous   [03.07.07 10:39 AM]

From Title 17 of the federal code, emphasis added:

§ 107. Limitations on exclusive rights: Fair use

Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include —

(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

(2) the nature of the copyrighted work;

(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

(4) the effect of the use upon the potential market for or value of the copyrighted work.

The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.

§ 108. Limitations on exclusive rights: Reproduction by libraries and archives

(a) Except as otherwise provided in this title and notwithstanding the provisions of section 106, it is not an infringement of copyright for a library or archives, or any of its employees acting within the scope of their employment, to reproduce no more than one copy or phonorecord of a work, except as provided in subsections (b) and (c), or to distribute such copy or phonorecord, under the conditions specified by this section, if —

(1) the reproduction or distribution is made without any purpose of direct or indirect commercial advantage;


(2) No reproduction, distribution, display, or performance is authorized under this subsection if —

(A) the work is subject to normal commercial exploitation;

Andrew Grabois   [03.07.07 11:09 AM]

What's interesting here is not Microsoft vs. Google, but why you think Rubin's speech is "180 degrees opposite from the truth".

Your contention that "publishers don't actually own the rights to most of the works that they are supposedly protecting, or at least no longer know who owns them", is based on a misinterpretation of an OCLC paper ("Anatomy of Aggregate Collections") that attempts to quantify and characterize the holdings of U.S. libraries, including the original five research libraries that agreed to let Google scan all or parts of their collections. A close reading of the OCLC analysis shows that the majority of most print collections held by research libraries were:

(1) not published in the U.S.
(2) created/published after 1974
(3) not commercially published books.

What's more, if you look at how many new books have been published in the U.S. over the last hundred or so years, it also becomes clear that a universe of 32 million books is wildly inflated. This is important because if you use a lower baseline for commercially published books and compare that with the 3.5 million still in print (just in the U.S.), you find that at least a third, if not more, could still be protected by copyright.

Coincidentally, I fleshed this out in a post today on Michael Cairns's blog, PersonaNonData, at:

Mike Perry   [03.07.07 12:07 PM]

QUOTE: "Put another way, Google is offering a $200 million handout to publishers and authors, not stealing from them! Once all books are searchable, we will discover which of the 32 million out of print books now only in libraries are valuable, and publishers will be incentivized to clear the rights to those books and bring them back to life."

RESPONSE: If clearing those rights is so easy and inexpensive that any publisher, however small, can do it, then why is Google so hostile to doing it themselves before adding each copyrighted book to their index? Having located the copyright holder, they could place that contact information online and actually further their claimed goal of bringing a book back into print. After all, they have infinitely more money to burn than any publisher on the planet, as well as a vastly lower cost of distribution, and, if they sought permissions, an enormous economy of scale.

What chance do authors have of seeing their book back in print if 20% of it is instantly available online (per month) for free? Google makes the minimum a publisher can display an over-large 20% because that maximizes their revenue. A figure more like 5% would give readers a more than adequate look at a book's content and maximize the author's income. It's not hard to see whose interest Google is serving.

QUOTE: "What's more, if Google is wrong that making a copy in order to create a search index is fair use, then the whole search engine economy comes tumbling down, since web search itself depends on the same fair use exemption."

RESPONSE: Not true. A search engine link resembles a short, fair-use quote in another text joined to a footnote pointing to the full text, available only in the original book. What Google is offering is like including two chapters of a ten-chapter book in their own book without paying a penny to the author. That's an anthology or collection, and for that the author's permission is always required by law. What Google is actually doing is far worse than that. Google's anthology (and in legal terms their source of profit) is all of every book they scan as an income source for themselves, with online users merely limited to a still very illegal two-chapters out of ten. And for this Google superanthology the authors get not a penny. Google can't even be bothered to ask their permission.

However, it is true that when someone pushes copyright law too far, the courts have a tendency to come down too hard, in part because of the 'winner take all' way copyright laws are applied. Publishers, eager to avoid costly litigation, then shy away from what was (and still is) legitimate. By pushing the law too far, Google is serving no one. I know, I was caught in a copyright dispute born out of just that sort of blunder. I won, but only after a long and brutal fight. And I won because the other side bailed out at summary judgment, so the original 1998 Second Circuit blunder (Castle Rock) still hangs over copyright law, leading many publishers to avoid books about popular, contemporary fiction like my Untangling Tolkien.

QUOTE: "But as someone who cares about the future of books, the future of knowledge dissemination, and the future of publishing, I am 100% clear that the opposition to Google Book Search by the AAP is nothing more than posturing by entrenched businesses afraid of disruption."

COMMENT: No, no, no. Get off this nerdy, "entrenched business" nonsense. My little one-Mac publishing house, Inkling Books, is only six years old, and I use every advantage that new technology permits. Just two weeks ago I blasted those at Amazon's ebook division for reducing the ebook formats they distribute. And at the same time I added several of our ebooks to Google's scheme to ensure that they remain available. I use technology anywhere it makes sense for me as an author and editor. But I'm not impressed with Google's use of technology and the works of others primarily to increase their own profitability. What's good for Google isn't necessarily good for the rest of us.

Finally, my chief gripe with the high-tech world is that it's moving too slowly and is too out of touch with the realities of book creation and use--too nerdy in fact. To me, as a researcher/consumer of books as well as an editor and publisher, Google's scheme has "game-playing nerds in their mother's basement" written all over it. There are a thousand things wrong with what they're trying to do. I've only touched on a few of them here. I'd be quite happy to point them out if Google is willing to listen.

--Mike Perry, Inkling Books, Seattle

Bruce Albrecht   [03.07.07 04:36 PM]

Mike Perry is clearly mistaking the Google Books Publisher program for the Google Books Library program. While the Google Books search returns books from either the Publisher program or the Library program, books from the Library program only display 3-line snippets if the book is not in the public domain. It is the scanning of the entire book and displaying of the 3-line snippets to which the AAP objects. They do not object to the Publisher program as that is an opt-in program, even if Google forces a minimum of 20% of the book to be viewable.

Since Mike appears to not understand that the topic of conversation here and at the AAP is the Google Book Library program and not the Google Book Publisher program, I doubt that he will get far in getting Google to listen to him.

I'm sure Tim O'Reilly's experiences as a book author, editor and publisher (both print and electronic) far exceed those of the proprietor of an admitted one-Mac, six-year-old publishing house. If he's out of touch with the realities of book creation and use, I would be quite astonished.

Tim O'Reilly   [03.07.07 04:57 PM]

Andrew, I read your posting with interest, and wanted to contact you about it. I thought your numbers made some sense. You're right that the 32 million number includes foreign language books, and probably does include ephemera as well as proper "books." I'm not sure that takes the number all the way down to 7 million as you claim, but let's assume it does.

You then contrast this with "3.5 million books still in print." You don't apply the same rigor to this number as you apply to the OCLC number. Bowker itself claims 2.5 million on their web site, and they don't say how many of those are foreign language books, audiobooks, etc.

You also conflate "in copyright" and "in print" in many of your assertions. In my experience buying rights, there are many books that are in copyright that are still not in print, or have any publisher paying attention.

But even if the right number is 3.5 million, your figures still leave a gap of 3.5 million books that are either public domain or in the twilight zone.

Publishers have no plan for ever digitizing these books. That's a big problem, in my opinion.

Tim O'Reilly   [03.07.07 05:05 PM]

Mike -- as Bruce points out, you are confusing Google's library program with their publisher program. Google shows ONLY 3-line snippets of books scanned from libraries, unless they are in the public domain or in the publisher program. You are talking about a search index page, just like your page of Google results -- not a click-through to any percentage of the text.

So many of your comments are based on this misunderstanding -- which we've corresponded about before -- that I just don't understand your agenda. Have you ever actually tried to use Google book search, and read the FAQ about how it works?

As to your second question, if this is so easy, why can't Google ask up front? The answer is obvious, if you think about what I said. It's easy only AFTER Google has built the index. Then, you'll be able to google for the information! Right now, you have to go find a copy of the book, or contact a publisher (who may no longer even exist), see if their rights person has a copy of the contract (which may no longer even exist), and see if they are willing to talk to you. It's near impossible.

The point is that once the index is created, most of that stuff will still be ignored (just as the publisher of my first book, Frank Herbert, ignored both my requests for a conversation about putting it online or reverting the rights, and the fact that I subsequently put it up online anyway; they just don't care). The point is that IF a book becomes widely used and of perceived value, it then becomes both much easier, and much more worthwhile, to establish the rights.

Something you need to understand is that even ten years ago, few publishers even bought online rights. They don't own them in the first place, so they can't grant them.

Google solves this very sticky wicket for publishers, getting the books online. Once they are up, it's a lot easier to allocate the value from those that are valuable, and to continue to ignore the ones that aren't. It's a great mechanism for winnowing the wheat from the chaff.

Andrew Grabois   [03.07.07 07:53 PM]


The in-print figure from the Bowker site is marketing copy that was sadly out of date when it was put up there in 2003. If you search, or better yet, ask someone over there to query the offline database, you will find that their in-print number is over 3.5 million.

Yes, I agree that even a few million out of print and possibly out of copyright books is an opportunity (I'm not convinced it's a problem), but it is a far cry from the heart-tugging notion that there are tens of millions of orphaned books buried alive and beseeching us to liberate them from copyright.


Tim O'Reilly   [03.08.07 06:58 AM]

Andrew, I will indeed ask Bowker. But it would seem to me that if you're going to be spreading the idea that the numbers in the OCLC report are overstated, you should apply the same trim factors to the Bowker figures.

But even then, you conclude that only "a third, if not more, could still be protected by copyright." That leaves 2/3 in limbo -- still a large number.

(Although I have to say that you're skating over the difference between "in copyright" -- a number that even Google might think is low -- and "commercially available." The twilight zone of orphaned works isn't the same as works that are out of copyright. It includes books whose copyright status is unknown, and books whose copyright owner is known in theory but not in fact: who bought publisher w back in 1945? who owns any remaining assets of publisher x, which went out of business in 1963? does publisher y have a copy of the contract for book A that they acquired along with the assets of publisher x in 1963? does publisher z even care to look in their files for a book that hasn't sold any copies since 1981? Do they have a copy that they could scan?)

One big problem with opt-in is that only 1.1 million books had any sales last year, according to Nielsen Bookscan. Do you really think that publishers will go to the expense of scanning (or even opting in for Google to scan) books that they haven't paid attention to for decades?

(My Frank Herbert book is a good case in point.)

Andrew Grabois   [03.08.07 08:49 AM]


My intention is not to argue that the OCLC numbers are overstated. Simply stated, their study shows what U.S. libraries are holding in their print collections. What I'm saying is that the OCLC findings have been misunderstood and mistakenly applied to the debate over the Google library project. Print collections held by research libraries and commercially published books are two very different things. In the same way, one cannot compare the universe of print holdings published globally with in-print figures for books published in the U.S. market.


Tim O'Reilly   [03.08.07 09:05 AM]

Andrew -- You're still not answering my question. Are you applying the same criteria to the Bowker numbers that you are applying to the OCLC numbers?

Andrew Grabois   [03.08.07 10:18 AM]

Tim: I'm not sure what you're looking for here. Do you want me to say that Bowker's in-print figure could be high? Well, ok, it could be. But so what. That's not the issue here. Even if their in-print total is off by a million, it still won't bring us any closer to a universe of U.S.-published books that are mostly out of print and in copyright limbo.

Tim O'Reilly   [03.08.07 10:34 AM]

Yes, that's what I'm looking for. If you want to compare apples to apples, you have to apply the same rules on both sides.

And I disagree with your conclusion. If by the math in your blog post, OCLC shows perhaps 7 million books, and Bowker is 3.5 million in print, that means half are either public domain or in limbo. If the Bowker number is 2.5 million, the number is closer to 2/3.

Neither number supports your conclusion that this isn't a problem. It may not be the 75% that I concluded based on my original reading of the OCLC numbers, but it's still pretty darn significant.
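The arithmetic in this exchange is easy to check. What follows is only a minimal sketch using the figures quoted in the thread (roughly 7 million books in the trimmed OCLC universe, and 3.5 million or 2.5 million in print according to Bowker); the function name is hypothetical, and the result says nothing about which of those figures is actually right.

```python
def share_not_in_print(total_millions: float, in_print_millions: float) -> float:
    """Fraction of titles that are not in print (public domain or in limbo)."""
    if total_millions <= 0:
        raise ValueError("total must be positive")
    return (total_millions - in_print_millions) / total_millions


# Figures quoted in the comment thread, in millions of books (assumed, not verified).
TOTAL = 7.0

for in_print in (3.5, 2.5):
    share = share_not_in_print(TOTAL, in_print)
    print(f"{in_print} million in print: {share:.0%} not in print")
```

With 3.5 million in print the share is exactly half; with 2.5 million it is 4.5/7, a little under two thirds.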

bowerbird   [03.08.07 12:08 PM]

google is doing publishers a big favor.

i continue to be astonished that those
publishers aren't much more appreciative.

if google _refused_ to scan their books
-- maybe until a sizeable fee was paid --
perhaps those publishers would wake up...


deej   [03.12.07 06:06 PM]

Voluntary Google scans? Is there any mechanism to request Google to get something scanned that you know is in a library but isn't available electronically? In my case it is music scores by my father, a classical and early Moog synthesizer/electronic music composer. I haven't heard that Google was scanning music scores as part of its efforts, but I would love to get these archived digitally (I'm not so concerned about the recordings of performances). And from what I know of the many composers that he worked with and corresponded with over the years, there is a lot of good music in university libraries that will never be known unless it is made accessible in digital formats. Our family retains the copyrights to most of his works.

Andy Wong   [03.16.07 06:24 PM]

Hi Guys. More than 75% of the books that will be available in Google have no copyright. Even if a publisher prints "Copyright" at the back of the book, that does not mean it owns the copyright of the book. Can you imagine anyone in the world owning the copyright of William Shakespeare's works? 99 years after first publication, or 50 after the author's death on average, remember?

Google has every right to make money from this public IP, so it can, while others can't. For those books which really have copyright associated, there should be some practical way to clarify the copyright.
