Oct 26

Tim O'Reilly

OCA vs. Google Print Library Project?

I was struck in the recent news about Microsoft joining the Open Content Alliance by the curious framing of this announcement:

The move comes as Google faces growing legal pressure from publishers over its own global digital library plans.

Microsoft said it would initially focus on works already in the public domain.

This way, it hopes to avoid similar legal issues over copyright.

This PR positioning makes me think that the OCA, a worthwhile effort (to which O'Reilly has contributed content), is being hijacked by Microsoft as a way of undermining Google. In fact, the OCA addresses only a subset of the "lost content" problem in print book publishing that is addressed by Google Print for Libraries.

According to a recent study by the Online Computer Library Center, which analyzed the books in the collections of the five libraries participating in the Google Print for Libraries project, only about 20% of the 10.5 million unique titles in the collections of the five libraries are out of copyright, using the 1923 change in the copyright law as a dividing line before which you can assume books are out of copyright. This 20% of books out of copyright is the realm of efforts like OCA. Meanwhile, another 10-20% are under copyright, in print, and being commercially exploited. This is the realm of titles opted in by publishers to programs like Google Print or Amazon Search Inside the Book. That leaves 60-70% of all titles ever published in the twilight zone, out of print, but still under copyright. For many of these books, no one even knows any longer who owns the rights, and there is no commercial incentive to figure it out, making the publishers' request for "opt in" a fig leaf that will ultimately lead only to continued neglect.
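As a rough illustration of the proportions above, here is a minimal sketch that converts the quoted percentages into title counts. It assumes the percentages apply directly to the 10.5 million unique titles in the study; the category labels and the function name are hypothetical, chosen only for this example.

```python
# Figures quoted from the paragraph above. Applying the percentages to the
# study's 10.5 million unique titles is an assumption made for illustration.
TOTAL_TITLES = 10_500_000

# (low %, high %) for each category; the labels are illustrative only.
SHARES_PERCENT = {
    "out of copyright (pre-1923)": (20, 20),
    "in copyright, in print": (10, 20),
    "in copyright, out of print": (60, 70),
}


def title_ranges(total, shares_percent):
    """Convert percentage ranges into (low, high) title counts."""
    return {
        name: (total * low // 100, total * high // 100)
        for name, (low, high) in shares_percent.items()
    }


ranges = title_ranges(TOTAL_TITLES, SHARES_PERCENT)
for name, (low, high) in ranges.items():
    print(f"{name}: {low:,} to {high:,} titles")
```

On these assumptions, the "twilight zone" category alone covers between 6.3 and 7.35 million of the titles in the five collections.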


As I've written previously, Google Print is the only effort that attempts to cut the Gordian knot that entangles titles that are under copyright but no longer being commercially exploited. Working with libraries to build a searchable index of their collections is a brilliant application of the principles of copyright fair use that will unlock the vast number of books in this middle category. What's so beautiful about this approach is that as search helps users to rediscover value in these "lost works", publishers and authors will have an economic incentive that is missing under the current situation to discover and assert their ownership. OCA is a complementary effort, but it does not at all address the same problem.

Comments: 21

  choi li akiro singh santos [10.26.05 08:03 PM]

It seems that there is a simple solution: Google should pledge to turn over profits to a non-profit entity (like the Internet Archive), hopefully benefiting writers or writers' guilds.

Google is now a multi-billion private corporation that can afford free meals for ALL its employees. Why should it profit from aggregating content of other companies (i.e. large publishers) without those same companies wanting a share of those same profits? It seems that Google should SPLIT advertising and referral profits with the "content providers". Google print would not exist without the publishers' content, the profits wouldn't exist without Google print or a similar tool.
So let's COMPROMISE guys, let's fork over ALL the profits to a worthy cause instead.

How many more billions of dollars does Google need? Enough is enough. We don't need another dominant company. Now they are threatening to go after Craigslist???? The one site that runs NO ads and plows MOST of its profits to a foundation. "Doing no evil" includes letting smaller companies thrive and survive.

Microsoft should undermine Google, if Google insists on undermining smaller companies. Both companies want to OWN as much as possible. Both companies are secretive. It's time to use:

The less market share Google has, the more restrained it will be.

  Liz Lawley [10.26.05 08:14 PM]

Geez. Microsoft tries to do the right thing here by joining the good guys, and the immediate response is that they're "hijacking" it? :D

I was at a panel on the "googlebrary" last night when the OCA announcement broke, and blogged 90% of the evening's discussion, as well as the follow-up debate at a panel this morning, entitled "Google: Catalyst for Digitization or Library Destruction?", in which Roy Tennant (a librarian involved with OCA) makes some great points about Google's initiative.

Oh, and Choi, Yahoo's got a *much* larger market share of search than Microsoft...

  Machine of Time [10.26.05 09:20 PM]

This PR positioning makes me think that the OCA, a worthwhile effort (to which O'Reilly has contributed content), is being hijacked by Microsoft as a way of undermining Google.

"It's too bad that we can't have a real debate about ideas, rather than cynical rhetoric that creates heat without shedding much light."

  Anonymous Coward [10.27.05 07:44 AM]

I don't get all this talk about Google providing a book search where they make no money, as the first comment suggests. It seems that most people perceive scanning all the books in some huge libraries and then offering anyone the ability to search the results as a trivial task. This is a monumental task and it is going to be a big investment for Google. What everyone seems to be saying is: "Want to make a big investment and solve an existing problem? Go for it! Oh, you want to see the returns from the investment as well? I don't think so."

Guess what, guys: Google pays people to work on this stuff and provides the infrastructure to make it work. They also expect to see some returns. That's how capitalism works.

  Setag Llib [10.27.05 08:24 AM]

So when Google does it, it's a big investment that deserves a payoff, and when Microsoft does it, they're hijacking Tim's Good Work? I'm so glad we're having a real debate over ideas and not indulging in cynical rhetoric that creates heat without shedding much light.

  Sid Steward [10.27.05 08:50 AM]

Addressing Tim's point of the 60-70% of library books falling into the twilight zone, let's assume Google does index 100% of these library books. I wonder what the user will do when he actually gets search hits into this twilight zone. It would be like searching the internet and having the first page serve six broken links. You can see the excerpt, but no page. No book to buy, no recourse to anybody since it's unclear who owns it.

As a user, I would want to see Google's scan of that page. I might even be willing to pay for this page. I would probably end up buying five or ten pages.

To accomplish this, let's have third parties maintain holding accounts for these books in the twilight zone. Google would split my money with such 'absentee' publishers/authors, depositing their share in the holding account. The third party would be responsible for representing the absentee owners' interests and would be entitled to compensation. If somebody comes and successfully asserts ownership, then the account reverts to them.

Holding accounts would encourage absentee owners to step up, since there is now money at stake.

Holding accounts would also be useful if a publisher comes along who wants to re-release an out of print work currently in the twilight zone. The third party would negotiate terms, and the profits would collect in the holding account.

"Unclaimed property" seems to be a longstanding problem, so there must be precedents. According to the National Association of Unclaimed Property Administrators:

"The origin of unclaimed property law dates back to British common law. Abandoned land was returned to the king along with the transfer of the property rights. Today, this concept has been adopted by the states and applied to intangible property as well as tangible property, excluding real estate. The states do not take permanent title to the property but act as custodians to safeguard it for the rightful owner or their heir until claimed. In nearly every state, there is no time limitation for filing a claim."

  Sid Steward [10.27.05 09:10 AM]

"That leaves 60-70% of all titles ever published in the twilight zone, out of print, but still under copyright. For many of these books, no one even knows any longer who owns the rights...."

So, many of these Twilight Zone books do have known owners. It would be helpful in this debate to know how many have unknown owners. It could be 5%, it could be 50%.

When I visit my library, I don't feel like I'm in the Twilight Zone. (-:

  Tim O'Reilly [10.27.05 10:14 AM]

Interesting set of comments. Let me start by responding to "Machine of Time" and Setag Llib. I don't think that this should be a debate about Microsoft vs. Google. I'm not the one whose PR set up that opposition. I'm just pointing it out, precisely to suggest that it's misleading, and obscures the real issue.

So no, I don't think that opening my piece with a description of this PR positioning is the kind of "cynical rhetoric" that I was referring to the other day in critiquing Nick Carr's piece, and that you threw back in my face by quoting that piece. Nor did I use that odd juxtaposition in Microsoft's announcement to engage in a rant about Microsoft. I simply pointed out that positioning OCA as a "copyright friendly" alternative to Google Print for Libraries misses the point, because OCA doesn't address the hard (but incredibly important) problem that GP4L attacks. I moved quickly to the real issue.

Nor do I think it's OK for Google to profit, and not Microsoft. Both are great companies, who've created huge value for their shareholders by creating huge value for their customers. They deserve to profit for what they've accomplished. I've been a critic of Microsoft for abusing their monopoly position (for example, in their 1996 attempt to hijack TCP/IP), but I've also come to their defense, as when free software zealots were misinterpreting Jim Allchin's comments about open source. Similarly, when I've had concerns about Google, I've expressed them, as when I cited the ambiguity of Google deprecating sites who sell their page rank to search engine spammers, while themselves accepting those spammers as Adsense customers. In short, I think I am interested in having a real debate about ideas, and am not picking favorites among companies. (Heck, consider when I took on Amazon, one of my largest customers, over their 1-click patent.)

And if Google begins to abuse their market power, I'll be quick to criticize. But I'm very uncomfortable about criticizing a company just because they are "too profitable" or "too powerful." It's only an issue if they abuse that power.

In this particular case, I don't believe there's any abuse of power. Google is doing something bold, which will have powerful and valuable outcomes. And yes, I'm worried that they might one day become "too powerful", but I'm willing to take that risk because of the benefits to the public, to authors, and to publishers that I believe that their project will bring.

choi li akiro singh santos - I find your idea that somehow Google shouldn't be allowed to make any more money than they already have at odds with my experience of how the profit motive can be used to solve some hard problems, by spurring investment years ahead of any payoff.

Let's look briefly at the economics. The Microsoft announcement included information that they were donating $5 million to the OCA, which they said should be enough to scan 150,000 books. Well, the libraries that Google is working with contain approximately 10 million volumes. Do the math. This is a very expensive proposition that won't pay off for years.
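Spelled out, the math looks roughly like the sketch below. It uses only the figures quoted above and assumes, purely for illustration, that the per-book scanning cost implied by the donation stays constant at library scale; the function and variable names are hypothetical.

```python
# Figures quoted in the comment above. Treating the implied per-book cost
# as constant across 10 million volumes is an assumption, not a known fact.
DONATION_USD = 5_000_000
BOOKS_FUNDED = 150_000
LIBRARY_VOLUMES = 10_000_000


def scanning_cost_estimate(donation_usd, books_funded, volumes):
    """Return (implied cost per book, implied cost for all volumes)."""
    cost_per_book = donation_usd / books_funded
    return cost_per_book, cost_per_book * volumes


cost_per_book, total_cost = scanning_cost_estimate(
    DONATION_USD, BOOKS_FUNDED, LIBRARY_VOLUMES
)
print(f"Implied cost per book: ${cost_per_book:,.2f}")
print(f"Implied cost for {LIBRARY_VOLUMES:,} volumes: ${total_cost:,.0f}")
```

Under that assumption, the implied cost is about $33 per book, or on the order of $333 million for 10 million volumes.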

And even then, look at Google's typical economic splits with their advertising partners: 80:20 (with Google taking the smaller share). So as Google Print takes hold, any advertising revenue will accrue predominantly to the copyright holders, not to Google.

And yes, in what I've referred to as "the twilight zone," it isn't clear who those content owners are. But believe me, if this does in fact turn out to be a profitable area for search, copyright holders will be coming out of the woodwork in order to claim their share. That's the beauty of the system: it creates an economic incentive to solve a problem that otherwise is going to remain insoluble.

Meanwhile, I think that Google's stock market performance makes everyone think they are more powerful than they are. Compare Google's profits for the past year -- $2.24 Billion -- with those of Microsoft ($17.51 Billion), IBM ($16.27 Billion), or Oracle ($4.96 Billion). Google's much more in the range of Yahoo! ($1.55 Billion), Barry Diller's Interactive Corp ($1.50 Billion), or Apple ($1.82 Billion). Google's doing pretty well for themselves, and growing fast, but they are a long way from the top tier in revenue or profits. Or how about non-tech companies like Exxon ($60.19 Billion in profits!)? Nor are the publishers suing Google small underdogs. These are multibillion dollar conglomerates, many of them significant multiples of Google's size.

Sid -- as to your suggestion for treating this as "unclaimed property," I don't think that's necessary. That might be appropriate if Google were clicking through to the content, and showing it to users. I believe that Google's search snippets are indeed fair use, and no more deserve compensation to the author or publisher than do advertisements in the NY Review of Books or a magazine like the New Yorker, when they review and quote from published works, surrounding those reviews with paid advertising.

But what we'll be able to tell is what works are of interest, and when the copyright holders come forward, they can opt them in to the normal Google Print service, where people can access them.

  Liz Lawley [10.27.05 11:27 AM]

I posted a comment on this post last night, but it still hasn't appeared. Are Microsoft-affiliated commenters not allowed?

  Sid Steward [10.27.05 11:42 AM]

Tim -- Thanks for the follow-up. In my first comment I tried to advance two ideas:

First, that Google should offer book content online, not just excerpts. Since they are scanning pages, they could provide me the scan. I would be willing to pay for it. It would be just like using the library photocopier. My money would be split between Google and the publisher.

Second, that income from this sale of content could be managed as unclaimed property if the content's owner can't be determined. As you note, this kind of economic incentive will drive owners out of the woodwork.

My goal with these ideas is to look past fair use and instead align publishers' interests with Google's interests. Regardless of who is right about fair use, it is unnatural for such complementary industries to come to such blows over integration.

I wonder how many members of the publishing industry are multi-billion dollar conglomerates. My impression of print publishing is that it is hard to make a buck, and it is commonly a labor of love (much like writing).

  Tim O'Reilly [10.27.05 12:18 PM]

Liz - no bias against Microsoft-affiliated commenters! Just a bias on the part of Movable Type of holding comments containing links for approval before posting, as a way of limiting comment spam. Sorry for not noticing your comment sooner, especially since those are very interesting links.

However, I'll remind you that the term "hijacking" referred not to the goals of OCA, but to the PR positioning. There was no need to position joining OCA as if it were somehow in opposition to Google. Why is this being spun as "the good guys," as if Google is "the bad guys"? I see both these efforts as "the good guys." I'm not questioning the value of Microsoft's involvement -- I love to see additional support for OCA -- but rather the PR spin that was put on it.

Sid -- re multibillion dollar conglomerates -- five companies do in fact control about 50% of all book publishing revenue, and 80% in some markets (like textbooks). And my sense is that it's these big companies that are in fact most opposed to GP4L. Small publishers are far more likely to see disproportionate benefits from effective book search, which would help to level the playing field.

Getting very specific, looking at computer books, four companies -- Pearson, Wiley, O'Reilly, and Microsoft, in that order -- control more than 80% of the computer book market. Of those four, only O'Reilly is not a billion dollar company, alas.

  Sid Steward [10.27.05 01:07 PM]

For the benefit of the 'who has the power' graph, above, here is the operating profit reported by Pearson and Wiley in 2004. I'm not an accountant, but I think these are the appropriate numbers.

  • Pearson: $0.444 Billion profit on $7.1 Billion in sales.
  • Wiley: $0.125 Billion on $950 Million in sales.

I think the difference is that they must spend more per dollar of income than those in the technology industry.

  Liz Lawley [10.27.05 01:11 PM]

The reason I position OCA as "the good guys" in this context has a lot to do with the points that Roy Tennant so eloquently addressed in the debate yesterday. Biggest on the list for me is openness/transparency. The copyright piece is a bit of a straw man argument from my point of view, which allows people to conveniently ignore the difference between open and closed approaches to digitization. Google's commitment to "providing access to the world's information" stops squarely at its front door, and that troubles me. For all its faults (and I agree there are plenty), Microsoft has done a much better job of putting information into the public domain through its research arm. Kevin Schofield had a good post about this.

From a PR standpoint, it seems not unreasonable for Microsoft to hope for some positive press as a result of a $5 million donation. I'm just somewhat uncomfortable with your use of rhetoric that's intrinsically inflammatory.

  Tim O'Reilly [10.27.05 01:16 PM]

Liz -- I think Microsoft should get lots of positive press for a $5 million donation to a worthy project. But why should they frame that donation with a sideways swipe at Google? You might say my comments were inflammatory, but I was merely commenting on how Microsoft framed their contribution, and pointing out that it seemed inappropriate.

  Tim O'Reilly [10.27.05 01:33 PM]

BTW, I should say that I'm certainly regretting opening my piece with a reaction to the Microsoft PR framing of their donation. The point I really wanted to make was the size of the "twilight zone" region. I apologize if I led off with what now appears to have acted as a red herring.

  Liz Lawley [10.27.05 03:21 PM]

I think the reaction comes from your phrasing: "This PR positioning makes me think that the OCA, a worthwhile effort (to which O'Reilly has contributed content), is being hijacked by Microsoft as a way of undermining Google." I interpreted that as your saying that the OCA itself was being hijacked by Microsoft.

I'd also argue that Microsoft's PR battle right now is very much focused on their major search competitor, so it's not surprising to see them specifically mentioning Google--or the risk of lawsuits, which are an ever-present spectre in this organization. :)

  Mike Perry [10.27.05 10:22 PM]

The problem with the 60% of books that are in copyright but out of print isn't one that Google should solve by corporate dictate. That's where they're in the wrong and for once Microsoft is right. An opt-out policy is too heavy a burden for copyright holders to bear. Whether Google stays within fair use or not, others won't and no author or publisher should have to search the entire web every month or two, looking at hundreds of hits for yet another scheme to opt out of.

The answer needs to be legislative. Congress could amend copyright law to return more closely to the old idea that a copyright holder has to keep a registration active to retain their full legal rights. This should not be the nuisance law some have suggested--no paying a small fee every few years for each and every text that needs to be copyrighted. That's a grossly unfair burden on impoverished elderly people who have as much right to retain their copyrights as lawyer-ridden Disney.

It should be a simple registration no more complex than changing one's mailing address. It would include this basic information: "I publish under the name John K. Doe. All my writing was done between 1953 and 1974. I have published these books (listed). I wrote a column called 'My Take on Things' in the Daily Post between 1956 and 1964. I live at this address," etc.

That'd get around the 'where do I contact the author' problem. Probate laws would also be amended, so a required boilerplate in everyone's will, author or not, would pass on their literary estate, whether published or unpublished, to someone whose will would pass it on to someone else. It may just happen that someone who never published in their lifetime has a diary that is worth publishing.

That government registration would create an online database that anyone seeking permission could use. Google's excuse for demanding that authors opt-out would disappear. Finding an author would become about as easy as it could be made and certainly cheap enough for Google to afford. Copyright would be treated like it ought to be treated, as something that's registered and tracked by the government much like land ownership.

Someone who has registered and kept the address up to date, a simple procedure, would have the same rights they have today. Someone who doesn't keep their contact information up to date would only have a "stop press" right. They could stop further posting and publication of their work, but they could not seek royalties or damages from what had already been published. Google and its kin, acting in good faith, would be legally in the clear.

This system would give every copyright holder one place they could go to opt in or out. They wouldn't have to search a thousand Google clones for each and everything they've published in order to opt out. And while Google and their kin would have to do an author search, they'd have only one place they need search to be free of legal liability. If the author isn't there, Google and kin could create a searchable text that could be much more valuable than a few sentences. Congress might even decide they can post the entire text online as long as they are willing to pull it when the copyright holder steps forward. Everyone, author, Google and readers would come out better.

If Google put their weight behind this, Congress just might have a law somewhat like that above on the books before these two lawsuits are imperfectly settled, probably after winding through district, appeals and the Supreme Court. (And, even after that, the loser is likely to spend several years creating further uncertainty by calling for the law to be amended.)

Google could even start this process rolling by creating an easy-to-use database that would let authors register their works and tell all who're interested what search and display schemes they'd like to opt in or out of. Google could also do publishers, authors and libraries a service by offering a one-stop place where a description of a book could be posted with a table of contents and cover art. That data, indexed by author, title, subjects and keywords, could then, if the copyright holder agrees, link to both a searchable text and places where the book, new or used, could be purchased.

Cooperation is always better than confrontation. And the legislative arena tends to come up with more even-handed solutions to complex problems than the typical court decision, which is haunted by the specifics of that particular case and tends to be winner-take-all rather than balanced.

Google, Microsoft, the OCA, publishers, authors' guilds and the rest should take their eyes off a court fight, which would be based on a copyright law written before the Internet, and get Congress interested in coming up with a solution that is fair to all parties.

Mike Perry, Inkling Books, Seattle

Author of Untangling Tolkien

  Stephanie_B [10.29.05 02:23 PM]

I must say, any print publisher corp that comes out against Google is really acting illogically--probably out of an irrational fear of electronic discourse rather than a sensible view of the situation. It seems to me that Google's effort is tantamount to free, highly effective advertising for these print books!

  Pierre Sandboge [10.29.05 09:02 PM]

I think Google has made a PR mistake by using an opt out scheme. That makes it appear like the rights owner has some say over fair use. Of course, eventually the courts will have to decide whether it is fair use or not, but assuming it is fair use Google has every right to ignore opt out requests from publishers.

If I were to publish book reviews on my site, I wouldn't have any opt in OR opt out. If someone wanted to have their review removed, I would still consider removing it, of course. But it would be my choice.

As for sharing profits, I think that's absurd, why should publishers (or authors) be compensated for receiving free advertising? AFAIK companies regularly pay to have their products reviewed or mentioned in different media.

What happens when the user hits the twilight zone? Not everything is served on a silver platter; the user may have to take some action of his own. Actually, often it is not difficult to find the copyright owner of a work, it just takes some effort. If you want to republish the work or use excerpts or whatever, once you know who the copyright owner is, you negotiate like you would with any other use of copyrighted work. If you are just a would-be reader, you might be able to track down a second hand copy. Or contact the publisher or author directly. I once bought an out of print book directly from the author. (Rather than destroying unsold copies, the publisher gave them to the author in that instance.) Or presumably you should be able to find the original copy at the library. Or, maybe Google will eventually be able to print you a copy.

  John Beale [10.30.05 01:45 AM]

So in the Google project, why should we care if there are server copies? The purpose of the copies in connection with the Print Library project is to give people access to knowledge about the existence of the book as well as a tiny amount of text. That is of great help to researchers and hopefully to authors and publishers of the books too. It in no way harms copyright owners unless the project becomes something else, namely a full-text service which then is a market substitute.

  Daniel [01.01.07 11:09 PM]

i agree with you
