Oops – Only 4% of Titles Are Being Commercially Exploited

In a recent post, I made the assertion that 10-20% of titles published were still in print and being commercially exploited, with another 20% clearly in the public domain, leaving approximately 60% in what I called “the twilight zone” — with no clear rights. Farhad Manjoo of Salon, who is writing a follow-up story, emailed me for confirmation of those numbers, and in so doing made me realize an error in my calculations. I had taken the OCLC’s figure of 10.5 million unique titles held by the five libraries cooperating with the Google Print Library Project, and applied to it the recent Nielsen BookScan report that 1.2 million unique titles sold at least one copy in 2004. That gave me the estimate of roughly 12% I used in that prior post, which I generously expanded to 10-20% by assuming that books that didn’t sell even one copy might still be considered “active” by some publishers.
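
For those who want to check my arithmetic, here’s that original (mistaken) back-of-the-envelope calculation sketched in a few lines of Python; the two inputs are just the OCLC and Nielsen BookScan figures cited above, and the rounding up to a 10-20% range was my own fudge factor:

```python
# The original (mistaken) estimate: in-print share measured against
# only the five Google Print libraries.
titles_in_five_libraries = 10_500_000   # OCLC count for the five participating libraries
titles_sold_in_2004 = 1_200_000         # Nielsen BookScan: titles that sold at least one copy

in_print_share = titles_sold_in_2004 / titles_in_five_libraries
print(f"{in_print_share:.1%}")          # ~11-12%, the basis for the 10-20% range I used
```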


However, in answering Farhad’s question, I realized that the correct number to work with is not the 10.5 million unique titles in the five libraries working with Google, but the 32 million unique titles in the entire OCLC “WorldCat”, which represents OCLC’s best estimate of the number of titles held in all US libraries and is a reasonable proxy for the number of books published. On that basis, only about 4% of titles (1.2 million out of 32 million) sold even one copy in 2004, which leads to the following revision of the picture I published earlier:

[Figure: revised breakdown of titles into three groups (in print, public domain, and the “twilight zone”)]

Or, expressed as a pie chart:

[Figure: the same breakdown as a pie chart]
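
If you want to reproduce the numbers behind those charts, here’s a quick sketch; the roughly 20% public-domain share is the same estimate I used in the earlier post, and the rest is simple division:

```python
# Revised estimate using the full WorldCat count rather than the five-library count.
worldcat_titles = 32_000_000            # OCLC WorldCat: proxy for all titles published
titles_sold_in_2004 = 1_200_000         # Nielsen BookScan: titles that sold at least one copy
public_domain_share = 0.20              # rough estimate carried over from the earlier post

in_print_share = titles_sold_in_2004 / worldcat_titles
twilight_share = 1 - in_print_share - public_domain_share
twilight_titles = twilight_share * worldcat_titles

print(f"in print: {in_print_share:.1%}")           # ~3.8%, i.e. the 4% in the title
print(f"public domain: {public_domain_share:.0%}")  # ~20%
print(f"twilight zone: {twilight_share:.1%}")       # ~76%
print(f"twilight-zone titles: {twilight_titles/1e6:.1f} million")  # roughly 24-25 million
```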

Put another way, if the AAP and the Authors Guild were to prevail and opt-in were required, the AAP is asking us to believe that publishers are willing to unearth the contracts for more than 25 million books, track down the authors (since many of those contracts surely don’t grant electronic rights to the publishers; those rights weren’t even conceived of at the time many of the contracts were signed), and get their permission to opt them in, despite the fact that those 25 million books didn’t sell even one copy in 2004. Try to be serious. There is no economic incentive for publishers to opt in books in what I’ve called “the twilight zone.” This approach would make the creation of a comprehensive search engine for books virtually impossible.

These numbers are corroborated by a conversation I had today with the Copyright Clearance Center. The AAP has asserted that the CCC can help track down the rights to all the orphaned works in an opt-in scenario. Yet the CCC has rights records for only 1.7 million titles, a figure that includes journals and many foreign-language works. So again, there’s not much traction on the titles in the twilight zone.

A bit more math: in Microsoft’s announcement that they were joining the Open Content Alliance, they estimated that their $5 million contribution would be enough to digitize approximately 150,000 books, or about $33 per book. By that measure, it will cost Google over $300 million to digitize the 10.5 million unique titles in the collections of the five participating libraries. Of that, assuming these numbers hold, more than $200 million will be spent to digitize works that the publishers have determined have no current economic value. Are the publishers going to make this investment to digitize works they no longer exploit in print? Not very likely.
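
Here’s that cost arithmetic spelled out, using Microsoft’s announced figures as the per-book benchmark and assuming, as I do above, that the twilight-zone share from the WorldCat breakdown holds for the five library collections as well:

```python
# Rough digitization-cost arithmetic based on Microsoft's announced OCA contribution.
microsoft_contribution = 5_000_000      # dollars
books_it_covers = 150_000
cost_per_book = microsoft_contribution / books_it_covers    # ~$33 per book

titles_in_five_libraries = 10_500_000
total_cost = cost_per_book * titles_in_five_libraries        # ~$350 million, i.e. over $300 million

twilight_share = 0.76                   # approximate share from the breakdown above
twilight_cost = total_cost * twilight_share                  # well over $200 million

print(f"cost per book: ${cost_per_book:,.2f}")
print(f"total cost: ${total_cost/1e6:.0f} million")
print(f"cost for twilight-zone titles: ${twilight_cost/1e6:.0f} million")
```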

In effect, Google is offering the publishers and authors a $200 million gift horse. Based on our experience with Safari, I’m confident that search helps people to find and use books that aren’t available in print. A preliminary study comparing Safari usage to BookScan data on books sold showed us that 23% of all Safari usage came from books that represented only 6% of print sales. (I’ll be blogging this data in more detail soon.) When people can search books, they will discover forgotten gems to which authors and publishers can then assert their rights, perhaps bringing them back into print, laying claim to a share of the advertising revenue, or enabling click-through to an electronic copy. But for that dream to happen, someone has to make the investment to create the search index.

And as to the publishers’ claim that Google’s intended use isn’t fair, I’ll point out that it’s exactly the same fair use exception that Google and other search engines rely on to create an index of web pages, which are also copyrighted material that, by the publishers’ interpretation of the law, should have required opt-in by web publishers. Making intermediate electronic copies in order to create derived works that are themselves fair use is one of those technological changes that require us to rethink the narrow interpretation of copyright law that old-line companies would have us hold on to.

In short, I believe that the AAP’s position is intellectually dishonest. They are pretending that opt-in is a real solution to the orphaned works problem, when by the numbers it clearly is not. At the same time, they are resorting to scare tactics, calling the project “a license to steal”, when in fact the proper analogue is what Google already does on the web — and we all know how much value that has created. As Cory Doctorow likes to point out, Google should no more be negotiating with publishers over this extension of fair use than Sony should have negotiated with the movie studios before introducing the VCR. New technology always brings challenges, but it also brings opportunities. If we allow old-line industries to suppress new opportunities in the interest of protecting their entrenched businesses (and I should note that the five large publishers who dominate the AAP control more than 50% of all books sold), we will all be the poorer for it.