Jun 1

Tim O'Reilly

Free Downloads vs. Sales: A Publishing Case Study

As part of our continued effort to understand how the availability of free downloads affects book sales, I wanted to share some data on downloads versus sales of the book Asterisk: The Future of Telephony, by Leif Madsen, Jared Smith, and Jim Van Meggelen, which was released for free download under a Creative Commons license.

Jeremy McNamara of, which operates one of the mirrors, provided us with download stats, which we were then able to compare with book sales. Our goal, of course, is to help publishers understand whether free downloads help or hurt sales. The quick answer from this experiment is that we saw no definitive correlation, but there is little sign that the free downloads hurt sales. More than 180,000 copies were downloaded from Jeremy's mirror (which is one of five!), yet the book has still been quite successful, selling almost 19,000 copies in a year and a half. This is quite good for a technical book these days -- the book comes in at #23 on our lifetime-to-date sales list for the "class of 2005" (books published in 2005) despite being released at the end of September. You might argue that the book would have done even better without the downloads, especially given the success of Asterisk and the importance of VoIP. But it's also the case that the book is far and away the bestseller in the category, far outperforming books on the same subject from other publishers.

Meanwhile, we saw a huge spike in downloads starting at the beginning of this year, but didn't see a corresponding drop in print book sales, other than the continued slow erosion that's typical of books in print (especially one that's heading towards a second edition). However, we did see the book's first fall from grace back in March 2006, when it dropped from an average run rate of about a thousand copies a month to about six hundred. That drop comes at about the same time that our download data starts, but the coincidence may simply reflect the fact that we don't have earlier download data -- we believe that the book was available online earlier than that, even though Jeremy didn't start his mirror till March. (Next time we do a book available for free download, we'll be careful to collect accurate data from the start of the project.)

In any case, this kind of sales drop is not completely inconsistent with the sales pattern from many other books. And for authors who want to reach the widest audience, it's certainly possible that even if free downloads did shave a percentage from sales, the tradeoff is worth it (see Piracy is Progressive Taxation).

Here's the graph comparing downloads to print book sales:


Because the scale of the free downloads is so much greater than the scale of the book sales, the data is plotted on a two-axis graph. The y-axis on the left shows the book sales numbers, while the y-axis on the right shows the download numbers. As you can see, the book peaked in its sales shortly after its release and then started on a gradual downward trend. Unfortunately, we don't have download data from the very start of the project, but only from when Jeremy's mirror started seven months later. But what's most striking (apart from the huge scale mismatch, in terms of the number of people accessing the content through the free online version) is that when the downloads spiked in January of this year from about 8,000 a month to nearly 30,000 after the book's free availability was noted on digg, we didn't see a correspondingly sharp decline in sales. Of course, neither did we see any evidence that free availability of the book spurred sales. And as noted above, there is a sharp drop at about the time the download data starts that is likely unrelated to the downloads, even though we can't entirely rule out the possibility that downloads had some effect.

Keep reading for a few more details, plus graphs showing the relationship between book sell-in and sell-through, and the sales pattern for a comparable non-free book.

A few notes on the data:

  • Book sales data is taken from Bookscan. Bookscan reports data weekly, and as you all know, months don't end on neat weekly boundaries, while Jeremy's download data is on a monthly calendar. I've chosen the closest week end, so some months have five weeks, while others have four -- that is one reason why the book data spikes up and down.
  • Bookscan claims to report about 70% of US book sales. We estimate that this represents about 50% of worldwide English-language sales. As a result, I doubled the reported Bookscan numbers for purposes of this graph. The result is consistent with the inventory data. We've sold in about 19,000 copies, net of returns. Doubling Bookscan sell-through yields about 17,000 sold through, which would suggest that there are a few thousand left in inventory in bookstores. (Our actual data shows fewer than a thousand, so the right multiplier might be 2.1, but it's close enough.)
  • In the note above, I referred to sell-in and sell-through. A reminder: Bookstores typically stock up when a book is first released, and then sell down that inventory over time. The publisher's sales to the bookstores are "sell-in"; the numbers reported by Bookscan are the bookstores' sales to end customers, or "sell-through". Here's a graph comparing sell-in of the book to its sell-through:
    [Graph: sell-in vs. sell-through for the Asterisk book (asterisksellin.png)]

As you can see, the initial sell-in was around 5000 copies. Once the initial sales pattern was established, bookstores ordered just enough to keep up with demand. However, this can sometimes be a self-fulfilling prophecy. If there aren't enough copies on the shelf, they can't be discovered by potential readers browsing the store. That may well be one reason for the sales decline: bookstores aren't carrying many copies of this book for people to discover. It's got a negligible return rate, yet there are very few copies in bookstores -- we show only 200 for all of Barnes & Noble's hundreds of stores, and 400 for all Borders stores. (Amazon is pretty much just-in-time, and doesn't need to carry much inventory.)
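For readers who want to check the Bookscan multiplier from the notes above, here is a minimal sketch of the arithmetic. It uses only the rounded figures quoted in this post; the function names are made up for illustration, and "fewer than a thousand" copies in inventory is treated as an upper bound of 1,000, which is an assumption.

```python
# Rounded figures quoted in the post.
SELL_IN_NET = 19_000           # copies sold in to bookstores, net of returns
DOUBLED_SELL_THROUGH = 17_000  # Bookscan sell-through after doubling
ASSUMED_MULTIPLIER = 2.0       # the doubling applied to raw Bookscan numbers
OBSERVED_INVENTORY = 1_000     # "fewer than a thousand", used as an upper bound (assumption)


def implied_inventory(sell_in, sell_through):
    """Copies that should still be sitting in bookstores."""
    return sell_in - sell_through


def implied_multiplier(sell_in, inventory, raw_bookscan):
    """Multiplier that reconciles raw Bookscan sales with sell-in and inventory."""
    return (sell_in - inventory) / raw_bookscan


# Undo the doubling to recover the raw Bookscan figure.
raw_bookscan = DOUBLED_SELL_THROUGH / ASSUMED_MULTIPLIER

print(implied_inventory(SELL_IN_NET, DOUBLED_SELL_THROUGH))
print(round(implied_multiplier(SELL_IN_NET, OBSERVED_INVENTORY, raw_bookscan), 2))
```

With a multiplier of 2, the numbers imply about 2,000 copies still in stores; with inventory capped at 1,000, the implied multiplier comes out a little above 2.1, which is in line with the estimate in the note.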

Finally, I wanted to show the Bookscan trend graph for another O'Reilly book released about the same time, Understanding the Linux Kernel. You can see the same early spike in sales, and the same long, gradual decline. (I will admit that the decline in this book has been more gradual, and it's achieved a bit more of a steady state than the Asterisk book.)


P.S. If this kind of information floats your boat, TOC is the place to be. We're obviously very involved in the changes the internet is bringing to publishing, and are bringing together people who are driving those changes, rather than just waiting for them.

Comments: 21

  Ed Renehan [06.01.07 08:58 AM]

This is all very interesting, even given the missing data. I think you are right to want to do a more rigorously controlled test with another book where you track ALL downloads, side-by-side with bookstore sell-through, right out of the gate (also making sure the download option is properly publicized through DIGG, etc. right out of the gate). Depending on what THAT test reveals, a subsequent variation on the theme might of course be to make only the first several chapters of a volume available for free download, as a mode of publicizing the book and inspiring sales. Best, - Ed R.

  Prasad [06.01.07 12:07 PM]

It would be interesting to find out the % of people who bought the book and also downloaded the book (and the other way around) and the % of people who downloaded the book (but did not buy) and actually read the book.

  michael [06.01.07 03:28 PM]

The graphs seem to be cut-off on the right hand side and incomplete ...

Very interested in this kind of data as we are a small publisher looking at this exact issue.

Thanks Tim.

  John Davis [06.01.07 05:39 PM]

This bears out my own feelings on the matter. Personally, if I'm going to put some serious time in reading, I wouldn't want to do it on the computer screen. I'd rather have hard copy in my hands. It's just easier to see, apart from anything else. The level of contrast with black ink on white paper is far greater than the equivalent in pixels on a computer screen.

I would suggest that more ebooks being given out free would increase book sales enormously. It's the equivalent of playing a song on the radio or TV - it gets it out and those that want to have it buy it.

John Davis

  Lino Ramirez [06.01.07 09:06 PM]

Hi Tim,

Very interesting exercise the one you showed us. I believe this is a very good starting point to better understand whether or not offering free downloads would hurt sales. To do a more rigorous analysis, you would need to collect additional information, for example:

  1. Number of persons that downloaded the book and later on decided to buy it.
  2. Number of persons that downloaded the book and later on decided not to buy it.
  3. Number of persons that bought the book and later on decided to download it.

For those persons in 1., it would be interesting to know how long it took after downloading the book before they decided to buy it.

Another experiment you could do to analyze your data involves:

  1. Gathering the data you have available for book sales of books for which a free download is not available
  2. Grouping the books in a predetermined number of categories
  3. Finding the typical book sales trends for each individual category
  4. For a new book (which is going to be available as a free download), delaying the offering of the free download for enough time to see how well the book is following the book sales trend for its category (I guess that between 4 and 6 months should be enough).
  5. Offering the free download and trying to determine the difference between the typical trend and the new book's sales. Please note that you should also take into account other factors that could affect the book sales (for example, the appearance of new competitors' books, the appearance of a new technology that makes the book obsolete, etc.)

By the way, at OSCON, I will be giving a talk on Machine Learning Made Easy with Perl and I would love to talk with you about the results you get. Moreover, if it is OK with you, I would love to have a look at some of the data you have to do some analysis and present the results as a case study during my session.

I am looking forward to meeting you at OSCON



  Kurien [06.02.07 01:01 AM]

Does every point of sale inform the potential buyer that the particular book is available for free download? I think this could have an impact on sales.

  E Tompkins [06.02.07 01:51 AM]

If you attempt to track if downloads lead to sales or no sales, you should be asking for feedback on the decision from people. WHY buy/no buy, used as a quick reference, skimming for interesting bits only, fix one or two sticky problems?
Who What Where When How Why.

  Ed Renehan [06.02.07 03:41 AM]

Much of the data Kurien and E. Tompkins suggest obtaining would be great to have, but will prove impossible to acquire without spending a vast chunk of change on a complex test that includes both quantitative and qualitative (focus group, point-of-purchase interview) aspects. In the study at hand, the KISS philosophy should probably rule. In broad strokes, the question is basic: Whether free e-book downloads (a) hurt, (b) do not impact, or (c) enhance the sales of the analog item. I think a straight-up and inexpensive quantitative comparison of ALL downloads vs. all bookstore sell-through (as well, factoring in any direct sales you might make as publisher, along with sales made by authors) on a particular test title will go a long way toward answering that question. (Remember, however, that any "answer" derived would, of course, be something of a judgment call on your part, albeit an informed judgment call, because you will invariably have at least one "soft number" in your analysis. The anticipated likely standard sell-through for whatever title you choose next to test [the number of analog units you'd expect to move of a given title without the test, and without free-downloads] will invariably be a "soft" figure derived, though not arbitrarily, by you and your staff. It will be based on your long experience and analysis of past sales of similar books; it will spring from the same logic by which you have long derived print & bind quantities. But it will still, from a technical standpoint, be a "soft" number.) All best, - Ed R.

  Ed Renehan [06.02.07 03:57 AM]

Sorry: On the above, - I was actually referencing the data requests made by Lino Ramirez and E. Tompkins, not Kurien. Apologies for confusion.

  Matthew Lock [06.02.07 03:58 AM]

I wonder how well a book would do if it was online as HTML supported by Google Adsense? Would it be more or less profitable to the writer/publisher than a print version of the book?

  Roy Schestowitz [06.02.07 04:51 AM]

These are very, very interesting (and invaluable) charts. Whether this can be generalised to the case of software distribution is another intriguing question.

  E Tompkins [06.02.07 10:59 AM]

Put a survey link on the download page.
On the survey ask which books they have bought in the past, if any, that are also available as a free download.
Put 3 to 6 questions on 1 page to fill out with multiple choice regarding purchases and the usefulness of the books, and a section with each question for comments about the particular book.
Email address optional.
Let people know about the survey.

  sabadashus [06.02.07 01:19 PM]

I would discount most of what's been said so far in simple markets. The missing element is the 'expectation' of the market - maybe that should be plural: expectations.

The expectation of a free copy, or access to one, could be quite chilling, much more than the reality. And of course, if there were no expectation of free copies, downloaded copies, online copies or whatever, there couldn't be any effect on sales.

ps: but nonetheless, the discussion is useful and stimulating. By developing a community, however small, of readers who expect a dialog with and about a publication, you might be creating a new environment, in which expectations (whether of price or access) affect behavior in a more complex way.

  Ed Renehan [06.02.07 02:25 PM]

I'm sorry, but survey results from a self-selected (whoever chooses to answer) universe are, by definition, invalid so far as extrapolating assumptions one can apply to the broader market. I suppose you could make the survey a prerequisite to the download, but this will influence the # of people taking the time to do the download and corrupt your more-pure result: the simple counting of actual downloads. Also, will responders necessarily know which of the numerous books they've purchased in the past were available as free downloads?

  Edwin [06.02.07 02:26 PM]

Very interesting post indeed!

The figures of downloads are only based on the mirrors though. But how about the rest of the network? The long tail of filesharing?

Look at diggtorrents for example:, not to mention all the p2p-sites.

I dare not estimate how many downloads that are but u might think of double figures perhaps...

But a quick check told me that several Dutch libraries have the book in their collections..and indeed. An e-book is nice to browse and comes in handy when u need to quote and all that. But for real good reading you need it printed.

On your pillow.

  Mikael [06.03.07 06:40 AM]

So when will O'Reilly release their books online?
I believe O'Reilly should've done this a loooooooong time ago.

But it's your business, you can do what you want.

  Matthew [06.04.07 04:07 AM]

It is still too early to judge. 'Max' a novel by Juval Aviv published by Random House in Britain is now being released as a free podcast in full -- see It will be fascinating to see if that lifts sales of the book.

  Ed Renehan [06.04.07 04:54 AM]

Re: novels, etc. My hunch is that different market niches might well show different results. What applies as a correlation between free downloads (and/or podcasts) and sales of analog books vis-a-vis commercial nonfiction/fiction (popular biographies, novels, etc.) may well not be what applies as a correlation vis-a-vis professional/technical books. We are talking about two very different market arenas: one a vast and relatively unfocused market driven by a soft need for competitively-priced diversion and entertainment, the other a more compact and specific market, with far more elasticity in pricing, driven by a hard, immediate need for cutting-edge business information.

  Jim Van Meggelen [06.04.07 07:21 AM]

Many people that I meet tell me they bought a print copy in part to thank us for releasing it under the CC license. I'm not sure how to translate that into a meaningful statistic, but I hear that very often.

  Colin Day [06.12.07 09:07 PM]

I think the delayed launch of the free download version is actually helpful for assessing the impact of the online version and provides a simple method of assessing its impact on sales.

The initial sales are unaffected by the availability of the free downloads.

Most books follow a downward path month by month after the spike on publication. The rates of decay for books of similar type and readership are similar. So if you identify some comparable books, their sales decay rates can be applied to the initial sales of this book to generate a prediction how it would have sold had it gone on without online competition.

The difference between that predicted path and the actual indicates the sales impact of the online version. Eyeballing the data, I would suspect the difference will prove to be small.
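The method described in this comment can be sketched in a few lines. This is only an illustration under stated assumptions: the monthly figures are made-up placeholders rather than real sales data, the function names are hypothetical, and the decay rate is taken as a simple average of month-over-month ratios from a single comparable book.

```python
def average_decay_ratio(monthly_sales):
    """Average month-over-month sales ratio for a comparable book."""
    pairs = zip(monthly_sales, monthly_sales[1:])
    ratios = [later / earlier for earlier, later in pairs]
    return sum(ratios) / len(ratios)


def predict_path(initial_sales, ratio, months):
    """Sales path the book would follow if it decayed like the comparable book."""
    return [initial_sales * ratio ** m for m in range(months)]


def estimated_impact(actual, predicted):
    """Total difference between actual and predicted sales (negative = shortfall)."""
    return sum(a - p for a, p in zip(actual, predicted))


# Placeholder numbers for illustration only, not real sales data.
comparable_book = [1000, 900, 810, 729]  # a comparable book with no free download
actual_sales = [1000, 880, 800, 720]     # the book with a free download

ratio = average_decay_ratio(comparable_book)
predicted = predict_path(actual_sales[0], ratio, len(actual_sales))
impact = estimated_impact(actual_sales, predicted)

print(round(ratio, 3), round(impact))
```

In practice the decay ratio would be averaged over several comparable titles, and the prediction would start from the last month before the free download became available.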

  Anonymous [07.30.07 05:11 PM]

What I would like to know is how a book's location on the bookstore's bookshelves impacts sales.
This is why I ask.
When my book 'The Tornado Struck at Midnight' was published by Publish America, I soon discovered no bookstores would stock it because Publish America books were 'non returnable'. As a local author, I was able to get my local "Border's" to carry my book and the book vanished from the shelves almost overnight, but with only one store carrying my book it hardly made a dent in my royalty check and eventually it became too tedious to keep checking my local Borders to remind the manager it was time to re-stock my book. Recently I realized the book didn't sell as rapidly on the various Internet outlets as it did on the Borders bookshelves and I was curious as to why this was so? I soon realized that my name is Greayer and my book is filed between Grafton and Grisham on the Borders bookshelves. A large number of browsers 'happened upon' my book, and the attractive cover caused them to browse the contents. Once they did, they were 'hooked' and they bought my book. It doesn't take a genius to realize that my book would have been a 'Bestseller' had it been stocked on the shelves of more than one bookstore.
I also realize that Borders has no incentive to stock my book in many bookstores just to help me prove my point, but back in 2005 Publish America sent me an email with the enticing subject:
Sent: Wednesday, September 14, 2005 10:49 AM
Subject: PublishAmerica Makes All Books Returnable.
If that is so, apparently it wouldn't cost you a dime to discover a NYTimes bestseller. The only reward I can offer is, 'it's a story you could tell your Grandchildren.'
What I'm wondering is: Do books shelved near 'famous authors' sell more books? Thanks for listening.
