Apr 19

Tim O'Reilly

State of the Computer Book Market, Part 1

As I wrote last year, computer book sales are a pretty good technology trend indicator. The books people buy say something about the technologies they are trying to learn about, and often tell a story that analysts using more traditional metrics might miss. (For example, I organized the open source summit in 1998 after noticing that all of my 1997 bestsellers had something in common!)

At this point, we have a rich data set to work from. We get weekly point-of-sale data from all major retailers, for all publishers, via Nielsen BookScan's top 10,000 computer books service, load it into a MySQL data mart, and then do analysis and visualization of the data. I periodically do technology alerts based on this information (see 1, 2, and 3 for examples), but it occurred to me that I really ought to do a regular quarterly update on the state of the entire market. Note that this is US data only. I'll report later on the UK market.

The quick headline: the market has turned around, and we're seeing the first sustained computer book upturn since the dot-com collapse in 2001. The figure below shows the overall sales trend from 2003 through the first quarter of 2006.


As you can see, there's a clear seasonal pattern: the graph for each year closely mirrors the year before, with remarkably consistent weekly ups and downs. While our detailed data only goes back to 2003, we know that 2003 was about 20% lower than 2002, which was in turn 20% lower than 2001. 2004 was then 20% lower than 2003...but that represented the low-water mark. 2005 saw a slight upturn, and that has strengthened in 2006, with sell-through of technical books in Q1 of 2006 up nearly 7% over the same period in 2005.
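
To see how quickly those declines compound, here's a back-of-the-envelope sketch in Python. It assumes each drop was exactly 20% (the figures above are approximate) and indexes 2001 sales to 100. The function name `compound_index` is only for illustration and is not part of our actual tooling.

```python
def compound_index(start, changes):
    """Apply successive fractional changes to a starting index value."""
    levels = [start]
    for change in changes:
        levels.append(levels[-1] * (1 + change))
    return levels


# Assumption: 2001 is indexed to 100 and each yearly decline is exactly 20%.
years = [2001, 2002, 2003, 2004]
levels = compound_index(100.0, [-0.20, -0.20, -0.20])

for year, level in zip(years, levels):
    print(f"{year}: {level:.1f}")
```

Under those assumptions, three consecutive 20% declines leave the market at roughly half its 2001 level.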

The devil, of course, is in the details, and when comparing the performance of thousands of books, it helps to have a good visualization tool. A treemap, originally developed at the University of Maryland, is a great way to look at a huge data set that is organized into hierarchical categories. In this type of visualization, the size of a square shows the magnitude of a category, while the color shows the rate of change. Red is down, and green is up, with the intensity of the color representing the magnitude of the change. (For more background on how we categorize the data and use this visualization, see this previous posting.) Here's a treemap showing gains and losses by category, comparing the first quarter of 2006 with the first quarter of 2005.
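
For readers who want to experiment with the color encoding described above, here's a minimal sketch in Python. It is not our actual visualization code: the function `change_to_rgb`, the black neutral color, and the 100% cap on intensity are all assumptions made for the example. The sample percentages are the category changes quoted in this post.

```python
def change_to_rgb(pct_change, max_abs=100.0):
    """Map a percent change to an (R, G, B) tuple.

    Red means down, green means up, and brightness scales with the
    magnitude of the change, capped at max_abs percent (an assumed cap).
    """
    magnitude = min(abs(pct_change), max_abs) / max_abs
    level = int(round(255 * magnitude))
    if pct_change > 0:
        return (0, level, 0)
    if pct_change < 0:
        return (level, 0, 0)
    return (0, 0, 0)  # no change: neutral black (an assumption)


# Year-over-year changes taken from this post, in percent.
categories = {
    "Web design and development": 25,
    "Digital media applications": 14,
    "Consumer operating systems": -5,
    "Windows XP": -17,
}

for name, change in categories.items():
    print(f"{name}: {change:+d}% -> {change_to_rgb(change)}")
```

The area of each square would come from unit sales, which a layout algorithm handles separately. This sketch covers only the color half of the encoding.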

As you can see, the gains are not in all categories. The biggest increases are in the areas of web design and development (up 25%) and digital media applications (up 14%). Books on consumer operating systems are off by 5%, with books on Windows XP off by a full 17%. Core software development technologies are up 4%. But even there, the gains are not evenly distributed. Java, Perl, and C++ are down; C#, Python, and Ruby are up. Red Hat is down, and other Linux distributions are up. In part 2, tomorrow, I'll drill down into the winners and losers by category.

A few other significant tidbits:

  • The concentration of sales in the bestseller list is increasing, with the unit sales of the top 50 books growing at 3x the rate of the market as a whole, the top 100 growing at 2x the rate. By the time you consider the top 500, you're close to the overall rate of increase for the market. This may be a result of retailer concentration -- the independent computer bookstores that used to support deep backlist and niche titles are now mostly gone, while the top titles, mostly concentrated in consumer areas, have always been strongest at the chains.

  • The number of new titles appearing on the BookScan Top 3000 for the first time in Q1 of each year has declined; this may indicate the start of a trend towards fewer titles due to technical publisher consolidation and cutbacks: in 2004, there were 625 new titles; in 2005, 525; in 2006, 510.
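
To put the new-title counts in the second bullet into percentage terms, here's a small Python sketch. The counts are the ones quoted above; the helper name `yoy_changes` is hypothetical.

```python
def yoy_changes(counts):
    """Return the percent change from each year to the next, keyed by the later year."""
    years = sorted(counts)
    return {
        later: (counts[later] - counts[earlier]) / counts[earlier] * 100
        for earlier, later in zip(years, years[1:])
    }


# New titles appearing on the list for the first time in Q1 of each year.
new_titles = {2004: 625, 2005: 525, 2006: 510}

changes = yoy_changes(new_titles)
for year, pct in changes.items():
    print(f"{year}: {pct:+.1f}% versus {year - 1}")
```

The drop from 2004 to 2005 works out to 16%, while the drop from 2005 to 2006 is under 3%, so the decline slowed considerably in the most recent year.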

Tomorrow: Category Winners and Losers

Comments: 18

  Kevin Farnham [04.19.06 07:22 AM]

As always, these graphs are fascinating. When you consider the significant increase in the amount of technology information available on the Internet, and the capability today to do a Google search to find posts offering a solution to something like "g77 /usr/bin/ld: crt1.o: No such file" -- which was not possible with anywhere near the same efficacy in 2000 -- it is all the more remarkable for book sales to be increasing in a year-over-year comparison.

The economic indicators for the U.S. as a whole have been positive for several years. And that "vast" over-capacity of computer/Internet infrastructure that was created in the late 1990s boom appears to have disappeared, as we now hear talk of selling bandwidth to preferred customers -- something you do when there's a shortage of a good, not overcapacity.

If you select 1998 as a starting point, and compare that period's technology to today's, there have been enormous changes: in languages, computing power, Google, public web APIs, blog publishing, social networking, mobile technology. It is an entirely new world today compared with 1998.

Information on how to use and apply the new technologies is indeed needed, and books that you hold in your hands apparently continue to be a viable component in providing the information.


  Tim O'Reilly [04.19.06 08:04 AM]

Kevin, you're absolutely right that the competition from free online sources of information depresses the demand for books, and as a result, the increases we see are perhaps more meaningful than their absolute value.

One way that we see the change is in the mix of books that is sold. Five years ago, our bestsellers were all reference books like Java in a Nutshell, or tutorial/reference books like JavaScript: The Definitive Guide. Today, books like that face much more competition from the net, and our bestsellers are strongly tutorial -- books like Head First Java, Head First HTML and CSS -- or pitched at people a little later on the adoption curve than our normal alpha geek audience, like the Missing Manual series.

  Geof Harries [04.19.06 08:38 AM]

Unless I'm mistaken, what this analysis doesn't take into account is the new breed of PDF only books from smaller publishers, such as the Pragmatic Programmers. They've sold a huge number of "Agile Web Development with Rails" via this method.

  Jerri [04.19.06 08:48 AM]

So, if I'm understanding this right, we're moving away from programmer books and we're moving toward more consumer-oriented books?

From a journalist's perspective, the market has been growing. My business fell sharply during the bust, but over the last two years, it has steadily increased. Not by leaps and bounds, but on a steady climb. I'm now at pre-bust demand levels. The major difference is in pay-rates, which I assume are remaining at a balanced level for cautionary reasons.

Because (most of) what I do is paid for almost exclusively by advertisers, I use it as an indicator for market growth. Companies are buying advertisements, I get to write. Those same companies don't buy as much advertising when there's no growth, so I don't get to write as much.

Back to my main point. If I'm reading your numbers right, then the consumer audience is growing the fastest. People are now ready to learn how to use all of this technology. Very cool!

  Tim O'Reilly [04.19.06 09:01 AM]

Geof - You're right that this data doesn't take into account the sale of PDF editions, or for that matter, book usage through Safari, the online book library joint venture between O'Reilly and Pearson, which now reaches millions of users. For that matter, it doesn't include ad-supported technical sites like those on the O'Reilly Network. Online editions of books, and other forms of online content, are an increasingly important channel for technical publishers.

  Matt Doar [04.19.06 10:13 AM]

The first graph of four years' data seems odd because it wraps around right at the point of maximum change. That is, did the sales at the end of 2003 really plummet by the first week of 2004? The graph might be better plotted from July to July to show the Christmas spike more clearly.

Personally, I'd have to read a good book about Excel to get it to do that :-)


  Tim O'Reilly [04.19.06 10:23 AM]

Yes, the Christmas spike really is that pronounced. The drop at the start of the year is dramatic, every year. Interesting suggestion to do the graphs July to June. But we started getting data in January of 2003, so this is a convenient representation.

One other issue to be aware of is that there are small differences in when each year begins and ends, because weekly data doesn't fit neatly into a consistent calendar. In one year, the opening or closing week of the year might have two days, and in another, four. There's an ISO standard for how to handle this, but for these graphs, we've made a judgment call about which way of breaking the year makes the patterns stand out most clearly.
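
As a rough illustration of the week-boundary problem (and not the method used for these graphs), the Python sketch below shows how ISO 8601 week numbering, as exposed by the standard library's `date.isocalendar()`, assigns the first day of each year covered here. The helper name `iso_year_week` is made up for the example.

```python
from datetime import date


def iso_year_week(d):
    """Return the (ISO year, ISO week number) pair for a date."""
    iso = d.isocalendar()
    return (iso[0], iso[1])


# January 1 of each year in the data set.
for year in (2003, 2004, 2005, 2006):
    first_day = date(year, 1, 1)
    iso_year, iso_week = iso_year_week(first_day)
    print(f"{first_day} ({first_day:%A}) -> ISO year {iso_year}, week {iso_week}")
```

Under the ISO rules, January 1 of 2005 and of 2006 both belong to the last week of the previous year, which is exactly the kind of mismatch that forces a choice about where to break the year.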

  Thomas Lord [04.19.06 10:57 AM]

Thank you for sharing! Fascinating!

I was curious where rails came out on this given your recent posing of a question about its mainstream adoption. I've been using it myself, recently, starting from no knowledge, going through the famous O'Reilly on-line tutorials, rather critically to my success using "Four Days on Rails" (available on-line, free, CC attribution-noncommercial-sharealike), and now seem to be doing alright with just the on-line reference manuals.

Along the way to choosing rails I played around with other frameworks. Rails technology was part of the deciding factor but the clincher was the gradient available in on-line documentation (which starts, imo, with the O'Reilly tutorials).

Would I buy a book at this point? Probably not. Well, not about rails per se. I could sure use a better javascript reference manual than I have and a better CSS manual. I couldn't live without my printed HTML reference manual. All the rails-per-se questions that aren't immediately answered by on-line documentation, though, tend to be best answered by googling around for someone who's posted an example of how to do this or that which isn't built in -- the best of those will eventually be built in and then it's back to on-line reference materials.

All of this is to say, first, that I don't think you can directly infer too much about technology adoption from titles sold. I would wonder how much of the growth in javascript titles is driven by Ruby, for example. (I'd also caution against inferring too much from what CIOs tell you. FOSS solutions have a long history of adoption via wedge strategies so CIOs are typically the last to report (and the first to make official?) that a new technology has gone mainstream.)

And to say, second of all, and just an expression of sympathy/wonder, that it must be freaking hard to derive from what new tech is cool and interesting the answers about which titles people will want as a result (hint in this case: "ruby" or "rails" in the title may not be the critical factor).

And finally to note that, as RMS likes to say, "Free software needs free documentation." My personal pattern for years has been to go as far as I can with free (both senses) on-line materials and then, when certain needs are ossified, shop for the nicely bound, nicely organized reference, recipe, and occasionally tutorial books. I think I am not alone among hackers in that approach and, as on-line resources improve, I probably won't be alone among other users.

Conversely, on the few occasions I've gone to the local b&m retailer and proactively bought a bunch of titles I thought I'd likely need in the coming years -- it's always turned out to be an expensive miss. It's telling that many of these books have essentially $0 resale value -- on the day of purchase they are just about instantly the stuff of "$1 for the box of 'em" garage sale stuff.

I think you might be able to drive a lot of sales of publicly licensed materials, even if they are available from other sources that do printing, just by (a) continuing with high quality production values, (b) continuing with highly effective editorial choices, (c) having more content on-line for free -- becoming more of a reference-site-of-choice, (d) "taxing" publicly licensed titles to pay (near) customary royalties to their authors or projects or designees.

I wonder if you could partner with Amazon and similar to link an on-demand printing deal, and specialized-content sales forum, to their inventory mgt. and distribution pipeline. Is anyone doing that yet (he naively asks :-)?


  seesunshine [04.19.06 12:28 PM]

Very interesting. The seasonal pattern is so clear. Well, the year-over-year pattern is also noticeable. It would be more interesting to see patterns for specific fields year by year.

  Alex Diablon [04.19.06 12:56 PM]


Would it be possible for you to give us access to your data and treemap program, so we can analyze the data as well? We might see trends you don't see.

  Tim O'Reilly [04.19.06 02:21 PM]

Alex, unfortunately not. While the data warehouse architecture and visualization tools are ours, the underlying data is Nielsen's, and so our ability to open it up to the world is limited.

  Jared Smith [04.19.06 03:28 PM]

I'm interested in the VoIP books and how they're doing... obviously there's not a lot of historical data for them, but overall, how do you feel they're doing?

(I should note that I'm one of the authors of Asterisk: The Future of Telephony, so this is a very biased question.)

  Anonymous [04.19.06 05:48 PM]

Interesting: Quicken is up 28%.

Tim, one idea you might consider is signing a book along the lines of "Effective Quicken." As I've learned over the past ten or so years, it's one thing to understand the basics of Quicken--and there are way too many Basic Quicken for Dummies clones out there--but it's another thing to use it effectively to build up personal wealth (e.g., put into practice all that good advice you read about monthly in Consumer Reports Money Advisor.)

I'm also curious what the cropped "Adobe" near the upper right is, since it seems like Adobe got hit in everything but Photoshop and one version (which one?) of Premiere.

  Tim O'Reilly [04.19.06 08:51 PM]

Anonymous -- I assume you mean the bright green Adobe square right next to Maya on the extreme right of the treemap. That's for books on the entire CS suite (including Photoshop.) It was up 128%. The bright green rectangle above that, which says "Camera", is "Camera Raw", which is up 115%. Next to that is "Digital Photography, General", up 59%.

Tomorrow's posting has more detailed treemaps for each of these areas.

  robin [04.20.06 04:15 AM]

Dear Tim,

I am a Chinese reader of your blog. I think your thoughts about Web 2.0 are the most correct of all the ideas out there. Why don't you write a book?

By the way, can you recommend any books about Web 2.0 business models?


  conan the librarian [05.02.06 08:37 AM]

i realize that no one has commented on this post in several weeks, so this won't get answered, but i have to write it anyway...

tim, do you have any numbers like this for safari and/or any other "online bookshelf" type services? i'd like to see what the trends were like there as well.

  Tim O'Reilly [05.02.06 09:23 AM]

Conan --

We do have figures for Safari, and I'll put it on my list to do some postings on Safari data soon. In fact, I've got one in the works about "the long tail" on Safari. In terms of technology categories, one thing that we definitely see is that the Safari user base skews towards Java far more heavily than BookScan.

P.S. I receive email notification of all comments, even on old posts, so I do see them all and respond when I can.

  Helen, design manager [05.18.06 03:32 AM]

I still think that the technology trend indicator should be a compound index. We should be able to see the demand of this or that technology from different aspects of its usage.
