Wed, Mar 5, 2008

Mike Hendrickson

State of the Computer Book Market, Part 4 - The Languages

Note: An inadvertent draft of this post went out in our RSS feed and was posted for about an hour on Tuesday. It was cloned from Q1 '07 and most of the data and information was wrong.

In this fourth post (one, two and three are found here) on the State of the Computer Book Market, we will look at programming languages and drill in a little on each language area.

Overall, the market for programming language books was down (1.67%) in 2007 when compared with 2006. There were 1,809,695 units sold in 2006 versus 1,779,523 units sold in 2007, which is (30,172) fewer units. So the modest 1% growth in the Overall Computer Book Market must have been fueled by non-programming oriented books. You don't need a programming language to learn to use Mac OS X, Vista or Office, and that is where the growth was in 2007.
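
For readers who want to check the arithmetic, here is a minimal sketch in Python that reproduces the year-over-year change from the two unit totals quoted above. The function name `yoy_change` is our own illustration and not part of any Bookscan tooling.

```python
def yoy_change(units_prev, units_curr):
    """Return (unit change, percent change) from one year to the next."""
    delta = units_curr - units_prev
    pct = 100.0 * delta / units_prev
    return delta, pct

# 2006 and 2007 unit totals for programming language books, as quoted in this post.
delta, pct = yoy_change(1_809_695, 1_779_523)
print(f"{delta:,} units, {pct:.2f}%")  # prints "-30,172 units, -1.67%"
```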

Before we begin to drill in on the languages, we thought it would be best to explain our "language dimension." We track both topic categories and languages. In the latter dimension, we capture whether the book being categorized has code examples in a particular language. So a title like Head First Design Patterns, which contains all examples written in Java, carries the "java" tag on the language dimension even though it's in the design patterns category.


A Treemap view of Programming Languages

Caveat: The image below is about 1% or 2% off because there were several titles that were unclassified at the end of 2007 when we took this screen capture. The trends are still the same, but the precision is lacking.

Language_all.jpg

In the treemap view, you'll notice a couple of bright green areas -- namely ActionScript and PowerShell. Python, Ruby and C# are green (not bright green), showing nice growth when compared to 2006. Ruby was a small box last year and is now the 8th largest language, passing Perl and Python and now knocking on the door for Visual Basic's spot. Ruby has the second largest unit growth after C#, going from 4% overall market share to 5%; it is now 4k units from displacing VB for #7 overall. C# was equally impressive with a 36,811 unit growth, or 18.85%; it went from 11% market share in 2006 to 13% market share in 2007. At the rate it is going, it should surpass Java as the number one language this year as it is only (9,526) units short and is on a positive 18.85% growth rate while Java continues its slide at a (14.16%) clip. In addition, look at the title efficiency: C# is achieving close to the same number of units with 3/5 the number of titles, indicating that each title is selling better.
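
To make the C# versus Java comparison concrete, here is a rough sketch that carries each language's 2007 growth rate forward one year and compares units per title. It assumes, purely for illustration, that the 2007 rates repeat unchanged in 2008, which is a simplification and not a forecast. The unit and title counts are the ones from the Major table in this post; the function names are hypothetical.

```python
# Units and 2007 title counts taken from the Major table in this post.
data = {
    "java": {"units_2006": 281_502, "units_2007": 241_628, "titles_2007": 306},
    "c#": {"units_2006": 195_291, "units_2007": 232_102, "titles_2007": 179},
}

def growth_rate(row):
    """Fractional change in units from 2006 to 2007."""
    return row["units_2007"] / row["units_2006"] - 1.0

def project_next_year(row):
    """Assumption: the 2007 growth rate simply repeats for one more year."""
    return row["units_2007"] * (1.0 + growth_rate(row))

def units_per_title(row):
    """Average 2007 units sold per title that made the Top 3000."""
    return row["units_2007"] / row["titles_2007"]

for name, row in data.items():
    print(
        name,
        f"growth {growth_rate(row):+.2%}",
        f"projected {project_next_year(row):,.0f}",
        f"units/title {units_per_title(row):,.0f}",
    )
```

Under that assumption the projected C# figure lands above the projected Java figure, and C# already sells more units per title in 2007.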

Now let's dive into this treemap and take a closer look at the languages. This chart shows the five-year trend for the major programming languages. Red is used to highlight 2007. What is very noticeable is the huge decline with Java: it is now at half of where it was in 2003. C# and JavaScript are the only two Major languages that showed fairly consistent performance during the past five years.

2007 Market Share

five_year_trend_lang.png

Before we dive in, let's look at the high level picture for the grouping of languages. As you can see in the table below, the Mid-Minor and Minor languages experienced growth in 2007 while the rest experienced a decline. The languages driving the growth in the Minor category are Groovy, SAS, Erlang, MATLAB, and Processing. For the sake of grouping and presenting this information in a more readable format, we have classified the categories for the languages in this way:

Category | Category Unit Range | 2006 Units | 2007 Units | Growth
Major | 100,000 - 250,000 | 1,183,713 | 1,173,444 | (10,269)
Mid-Major | 65,000 - 99,999 | 453,596 | 441,739 | (11,857)
Mid-Minor | 10,000 - 64,999 | 151,633 | 152,890 | 1,257
Minor | 1,000 - 9,999 | 63,084 | 77,482 | 14,398
Immaterial | 1 - 999 | 5,825 | 4,392 | (1,433)
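
The grouping rule in the table can be sketched as a small function. This is an illustration only: it assumes anything at or above 100,000 units counts as Major (the table lists 100,000 to 250,000), and it uses "Inactive" for zero units as a stand-in for languages that never made a weekly Top 3000 report. The function name `classify` is hypothetical.

```python
def classify(units):
    """Map a language's annual unit sales to the grouping used in this post."""
    if units >= 100_000:
        return "Major"       # assumption: no upper bound is enforced
    if units >= 65_000:
        return "Mid-Major"
    if units >= 10_000:
        return "Mid-Minor"
    if units >= 1_000:
        return "Minor"
    if units >= 1:
        return "Immaterial"
    return "Inactive"        # assumption: zero units stands in for "never charted"

# 2007 unit counts taken from the tables in this post.
for lang, units in [("java", 241_628), ("ruby", 95_731), ("python", 46_028), ("groovy", 4_791), ("rpg", 755)]:
    print(lang, classify(units))
```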

The tables that we are showing below will contain the following header:

*Major*
1. Language | 2. 2006 Units | 3. 2007 Units | 4. 2006 Titles | 5. 2007 Titles | 6. 06 Mkt Share | 7. 07 Mkt Share

  1. Name or short name of the language
  2. Units sold in 2006
  3. Units sold in 2007
  4. Number of Titles making Bookscan 3000 in 2006
  5. Number of Titles making Bookscan 3000 in 2007
  6. 2006 Market Share
  7. 2007 Market Share

The following table contains data for the Major languages. As you can see, C#, JavaScript and .NET Languages (books containing both C# and VB code) were the only languages experiencing growth. At the rate Java is declining, C# should overtake Java this year as the number one most purchased language category for books. We are a bit surprised that PHP is down. Maybe JavaScript or Ruby/Rails are biting into this market.


Major Programming Languages -- >100,000 units in 2007

*Major*
Language | 2006 Units | 2007 Units | 2006 Titles | 2007 Titles | 06 Mkt Share | 07 Mkt Share
java | 281,502 | 241,628 | 326 | 306 | 16% | 14%
c# | 195,291 | 232,102 | 170 | 179 | 11% | 13%
php | 194,722 | 158,538 | 95 | 103 | 11% | 9%
javascript | 185,031 | 203,225 | 82 | 117 | 10% | 11%
c/c++ | 180,713 | 167,344 | 245 | 238 | 10% | 9%
.net languages | 105,872 | 107,077 | 96 | 88 | 6% | 6%
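
The market share column can be approximated from the unit counts. The sketch below assumes the denominator is the 2007 total of 1,779,523 programming language units quoted at the top of this post and that shares are rounded to the nearest whole percent. Both are our assumptions about how the column was produced, and `market_share` is a hypothetical helper name.

```python
# Assumption: shares are computed against the 2007 total quoted earlier in the post.
TOTAL_2007 = 1_779_523

# 2007 units for the Major languages, from the table above.
units_2007 = {
    "java": 241_628,
    "c#": 232_102,
    "php": 158_538,
    "javascript": 203_225,
    "c/c++": 167_344,
    ".net languages": 107_077,
}

def market_share(units, total):
    """Share of total units, rounded to a whole percent."""
    return round(100 * units / total)

shares = {lang: market_share(u, TOTAL_2007) for lang, u in units_2007.items()}
for lang, share in shares.items():
    print(f"{lang}: {share}%")
```

With those assumptions the computed values agree with the 07 Mkt Share column for all six Major languages.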

Here are the top titles for the Major languages, and incidentally, the titles and order are the same whether you look at Units sold or Dollars generated:

O'Reilly | Head First Design Patterns
Peachpit | JavaScript and Ajax for the Web
Peachpit | CSS, DHTML, and Ajax
Sams | Sams Teach Yourself PHP, MySQL and Apache All in One
O'Reilly | JavaScript: The Definitive Guide


Mid-Major Programming Languages -- 65,000 - 99,999 units in 2007

You'll notice in the Mid-Major languages that Ruby and ActionScript are the two languages that are showing growth when you compare 2006 and 2007. Both ActionScript and Ruby achieved their growth by adding more titles to the list. Remember that does not mean that more titles were published, but that more made it into our Top 3000 report from Bookscan.

*Mid-Major*
Language | 2006 Units | 2007 Units | 2006 Titles | 2007 Titles | 06 Mkt Share | 07 Mkt Share
visual basic | 147,710 | 99,964 | 152 | 127 | 8% | 6%
ruby | 67,664 | 95,731 | 17 | 40 | 4% | 5%
sql | 92,981 | 89,289 | 71 | 82 | 5% | 5%
actionscript | 66,568 | 85,971 | 33 | 41 | 4% | 5%
vba | 78,565 | 67,097 | 53 | 61 | 4% | 4%

Here are the top titles for the Mid-Major languages.

Pragmatic | Agile Web Development with Rails
O'Reilly | ActionScript 3.0 Cookbook
Sams | Sams Teach Yourself SQL in 10 Minutes
O'Reilly | SQL Pocket Guide
Pragmatic | Programming Ruby
O'Reilly | Essential ActionScript 3.0


Mid-Minor Programming Languages -- 10,000 - 64,999 units in 2007

So the news in this category is that Python has swapped places with Perl as the leader of the category. Perl had seven fewer titles make it into the Top 3000 while Python saw an additional eight make the list. PowerShell came out of nowhere, jumping past the smaller groupings to make this list.

*Mid-Minor*
Language | 2006 Units | 2007 Units | 2006 Titles | 2007 Titles | 06 Mkt Share | 07 Mkt Share
python | 38,609 | 46,028 | 33 | 41 | 2% | 3%
perl | 50,483 | 37,984 | 50 | 43 | 3% | 2%
transact sql | 17,756 | 21,341 | 17 | 16 | 1% | 1%
vbscript | 22,976 | 18,167 | 17 | 16 | 1% | 1%
powershell | 1,377 | 13,961 | 1 | 9 | 0% | 1%
shell script | 14,466 | 11,479 | 13 | 12 | 1% | 1%

Here are the top titles for the Mid-Minor languages.

O'Reilly | Learning Python
O'Reilly | Learning Perl
Manning | Windows PowerShell in Action
Microsoft Press | Inside Microsoft SQL Server 2005: T-SQL Querying
O'Reilly | Programming Perl


Minor Programming Languages -- 1,000 - 9,999 units in 2007

So the news in this category is that Groovy came out of nowhere and moved quickly up the charts. SAS and MATLAB had nice growth. Erlang, Processing and NXT-G had no units in 2006 and each sold a nice quantity in 2007. Remember from above, this is the language grouping that grew the most in 2007: it sold more than 14k more units in 2007 than in 2006.

*Minor*
Language | 2006 Units | 2007 Units | 2006 Titles | 2007 Titles | 06 Mkt Share | 07 Mkt Share
basic | 10,660 | 9,374 | 10 | 7 | 1% | 1%
pl/sql | 8,780 | 7,295 | 18 | 18 | 0% | 0%
sas | 2,898 | 6,298 | 15 | 18 | 0% | 0%
objective c | 5,384 | 5,509 | 6 | 6 | 0% | 0%
groovy | 210 | 4,791 | 2 | 3 | 0% | 0%
matlab | 2,565 | 4,602 | 10 | 15 | 0% | 0%
assembly | 4,727 | 3,762 | 14 | 13 | 0% | 0%
applescript | 3,590 | 3,012 | 8 | 6 | 0% | 0%
mdx | 3,428 | 2,743 | 6 | 3 | 0% | 0%
latex | 2,827 | 2,718 | 4 | 6 | 0% | 0%
erlang | 538 | 624 | 1 | 2 | 0% | 0%
awk | 3,031 | 2,572 | 3 | 2 | 0% | 0%
mel | 1,204 | 2,386 | 4 | 4 | 0% | 0%
lua | 1,563 | 2,367 | 4 | 3 | 0% | 0%
cs2 | 335 | 2,259 | 1 | 1 | 0% | 0%
processing | - | 1,991 | 0 | 3 | 0% | 0%
nxt-g | - | 1,659 | 0 | 1 | 0% | 0%
lisp | 2,085 | 1,593 | 7 | 5 | 0% | 0%
tcl | 2,052 | 1,588 | 4 | 5 | 0% | 0%
scheme | 1,199 | 1,271 | 5 | 7 | 0% | 0%
haskell | 416 | 1,268 | 2 | 4 | 0% | 0%
abap | 2,055 | 1,188 | 2 | 2 | 0% | 0%
mysql spl | 1,197 | 1,176 | 1 | 1 | 0% | 0%
vhdl | 847 | 1,010 | 6 | 9 | 0% | 0%

Here are the top titles for the Minor languages.

For Dummies | Beginning Programming for Dummies
Addison Wesley | Cocoa
Pragmatic | Programming Erlang: Software for a Concurrent World
Manning | Groovy in Action
Osborne/McGraw-Hill | Oracle Database 10g PL/SQL Programming

Immaterial Programming Languages -- 1-999 units in 2007

The following languages all sold between 1 and 999 units in 2007. These are what I am considering the Immaterial programming languages.

*Immaterial*
Language | 2006 Units | 2007 Units | 2006 Titles | 2007 Titles | 06 Mkt Share | 07 Mkt Share
rpg | 804 | 755 | 4 | 3 | 0% | 0%
alice | 64 | 751 | 1 | 2 | 0% | 0%
f# | - | 698 | 2 | 2 | 0% | 0%
cobol | 714 | 620 | 3 | 6 | 0% | 0%
directx | 1,854 | 606 | 1 | 1 | 0% | 0%
dsl | - | 262 | 1 | 1 | 0% | 0%
delphi | 586 | 126 | 3 | 3 | 0% | 0%
jcl | 109 | 83 | 1 | 1 | 0% | 0%
idl | 20 | 73 | 1 | 1 | 0% | 0%
realbasic | 814 | 73 | 2 | 2 | 0% | 0%
ada | 67 | 71 | 4 | 4 | 0% | 0%
cl | - | 54 | - | 1 | 0% | 0%
fortran | 49 | 50 | 2 | 3 | 0% | 0%
ocaml | 169 | 38 | 1 | - | 0% | 0%
e | - | 33 | - | 1 | 0% | 0%
javafx | - | 29 | - | 1 | 0% | 0%
awd | 11 | 23 | 1 | 1 | 0% | 0%
m | 13 | 10 | 1 | 1 | 0% | 0%
maxscript | - | 9 | - | 1 | 0% | 0%

The noticeable trend with the Immaterial languages is the decline of DirectX and positive growth of F# and Alice.

Here are the top titles for the Immaterial languages.

Sams | Managed DirectX 9 Kick Start: Graphics and Game Programming
Mc Press | The Modern RPG IV Language
APress | Foundations of F#
Sams | Sams Teach Yourself COBOL in 24 Hours
Prentice Hall | Learning to Program with Alice

Lastly, the following languages were Inactive for 2007. That means they did not sell enough units to make it into any weekly Top 3000 report. And here is the list: labview, lingo, ml, mumps, net languages, oopic, opl, pascal, pda languages, pl/1, qbasic, rexx, s, smalltalk, spark, squeak, unrealscript, windows script.

So this concludes the Languages view of the State of the Computer Book Market. We hope you enjoyed it. Pay attention to this space, as we will be publishing this information twice a year. Now that we have all the queries, spreadsheets, pivot-tables and systems down, we should be able to update these posts much more easily going forward. If you have anything you would like explored a bit more thoroughly, please leave a comment here and we will see what we can do.


 

Comments: 24

  Alex Tolley [03.05.08 11:00 AM]

Do you have enough historical data to try to predict what the "take off" point is for a language? The classic growth in bookshelf space is a fine anecdotal way to see if there is going to be great demand for the language, but this is almost too late to be of use. Late last year, I did notice the almost explosive bookshelf space growth for "Ajax" titles at Barnes and Noble, which made sense. Ruby and Python seem to be stagnating, despite title and volume growth, and it is very noticeable that Java is declining by this measure.

  Theo [03.05.08 11:07 AM]

A difficult language needs a lot of books to master...

  Jeffrey McManus [03.05.08 11:47 AM]

You guys have got to quit characterizing one language or another as the "largest language" based solely on book sales. If a bunch of books on a given language are sold in a given year, that doesn't make it the "largest language". It makes it the largest book market. There's a huge difference.

Looking solely at book sales to determine the "size of a language" will always favor new languages that people are learning at the expense of old languages that people generally know well (cf. Ruby vs. Visual Basic). But that doesn't mean that Ruby is "bigger" necessarily.

Also, book sales as a yardstick for language adoption will be skewed by the relative depth and complexity of the language. So this means that Python, for example, could appear to be less significant than, say, Java.

Finally, the book market in general ain't what it used to be, as I'm sure you know -- it's no longer a prerequisite for a developer to purchase a book on a language to learn it.

(My guess is that this dynamic also explains the uptick in Powershell, which has very little adoption and no significant online developer community today but is very complicated to learn. If you sold more Powershell books than any other kind, you couldn't honestly start characterizing it as the "biggest language" by any stretch of the imagination.)

If you're trying to make broad-stroke generalizations about the strengths of various languages, why not do a more scientific survey and publish those results rather than using backwards-looking statistics based on book sales?

  Mike Hendrickson [03.05.08 12:29 PM]

Jeffrey-

I am not sure where I said that X language is the largest language. As the title of the piece says, this is the State of the Computer *Book* Market. And we can say which language sells more books.

This is about book sales and if folks want to extrapolate to other areas, they should obviously do so with caution. I received a few emails about working with others to blend our data sets together to try and make more definitive sense of this all. We will hopefully be adding some more interesting data in the future.

I am curious about one of your comments though about Powershell having very little adoption. What sources explain this? I would love to see how you came to that conclusion for future posts.

Sorry you misunderstood this post as trying to make broad-stroke generalizations based on backwards looking stats. The data from Bookscan is all we are using at this time. As far as generalizations, I am not sure which statements reflect that.

  Alex Tolley [03.05.08 01:17 PM]

Jeffrey: "Looking solely at book sales to determine the "size of a language" will always favor new languages that people are learning at the expense of old languages that people generally know well (cf. Ruby vs. Visual Basic). But that doesn't mean that Ruby is "bigger" necessarily."

I think book sales are a very useful proxy for new interest in a language. Technical books are bought to enable learning, so one expects that this reflects the interest in a language from newcomers. As the language matures, the book sales will reflect the size of the market via the entrance of newcomers. So yeah, the book market isn't the language market, but you can use it to make inferences about the language market.

  davidm [03.05.08 01:23 PM]

I agree the author of this column should take much more care to point out they are discussing book sales, not programming language popularity. Language throughout the writeup seems to indicate it is about programming language popularity.

As well, over the years, I've noticed a fascination with extrapolating future results based on "up and coming" languages, always with bias against existing languages, but I haven't noticed much correlation with facts over time.

It would be interesting to compare your observations with commonly accepted indexes such as http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html , I'd suggest that O'Reilly's focus on different languages doesn't particularly map to demand, perhaps there is some internal bias.

  Tim O'Reilly [03.05.08 08:11 PM]

davidm --

"some internal bias?" This isn't our data. This is industry data from the major retail outlets. Nor are the results slanted in any way to areas where O'Reilly has product. E.g. we have no books on Powershell, yet Mike wrote about it as a one of the "hot" areas for books.

This is just one data point. Tiobe (which aggregates web pages found by search engines) is another. Job postings are another.

Integrate across as many data sources as you can.

  Wenjian Yang [03.05.08 08:40 PM]

It's interesting to see SAS and Matlab have such a high increase. Is R counted together with S, or is it not included in the analysis?

  Scott Ruthfield [03.06.08 12:09 AM]

I had actually just started working on a blog post trying to bring together learnings from

--this data - which (despite some of the comments) is best used as an indicator of interest in the language (especially given the logical purchase bias towards beginner's books), rather than a reflection of overall popularity

--the TIOBE data, which has its own quirks, since it really reflects the number of times people have talked about a language online, rather than the number of people programming in the language, and so has some very strange outliers, like Delphi

--job postings on Dice or Craigslist (which differ quite a bit, actually)

but then discovered that someone's already done something like this at http://www.langpop.com/ (with appropriate disclaimers on accuracy). It still might be worth doing combining different sources, although at some point you aren't learning anything new.

  Scott Ruthfield [03.06.08 12:18 AM]

Two questions:

--Head First Design Patterns, the #1 book in the major list (and thus probably the best-selling book in the field) is different than most of the others, in that while it uses Java for its examples (and Java's one of the most common languages for using design patterns), it's not a learn-this-language book, as is almost every other book in this list. You might not be able to give us real numbers, but if you eliminated that book from the list, how much more steeply does Java fall?

--Visual Basic is your largest language in the mid-majors but has no book in the top six. Is that because of fragmentation in what makes up Visual Basic (i.e. both VB and VB.Net), or just a larger spread in beginner's books, or something else?

  Harvey Pengwyn [03.06.08 04:02 AM]

Market research... the only way to find out what programming languages are used would be to do some proper market research. Approach people, ask them if they program, and if so in what language.
Clearly no one (apart from boondoggles paying other people's money) would pay for this because no one would want to know enough to pay.

As well as the obvious limitations of books as a metric (although I do not deny that it is interesting and indicative of SOMETHING), job adverts clearly favour jobs that advertise and skills that people hop jobs in.

We established last time that the thresholds for reporting on books are such that a language could actually get half way up the mid major list and not appear on the list if there were, say, about 10 equal books.

My gut feeling is that there are a lot more Fortran programmers than you would imagine. I am confident I could by walking a few hundred yards round up more Fortran programmers than there are Erlang programmers in Britain. However, I doubt many of them have bought more than one Fortran book in their lives, if that.

I suspect VBA is similarly under-represented.

  Graham [03.06.08 05:00 AM]

Really interesting post. Notwithstanding the various arguments and sources and what the data means it's an interesting analysis of what's out there.

Thanks for that!

However (you knew there had to be a downside), when you redo this in six months time you might want to think about how the tables of data are presented. To say what turned up in Google Reader was an unreadable mess would be an understatement. :)

  Joe [03.06.08 08:24 AM]

Actually, O'Reilly has 2 books on PowerShell: PowerShell Cookbook and Monad. Well, one with the word "PowerShell" in it and one with the developing name "Monad".

  davidm [03.06.08 01:20 PM]

Tim: by internal bias, I mean what languages are chosen to focus on for upcoming books. It seems to me that there is some bias towards trying to detect, and promote new "up and coming languages," perhaps even shaping the market somewhat. But it's kind of ironic, because a programming language will become popular based on underground movements, by the time books appear most new languages will be entering that awkward stage where we see if it can stand up to mainstream acceptance.

Otherwise, I wouldn't expect much correlation between book sales and programming languages used, and those that are likely to become successful. For example, PHP is undoubtedly very popular, but partially because you don't really need any books, there are so many decent resources on the Web. And Java is also a popular language, yet most people probably use IDEs such as Eclipse, and I doubt most people buy a book for an IDE (though they probably should).

Anyway, I have noticed several previous iterations of this column that seemed to be predicting Java's demise, for example, rather prematurely, and even now this carries on, with what evidence? Current trends in book sales? I'm guessing it might have more to do with software releases and trends. I don't think there's any "scientific" or particularly rational basis for predictions based on current sales, though it can be interesting.

  Michael R. Bernstein [03.06.08 03:02 PM]

Mike, it looks like the data for Alice in the next-to-last table is wrong (the callout of 'positive growth' tipped me off, and the fact that the table is sorted by 2007 units clinched it).

  cornelius [03.07.08 10:10 AM]

A million books have been written about Perl. There's nothing left to write about. So of course Perl book sales will lag behind. But the language is still widely used. People getting into the IT field should be careful not to be misled by trends in book sales.

  Dr Zaius [03.07.08 10:17 AM]

Hmm. I did not count one million books, not even one thousand. This is just a sign of things to come. When Microsoft releases Powershell for Linux, the competition will be over and Perl ~= Latin.

  Tim O'Reilly [03.07.08 11:18 AM]

@Joe --

Apologies re Powershell. I'm obviously not as close to the book program as I used to be :-) I'm glad we have some contenders in what looks to be a nice, growing market.

But the point remains that we aren't slanting this data to tell a story that we want told. We're digging in the data to see if there's any story that we can all benefit from.

And yes, there are lots of factors, such as the maturity of a language. But, cornelius, I will note that there are lots of signs besides book sales that a lot of people have switched from perl to python, php, and ruby in recent years. I love perl, and it's still in wide use, but it missed the boat for many of the new applications that are driving adoption. Perl 6 may breathe some new life into the market, but I do think that there was definitely something of a missed opportunity.

  ac [03.07.08 12:21 PM]

You mentioned the decline of DirectX but entirely forgot to mention the nearly complete replacement of DirectX that's been drummed up for a year now and has several books about it? Of course as someone who actually makes these stats I'll let you figure out what I'm talking about :-)

  caesar [03.07.08 01:07 PM]

I think it's true that Perl is not as popular anymore for application development. But most IT workers who code to some degree are not application developers. They're system admins, system engineers, DBAs, etc. What's hot among application developers is only a part of the picture.

Stupid human bastard!

  Michael R. Bernstein [03.07.08 02:36 PM]

Ah, the corrected figure for alice (64->741 instead of 64->71) is far more dramatic. Thanks, whoever did the correction.

Dr. Zaius, why wait for Microsoft? Try Hotwire:
http://hotwire-shell.org/


Caesar, that last sentence was uncalled for.

  galen [03.11.08 04:49 AM]

I work for a big web hosting company and almost all the system engineers use Perl. It's flexible, powerful, and fun to code with. You can write code that can run on both Unix and Windows which is nice. Even the Windows guys here use Perl. Except for one guy. He doesn't know Perl so he has to use VBscript. We mock, ridicule, and persecute him for it! What a bum!

Perl is underrated. It's kinda like 'Planet of the Apes.' There were five ape films and a TV series. The Apes franchise dealt with evolution, racism, science vs. religion, ecology. Fascinating stuff.

  Jagadeesh Venugopal [07.30.08 06:14 AM]

Could it be possible that anyone who needs a Perl book already has one or knows someone who has one? Clearly Perl 5 has been around since the early '90s and therefore it is to be expected that book sales are maturing...

  MySchizoBuddy [02.17.09 06:12 AM]

It's time for the 2008 data
thanks
