Tim O'Reilly

Top 1000 Books in Library Collections

Just heard a fascinating talk by Lorcan Dempsey of OCLC about some of the research projects going on there. One neat observation: the number of libraries holding a book represents something of an equivalent to Google PageRank, in "harnessing the collective intelligence of librarians" about what are the most important works of Western culture. (Obviously, the list is biased towards certain types of works typically held by libraries, such as books and classical music.)

One outcome is the list of the Top 1000 works in library holdings. The top ten: the Bible, the US Census, Mother Goose, the Divine Comedy, the Odyssey, the Iliad, Huck Finn, Lord of the Rings, Hamlet, Alice in Wonderland. A lovely browse for book lovers.

Using a different metric -- the number of different editions -- the top ten fiction works are Don Quixote, Robinson Crusoe, Alice in Wonderland, Treasure Island, Huck Finn, Tom Sawyer, A Christmas Carol, Oliver Twist, Uncle Tom's Cabin, David Copperfield. (More precisely, the metric is not "editions" but the number of different "expressions", which is a step higher than an edition in OCLC's logical hierarchy. You have a work, realized through an expression (e.g. an illustrated edition, a Spanish edition, an abridged edition, a spoken word edition), embodied in a manifestation (e.g. the 1954 Penguin edition), and finally an item (an actual copy of that manifestation).)
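To make the two metrics concrete, here is a minimal Python sketch of the work / expression / manifestation / item hierarchy described above. It is only an illustration under stated assumptions, not OCLC's actual data model or code: the class names, the helper functions (holding_libraries, expression_count, rank), and the sample records are all invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Item:
    library: str  # the library that owns this physical copy


@dataclass
class Manifestation:
    label: str  # e.g. "1954 Penguin edition"
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    label: str  # e.g. "Spanish edition", "abridged edition"
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    expressions: List[Expression] = field(default_factory=list)


def holding_libraries(work: Work) -> int:
    """Number of distinct libraries holding at least one item of the work."""
    return len({
        item.library
        for expr in work.expressions
        for man in expr.manifestations
        for item in man.items
    })


def expression_count(work: Work) -> int:
    """Number of different expressions through which the work is realized."""
    return len(work.expressions)


def rank(works: List[Work], metric: Callable[[Work], int]) -> List[str]:
    """Titles ordered from highest to lowest score on the given metric."""
    return [w.title for w in sorted(works, key=metric, reverse=True)]


# Made-up sample records, purely for illustration.
quixote = Work("Don Quixote", [
    Expression("Spanish original", [
        Manifestation("edition A", [Item("lib1"), Item("lib2")]),
    ]),
    Expression("English translation", [
        Manifestation("edition B", [Item("lib1")]),
    ]),
    Expression("abridged edition", [
        Manifestation("edition C", [Item("lib3")]),
    ]),
])
census = Work("US Census", [
    Expression("original", [
        Manifestation("printing", [
            Item("lib1"), Item("lib2"), Item("lib3"), Item("lib4"),
        ]),
    ]),
])

by_holdings = rank([quixote, census], holding_libraries)
by_expressions = rank([quixote, census], expression_count)
```

With this toy data the two metrics disagree: the work held by more libraries ranks first on holdings, while the work realized in more expressions ranks first on the second metric. That is the same kind of divergence as between the two top-ten lists above.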

Lorcan also says that they are planning to offer a web service to look up the library holding rank for any title. Nice!

Comments: 10

Scott Berkun   [03.16.06 02:57 PM]

Cool list - but raises more questions than answers for me.

I'd love to see this ranking paired with 3 other rankings: 1) books checked out most often by patrons, 2) books actually read by patrons (hard to get) and 3) the top grossing books sold in bookstores. Simply having a single copy in stock in a library only says so much. Does this list jibe at all with 1, 2 or 3?

This list reads more like "the list of books people claim they've read" rather than the books people actually read. I mean, Beowulf? Dante? Gulliver's Travels? Crusoe? Bhagavadgita? These are tough, high investment reads. And seeing the Iliad, Twain and Dickens up there reflects high school English curriculums more than anything else.

Mother Goose (#3), Alice in Wonderland (#10) and Garfield (#15), fine reads though they are, smell of parent-influenced choices, as family support is a goal for most libraries.

As with PageRank, it's hard to assign importance to this list - a special kind of popularity maybe, but that's not the same thing. There's not a single history book (not even the Declaration of Independence) in the top 100. Nor a dictionary (#331 is the first one I found).

The Far Side, by Gary Larson, comes in at #115, one notch above the first history book, Gibbon's Decline and Fall of the Roman Empire (!), and the first Harry Potter book comes in at #220. Can you frame these three facts in any meaningful way? I couldn't.

I'd love to ask some librarians what they make of the top 100 in this list. If I found this library in someone's office, I wouldn't know what to think (Classics major? Victim of time travel? English professor with kids?).

Mike O'Regan   [03.16.06 03:40 PM]

OCLC has a large proportion of university and other higher ed/academic libraries. The tens of thousands of elementary, middle, and high school libraries in North America that they don't have would produce a different list. Think Harry Potter, World Book Encyclopedia, etc.

Anjan Bacchu   [03.16.06 05:06 PM]

hi tim,

thanks for sharing this info.

i note that the Bible, the Koran and the BhagavadGita are all top books.

I also note that not many technical books are in the top 100. Hopefully a peter norton in the future will write one OR a HEAD FIRST book will make it into the top 100.

BTW, I'm looking forward to the HEAD FIRST CALCULUS. When is it due ?



Tim O'Reilly   [03.17.06 01:03 PM]

Scott --

I agree that the list is weighted to classics, and isn't a real test of popularity among readers. But like many other data points, it doesn't have to conform to our expectations, it just is. These ARE the books most often held by libraries. You can say that the librarians are out of touch, or you can also say that this is what one critical subculture (i.e. librarians) has decided is the Western Canon.

The same is true of Google search results. Because of history, they skew technical. Ditto for tags. But out of the welter of data, patterns emerge.

I do have to say that I prefer the other metric -- the number of different expressions. The number of times a book is put out in different forms is a really good indicator of its enduring value and popular interest, and reflects a broader market gauge than just library holdings. Unfortunately, I don't see that they show the top 1000 by that metric.

Bageshree Shevade   [03.17.06 03:39 PM]

this is very interesting information...thanks for sharing...:) there could be various interesting and rich statistics that could be extracted from this, like -

(1) the frequency with which different books are checked out - are the books most frequently checked out more popular? are books that are on hold more in demand?

(2) frequency of checked out books v/s age group - do retired people have more time and hence check out books from the library more often than the working class?

(3) the different expressions (book editions) v/s the age group of people checking out those editions - what are the kinds of people (user profile - retired, CEOs, students) reading abridged editions v/s illustrated editions?

(4) the age of the book v/s the age of people and their culture - are teenagers today really interested in reading classics?

(5) the categories of books read by people v/s their cultural background - do people in East read more religious books than people in the West?


Tim O'Reilly   [03.17.06 04:57 PM]

Bageshree -- These would indeed be great stats to have, but I don't think they are available in this data set, which is drawn from library catalogs, not library usage records. Would be lovely to have the actual usage data as well -- and perhaps that's aggregated somewhere, but I doubt it.

Kevin Farnham   [03.17.06 10:25 PM]

I shared this wonderful list with the members of a literary mailing list I belong to, the Sidney-Spenser list (as in Sir Philip Sidney, author of Arcadia, and Edmund Spenser, author of The Faerie Queene). I included the quote about Google PageRank.

I think there will be some commentary about the Top 1000 list itself, but it will also be interesting to see if anyone comments about the Google PageRank equivalence. Maybe someone will even come to Radar and post a comment (I gave them the link)!

Lorcan Dempsey   [03.21.06 12:54 PM]

Some good comments here.

Yes - it would be good to be able to do ranking, recommending and relating based on other 'intentional' data - data about choices and behaviors. Our list is based on choices libraries have made about their collections. Circulation and InterLibraryLoan data would give you another view. Data captured from OpenURL resolvers or other transaction points another.

There is growing interest in making such data more accessible and usable (standardize, commoditize, aggregate, syndicate) but we are in early days.

Lord   [04.10.06 12:28 AM]

I think that the most important works of Western culture are awarded prizes or discussed by critics. What people search for on the internet is often quite far from being a masterpiece.

J.O. From Urban MVP   [10.11.07 10:04 AM]

lol This information would most definitely be valuable if you are in the book sales industry and have libraries as your major client. Well, especially if you're brand new to that industry. But pretty cool read, thanks for the find!


