Thu

Mar 22
2007

Tim O'Reilly

How Google Books is Changing Academic History

Peter Brantley writes in email: "a Berkeley grad student disses the experience of the Berkeley library system and lauds Google." Jo Guldi, the author of that blog entry, wrote:

"I was idly trying a search on "roads" to see what sort of a literature would turn up for the period of my dissertation research, 1740-1850. I didn't expect much. I've spent the last two years wandering through the Yale, Harvard, and California libraries, the British Library, Britain's National Archives, and the immense reserves of North American Inter Library Loan reading every book on London, pavement, or travel I could get my hands on.

Surprise. In a single idle search I just added twenty extra full-text books to my list....

To give just one example, this little puppy -- Henry Parnell's A Treatise on Roads (1833) -- one of the key texts for my dissertation, exists on our campus in Berkeley's transport library, a quaint but understaffed, spare room hidden on the third floor of the engineering building, far, far away from where historians ever go. It wasn't actually on the shelf when I got there, so it took some patient emailing with the transport library librarians before the book was found, returned to the correct place, held at the desk for me, to be picked up during the library hours specific to that particular institution (10am-4pm, M-Fr). Wild with enthusiasm at having at last obtained it, I held the volume prisoner at my desk in San Francisco for six straight months, unruffled by overdue notices, until at last the plaintive emails from the circulation desk were too much for me to bear. Research in my world is very often a personal matter of haggling for more time with the particular librarian in question. They're used to us, and I figure they need a good struggle to keep them alert. But thanks to Google Book Search, these days of scavenger-hunt and tug-of-war are drawing to an end.
...

What this signals, by the way, is the opportunity for a new age of scholarship. Cultural and image analysis used to be painfully time-consuming, heavy lifting, involving rare kinds of access, full fellowships, immense travel, and long waits for delicate books. Comparison between different cultural sources was even harder, placing absurd demands on the cultural historian's personal memory and note-taking skills. Cultural historians, despite their many skills, stood second in depth of research on any particular topic to political historians, for whom one visit to a Parliamentary archive and one visit to a personal residence outfitted them with every last detail of historical change. Now all that is changing. Comparing a hundred images is no longer a problem for a year's labor in an out-of-the-way museum reading room. Comparing a hundred personal accounts from working men is no longer a task to eat up a social historian's entire year."

It's important to remember, though, that finding and reading out-of-print books is just the beginning of the benefits of digitization. (That's why it's important for at least the out-of-copyright books to be available in more open formats.) Last year, Gregory Crane asked "What Can You Do With a Million Books?," and pointed out that things get most interesting when you can compute against this corpus of books. Computing doesn't just mean measuring or counting (though those things may also be useful). It may mean reshaping in creative, unexpected ways.

At O'Reilly, we've done things like create automated content statistics and extract just the examples so they can be used for code search, both by us and by other code search engines. We're all just taking baby steps, though.

The clearest example I've yet seen of the possibilities of using digital technology to breathe new life into old material remains David Rumsey's work with maps. Once he'd digitized his collection of 30,000 old maps, he was able to do things like georectify them, mapping them to a consistent size and coordinate space so that maps from different eras could be overlaid on each other, creating timelines showing the evolution of cities and landscapes. This is an awesome demonstration of why access to otherwise unavailable materials (the creative commons Lessig talks about) leads to the creation of new value.
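To make "georectify" a little more concrete, here is a minimal sketch of the underlying idea, not a description of how Rumsey's collection was actually processed. It assumes the simplest possible case: three hand-picked control points whose pixel positions on a scanned map and whose real-world longitude/latitude are both known, and an affine (scale, rotate, shear, shift) relationship between the two. The function names `fit_affine` and `to_map` and the sample coordinates are made up for illustration; real georectification typically uses many more control points and more flexible warps.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Affine = Tuple[List[float], List[float]]


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve the 3x3 linear system a @ x = b using Cramer's rule."""
    d = _det3(a)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear; pick three others")
    solution = []
    for col in range(3):
        m = [list(row) for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(_det3(m) / d)
    return solution


def fit_affine(pixels: Sequence[Point], coords: Sequence[Point]) -> Affine:
    """Fit x = a*px + b*py + c and y = d*px + e*py + f from 3 control points."""
    if len(pixels) != 3 or len(coords) != 3:
        raise ValueError("this sketch expects exactly three control points")
    a = [[px, py, 1.0] for px, py in pixels]
    x_coeffs = _solve3(a, [c[0] for c in coords])
    y_coeffs = _solve3(a, [c[1] for c in coords])
    return x_coeffs, y_coeffs


def to_map(transform: Affine, pixel: Point) -> Point:
    """Convert a pixel position on the scan to map coordinates."""
    (a, b, c), (d, e, f) = transform
    px, py = pixel
    return (a * px + b * py + c, d * px + e * py + f)


# Hypothetical control points: scan pixels and their (longitude, latitude).
pixels = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
coords = [(-122.5, 37.8), (-122.4, 37.8), (-122.5, 37.7)]

transform = fit_affine(pixels, coords)
lon, lat = to_map(transform, (50.0, 50.0))
print(round(lon, 4), round(lat, 4))
```

Once two maps from different eras each have a transform like this into the same coordinate space, any location on one can be looked up on the other, which is what makes overlaying them possible.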

Bringing this thought round full circle, academic historians have long been immersed in this kind of creative re-use, but as Jo Guldi wrote in the blog post that I quoted from above, their work is being turbocharged by online access and book search.



Comments: 8

  Rene Gourley [03.22.07 12:22 PM]

The great danger for historians, especially as sources become more and more digitized, is that we will slip into the absurd assumption that sources that have not been digitized do not exist. Digitally available or indexed archives make historical research at a particular location much more efficient, but deep research will always require pawing through boxes.

The side-benefit of the digitization of archival material is that histories should get deeper and more interesting. Because it takes less time to find the basics, we have more time for interviews and digging in attics.

The future of history is bright.

  Oracep [03.22.07 02:57 PM]

'Comparing a hundred personal accounts from working men is no longer a task to eat up a social historian's entire year.' But is there a tool to make comparing easier? At Oracep.com we use Coning to compare the thinking between documents.


Coning is based on Bloom's Cognition Taxonomy, rating each paragraph and then the document. So the paragraph from which I extracted the opening quote rates at 84% on our thinking scale, meaning that the writer has some hefty analysis and judgement here.


Yet, the whole post rates at 71%. Thus Coning allows you to instantly view the higher order thinking.


[84%] What this signals, by the way, is the opportunity for a new age of scholarship. Cultural and image analysis used to be painfully time-consuming, heavy lifting, involving rare kinds of access, full fellowships, immense travel, and long waits for delicate books. Comparison between different cultural sources was even harder, placing absurd demands on the cultural historian's personal memory and note-taking skills. Cultural historians, despite their many skills, stood second in depth of research on any particular topic to political historians, for whom one visit to a Parliamentary archive and one visit to a personal residence outfitted them with every last detail of historical change. Now all that is changing. Comparing a hundred images is no longer a problem for a year's labor in an out-of-the-way museum reading room. Comparing a hundred personal accounts from working men is no longer a task to eat up a social historian's entire year."

  Leo Klein [03.22.07 07:27 PM]

As a librarian, lemme tell you, you're not 'dissing' a library when you're using online material.

What the librarian would want to do is steer you to the most available legitimate copy. If that means on Google, so be it.

P.S. Berkeley is a "Partner" of Google Books.

  Tim O'Reilly [03.22.07 07:38 PM]

Leo -- just to be clear, that was not me saying that the grad student was "dissing" the library; it was Peter Brantley, the executive director of the Digital Library Federation (formerly doing something or other with special libraries at Berkeley).

But your point is well taken. The job of the librarian is changing from managing physical books to managing access to information. But as Peter wrote on his blog back in 2005, Search Engines are also libraries...albeit of a very different kind.

  ravi [03.23.07 12:01 AM]

Google Books has undoubtedly been useful, but they don't seem to be digitising classic books in law, medicine and various sciences, which are very famous and now in the public domain.

However, publishers who have transcribed and published them have put them in Google Books, where only a few pages or lines can be accessed.

One of the books, a great textbook too, costs approximately $79, a few hundred rupees more than the fees for a semester of LL.B. in my hometown in India.

Books are horribly expensive here, especially classic legal tomes, which have too small a market, compared with computer books, to make it economically viable for companies in India to bring out authorised cheaper reprints.

:(

  polymath [03.23.07 12:24 AM]

@Rene: "The great danger for historians, especially as sources become more and more digitized, is that we will slip into the absurd assumption that sources that have not been digitized do not exist."

How about: "The great danger for historians, especially as sources become more and more published, is that we will slip into the absurd assumption that sources that have not been published do not exist."

Careers are made by, as you say, "pawing through boxes" to support new theories. Some will not make the effort, but that is no "danger," merely a fact of life.

Just as textual scholars today write editions, which systematize unpublished works (which would otherwise be invisible to most scholars) and make them accessible to a broader scholarly audience, textual scholars tomorrow will digitize nondigital sources (which would otherwise be invisible to most scholars) and make them accessible to a broader scholarly audience.

  Megan [09.12.07 06:15 AM]

"The great danger for historians, especially as sources become more and more digitized, is that we will slip into the absurd assumption that sources that have not been digitized do not exist."

I don't think this is the case at all. Obviously it will depend on the individual historian, but I think as more and more sources are digitized we will be less and less likely to take any sort of 'case closed' approach to history, always assuming that there is something else out there that has been missed by the scanner or is on the back of a photograph. Perhaps we will see historians focusing more on intangible history.

  Donat Agosti [04.02.08 01:09 AM]

"computing against the body of books" is really the point to make our legacy publications open access. In the domain of biodiversity heritage literature, describing all the world's estimated 1.8M species in well over million published pages (according to the Biodiversity Heritage Library project), the publications' building blocks are descriptions. If they can be marked up, anything within a description's confines relates to the particular species, and thus relations between species can be automatically generated, or distribution maps plotted.

Plazi is a project offering tools to mark up such publications and make the individual descriptions accessible, including plotting distribution records (see e.g. the Argentine ant). For each publication, an average of 35 descriptions could be made accessible, offering a huge source for displaying and computing against.
