The Traditional Future

A prominent U.S. sociologist and student of professions, Andrew Abbott of the University of Chicago, has written a thought-provoking thesis on what he terms “library research” (that is, research performed with library-held resources by historians et al., through the reading and browsing of texts) as compared to social science research, which follows a more linear "Idea->Question->Data->Method->Result" methodology.

The pre-print, “The Traditional Future: A Computational Theory of Library Research,” is full of insights about library-centric research, including intriguing parallels between library research and neural net computing architectures, a comparison that made me think anew, and with more clarity, about how the science of history is conducted. Armed with a distinctive interpretation of library research, Abbott is able to draw some incisive conclusions about the ramifications of large repositories of digitized texts (such as Google Book Search) for the conduct of scholarship.

Library research, Abbott notes, “is not interested in creating a model of reality based on fixed meanings and then querying reality whether this model is right or wrong. … Rather, it seeks to contribute to an evolving conversation about what human activity means. Its premise is that the real world has no inherent or single meaning, but becomes what we make it.”

This has immediate ramifications for the potential utility of search premised on concordance-based indexes for humanistic research. “[I]t is by no means clear that increasing the efficiency of library research will improve its overall quality. For example, it is not clear that increasing the speed of access to library materials by orders of magnitude has improved the quality of library-based research.”

There are other, inherently structural characteristics of how automated discovery is provisioned that bear on the optimization of library research. One of these relates to the presence of noise, or randomness, that inevitably arises when there are multiple paths to discovery. With more and more information accessible through a dwindling number of search interfaces, the variation in returned results is reduced. Research is not served well when everyone receives the same answers to the same questions; no learning lies there.

As anyone who has worked in optimization recently knows, stripping the randomness out of a computing system is a bad idea. Harnessing randomness is what optimization is all about today. (Even algorithms designed for convergence make extensive use of randomness, and it is clear that library research in particular thrives on it.) But it is evident that much of the technologization of libraries is destroying huge swaths of randomness. First, the reduction of access to a relatively small number of search engines, with fairly simple-minded indexing systems — most typically concordance indexing (not keywords, which are assigned by humans) — has meant a vast decrease in the randomness of retrieval. Everybody who asks the same questions of the same sources gets the same answers. The centralization and simplification of access tools thus has major and dangerous consequences. This comes even through reduction of temporal randomness. In major indexes without cumulations – the Readers Guide, for example – substantial randomness was introduced by the fact that researchers in different periods tended to see different references. With complete cumulations, that variation is gone.

That’s an interesting observation: almanacs and compilations often present slices, or ever-varying accumulations, of results, so even identical questions would return different results depending on when in the publication sequence they were asked. As more and more information is aggregated into composite sets, this temporal variation is also lost.

Dr. Abbott makes a final point about the transformation of browsing and discovery, and about the underlying nature of library-based research: often the investigator does not know exactly what they are looking for, and this matters as much as, if not more than, not knowing the best sources to look in.

This argument makes it clear why “efficient” search is actually dangerous. The more technology allows us to find exactly what we want the more we lose this browsing power. But library research, as any real adept knows, consists in the first instance in knowing, when you run across something suddenly interesting, that you ought to have wanted to look for it in the first place. Library research is almost never a matter of looking for known items. But looking for known items is the true – indeed the only – glory of the technological library. The technological library thus helps us do something faster but it is something we almost never want to do and, furthermore, it strips us in the process of much of the randomness-in-order on which browsing naturally feeds. In this sense, the technologized library is a disaster.

Google Book Search is a wonderful thing. But it is not so wonderful that we should assume it will transform education and research. Nor should we assume that in the future we might not be able to generate architectures that make books live more intelligently amongst each other, and more freely, than anything that Google might envision. As libraries that might be participating in digitization, let us challenge the fundamental assumptions we are handed, which must seem so dangerously obvious, and rethink the landscape of our profession and how we might best support our real work of learning.