Worldcat Identities

A lot of people think that there’s a single big identity play out there, and focus on technology solutions, but it seems to me that in true bottom-up internet style, we may eventually build our online identities out of a mashup of all the tracks we already leave in cyberspace. (Seth Goldstein has been exploring this idea with Attention Trust.)

One more small step in this direction was released by the OCLC (Online Computer Library Center) last week. It’s a prototype of an author identity system that shows holdings of books by any author in all of the libraries tracked by OCLC in its WorldCat system. (Worldcat is pretty interesting all on its own — it lets you search for books in any library. It’s a “catalog of catalogs” just like the internet is a “network of networks.”)

Lorcan Dempsey wrote on an email list I’m part of:

Here is a prototype that some folks may find interesting. [Worldcat Identities holds] pages for 20 Million ‘identities’ mined from Worldcat…. Now, this has been created programmatically from the data, so it does show off inconsistencies …

[Screenshot: worldcat_identities.png]

The opening screen shows a tag cloud of the authors and musicians whose work is most widely held by libraries, but you can search for any known author. (Who knew that, as far as libraries are concerned, Harold Bloom is right up there with Brahms and Chopin? That’s one influential literary critic!)

Identities aren’t just those of authors and musicians, but also their subjects, the actors and directors of movies, and so on, in a web of identities. For example, looking at the “related names” entry for Shakespeare comes up with:

Henry IV King of England 1367-1413 [+]
Gollancz, Israel Sir 1864-1930 [+]
Rolfe, W. J. (William James) 1827-1910 [+]
Lamb, Charles 1775-1834 [+]
Hamlet (Legendary character) [+]
Macbeth King of Scotland 11th cent. [+]
Lamb, Mary 1764-1847 [+]
Henry V King of England 1387-1422 [+]
Richard III King of England 1452-1485 [+]
Caesar, Julius [+]

Expanding one of these identities — say “Hamlet” — produces the following related names:

Shakespeare, William 1564-1616 [+]
Saxo Grammaticus d. ca. 1204 [+]
Orestes (Greek mythology) [+]
Murray, Gilbert 1866-1957 [+]
Bloom, Harold. [+]
Thomas, Ambroise 1811-1896 [+]
Olivier, Laurence 1907- [+]
Gielgud, John 1904-2000 [+]
Ophelia (Fictitious character) [+]
Sydney, Basil 1894-1968 [+]

In short, this becomes an amazing tool for social network exploration of the literary and artistic world.
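
To make the "social network" idea concrete, here is a minimal sketch in Python. It assumes nothing about how WorldCat Identities works internally: the two related-names lists above are simply typed in by hand (dates dropped), and the hypothetical helper `names_within` walks them breadth-first to count how many clicks separate one identity from another.

```python
from collections import deque

# Related-names lists copied by hand from the two listings above.
# Illustrative data only; this is not a WorldCat API response.
RELATED = {
    "Shakespeare, William": [
        "Henry IV King of England",
        "Gollancz, Israel Sir",
        "Rolfe, W. J. (William James)",
        "Lamb, Charles",
        "Hamlet (Legendary character)",
        "Macbeth King of Scotland",
        "Lamb, Mary",
        "Henry V King of England",
        "Richard III King of England",
        "Caesar, Julius",
    ],
    "Hamlet (Legendary character)": [
        "Shakespeare, William",
        "Saxo Grammaticus",
        "Orestes (Greek mythology)",
        "Murray, Gilbert",
        "Bloom, Harold",
        "Thomas, Ambroise",
        "Olivier, Laurence",
        "Gielgud, John",
        "Ophelia (Fictitious character)",
        "Sydney, Basil",
    ],
}


def names_within(start, max_hops, related=RELATED):
    """Return {name: hops} for identities reachable from start in at most max_hops clicks."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        name = queue.popleft()
        if hops[name] == max_hops:
            continue  # do not expand beyond the requested distance
        for neighbor in related.get(name, []):
            if neighbor not in hops:
                hops[neighbor] = hops[name] + 1
                queue.append(neighbor)
    return hops


two_clicks = names_within("Shakespeare, William", 2)
# Names that only show up after expanding one of Shakespeare's related identities.
print(sorted(name for name, h in two_clicks.items() if h == 2))
```

With only these two lists loaded, the second hop is exactly the Hamlet listing minus Shakespeare himself, which is how Laurence Olivier and Saxo Grammaticus end up two clicks from the Bard.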

(I’d really love to see this tied in programmatically to Wikipedia. There ought to be an automatic link to this site for every identity in Wikipedia!)

While many of the “identities” are for “dead white guys” :-), there are also a lot of contemporary authors. As Lorcan said, “Some folks on the list will have a page.”

Sure enough, I do have a page. How odd that Windows 95 in a Nutshell is my most widely held work! And who knew that in addition to Programming Perl, Larry Wall appears to be the author of “Indexes to hymn translations automatically generated from the The HTML electronic text from the Christian Classics Ethereal Library, including all hymns from Hymns of the early church (1913) and Hymns of the Russian church (1920).”

P.S. Lorcan also blogged about WorldCat Identities here.

P.P.S. It may be a stretch for some of you to see the connection between Worldcat Identities and Attention Trust, but think about it, and extrapolate along the trend lines….

  • http://www.imran.ali.name Imran Ali

    Wow – kinda like Edgio for identity?

    I had an intern attempt to do something similar to WorldCat, but for video, a few years back, using the Statistically Improbable Phrases and Capitalised Phrases (SIP & CAP) concepts from Amazon (http://www.imran.ali.name/blog/2005/07/inside_this_mov.html).

  • http://havahula.org frank hamilton

    Tim, thanks for the post but regarding your P.P.S — it’s not necessarily a stretch to see the connection between WorldCat and Attention Trust. There are so many possible connections that I, as a reader, am simply not sure which one(s) you are seeing. If the goal was to simply put it out there as a rhetorical puzzle, great. But if you have a specific connection you’d like to elaborate on, I’d love to hear it.

    For me, what we produce (either as our Internet tracks, a volume or volumes of prose or a conversation) is a type of identity, true. But the idea of controlling identity (selling it, lending it and reclaiming it) is only possible with regard to tangible data. What about all of the millions of real-world relationships and encounters?

    It seems an awful lot to manage and process and begs the question — is our time well spent trying to control our environment, or should we be content with allowing things to simply be?

  • Katharine Phenix

    Sorry. OCLC is not a “catalog of catalogs”. It is a “union catalog”, meaning that the catalogs of member libraries (OCLC is too expensive for many libraries to join) are merged into one big one. Just so you know.

  • http://tim.oreilly.com Tim O'Reilly

    Katharine — thanks for the clarification.

  • http://tim.oreilly.com Tim O'Reilly

    Frank –

    The OCLC announcement is interesting on a number of levels. The point of my framing it as I did was to highlight it as an identity play, not just an exploration of library catalogs.

    The point I’m also trying to make is that ultimately, I don’t think that the next generation of social network tools will be built by people explicitly building profiles (though that may be part of it) but by systems that instrument what we’re already doing.

    See for example my recent post on what the Web 2.0 address book ought to look like.

    I’m not connecting to the aspect of attentionTrust that focuses on user control, but on the aspect of the “attention recorder,” the idea that we create a unique signature by what we do online.

    Our phone, our credit card, our email communications, the books and articles and blog posts we write, records of the conferences we speak at, are already “attention recorders”; we just haven’t taken the time to instrument them and build tools for creating value out of these tracks of our behavior.

  • Thomas Lord

    The four principles promoted by Attention Trust are a nice fantasy. I would add, to such a list:

    Ponies
    You should have a pony.

    -t

  • http://tim.oreilly.com Tim O'Reilly

    Tom — the connection to AttentionTrust that I was drawing was not to their principles, but to the idea that identity may not be a single secure credential but rather a living record built up out of evidence from the net about who we are.

    And as to me, I have two ponies :-) Although Icelandics, though pony-sized, are generally referred to as horses.

  • Thomas Lord

    Tim, thanks. Let me elaborate, please.

    “Identity” is not a term with any simple meaning. You are talking about what we might dub institutional identity or, better, identity as subject. That is to say, you are talking about “identity” as a technical term that is used in the planning and deployment of social control from a subjectifying perspective. Within the logic of the discourse that (with your help) is built up around this concept of identity, users constitute not merely human beings (in the sense in which you recognize your friends and family) but, instead, objects which are set-members of a “population” (another technical term, meaning a resource subject to exploitation).

    You write “identity [may be] a living record built up out of evidence from the net about who we are.” That formulation invites conflation of our intuitive, personal, familial and friendship based concepts of identity (in a non-technical sense), with our “identity” as a unit of utility to a larger, largely impersonal system of powerful, dominating institutions.

    You did not write “Rich people are seeking the best way to tag the herd, seeking to extend their potential to manipulate people to a very high resolution form of control. For example, some are thinking of single-sign-on as the ultimate way to tag people; others (like AttentionTrust) are wondering if we can build a game around institutional tagging that the subjects will want to play, thereby making it more effective and sustainable.”

    Consequently, you negated the very possibility of criticism of the entire program that you are advocating. One must be what is conventionally regarded as rude (ahem) to even raise the topic.

    Most offensively, from my perspective, AttentionTrust’s principles are an example of propaganda that seeks to deny that any of this is going on — to seduce subjects into an illusion of individual control when what is really being promoted is a more perfected control.

    And for your part, you are helping the elite part of your audience console themselves with false senses of both the inevitability and banality of these developments.

    -t

  • Thomas Lord

    Isn’t it ironic that, during the same time in history when elite USians are waking up to the abhorrent injustice and high social costs of using free markets competing over the “actuary table” as the primary means of allocating medical care and assigning costs, they are simultaneously, blindly applying essentially the same technology in every other area of commerce they can get their hands on?

    -t

  • http://wm.sieheauch.de/ Jakob

    There are already 20,000 to 25,000 manually checked links between German Wikipedia articles and the German name authority file (PND). And there are links between German Wikipedia and English Wikipedia articles (langlinks.sql.gz). And the VIAF project should create links between PND and WorldCat. So why create error-prone automatic methods when you can use the power of collaborative human intelligence? You just need to plug it together. With Wikipedia you can even correct names and dates if there are errors in WorldCat :-)
