Open Text Mining Interface

Timo Hannay of Nature has published a bit more information on their Open Text Mining Interface (OTMI). I first saw this a few weeks ago when I was on a panel that Timo chaired at the Life Sciences Expo in Boston. It immediately struck me as “slap your forehead brilliantly obvious.” A lot of people want full-text access to journal articles (or books, for that matter). For example, a search engine that wants to understand which gene sequences are described in an article needs the full text, not just the abstract. But authors (and publishers) aren’t keen on exposing the full text to all readers. OTMI solves this problem. It’s an XML format that expresses the full text as word vectors plus “snippets” (an alphabetically ordered list of sentences) for programs to do full-text search against. This way, a program can data-mine the full text of the article, but a human can’t “read” it sequentially.
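To make the idea concrete, here is a rough sketch in Python of the two pieces described above: a word vector (word counts) and a list of sentence snippets sorted alphabetically so the original reading order is lost. This is not the OTMI format itself, which is XML; the function names `otmi_like_summary` and `search_snippets` are hypothetical, and the sentence splitting is deliberately naive.

```python
import re
from collections import Counter


def otmi_like_summary(text):
    """Return (word_vector, snippets) for a block of text.

    Illustrative only: real OTMI is an XML format, and its exact
    tokenization rules are not reproduced here.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word vector: lowercase word -> number of occurrences.
    words = re.findall(r"[a-z0-9']+", text.lower())
    word_vector = Counter(words)
    # Sort sentences alphabetically (case-insensitive) to discard reading order.
    snippets = sorted(sentences, key=str.lower)
    return word_vector, snippets


def search_snippets(snippets, term):
    """Return every snippet containing the term, ignoring case."""
    term = term.lower()
    return [s for s in snippets if term in s.lower()]
```

A program can still count how often "gene" appears, or pull every sentence that mentions it, but the snippets no longer come back in the order the author wrote them.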

Now, it may be that, like all forms of DRM, this will encounter resistance from folks who believe in open access to everything. But I love the cleverness of this approach, which lets machines make use of the content in ways that human readers can’t. You might consider it a “copyright hack.”

Timo’s group at Nature is one to watch. I’m also a fan of their Connotea project, which aims to bring tagging to scientific citations.