

Jan 7

Tim O'Reilly

MarkMail Provides Amazing Search Capabilities

I've been meaning to write for a while about MarkLogic's awesome new search tool for trolling through open source mailing lists, MarkMail.

Let's face it. While there may be a new generation that thinks that email is for old fogies, for many of us, email is a primary online tool, at least as important to us as the web. Many of us no longer file documents or attachments -- we just search for them again in our email. Perhaps most importantly, email is a primary collaboration tool--and as many of us have figured out, collaboration is one of the internet's killer apps. Searching our shared memory in a collaborative space is REALLY useful -- with open source mailing lists being a great example.

Despite its importance, very little has been done to improve on email. The clients we use today are not radically different from what we used ten years ago (except perhaps in being web-based). This is why there was so much excitement when Xobni showed how useful it is to expose the social network hidden in email.

MarkMail does something equally powerful. Imagine a tool that lets you see trends across thousands of email messages, saved over years. Imagine being able to find who is the most prolific poster on a given topic, and explore the histogram of their entire message history. Imagine being able to do instantaneous data mining against millions of stored messages, with a response time better than you get looking at your local mailbox.

MarkMail provides all this and more. MarkLogic has stored approximately 5.5 million email messages from more than 700 open source mailing lists (all of the Apache, MySQL, Mozilla, and PHP lists, plus a smattering of others, with more to be added over time, hopefully soon) and provided an interface that beats Googling. It's as fast or faster, but more importantly, you get built-in data mining capabilities that, I trust, will eventually make their way into more traditional email systems.

Let me show you a sample search. I might be looking for actual message content -- the answer to a question -- but I might also be interested in the big picture. We're a publisher, and my editors are often looking for trend data to tell us whether interest in a topic is increasing or declining. So, for example, let's say we were thinking of publishing a book on Lucene. (This is for example only -- there's already a good book from Manning, Lucene in Action.) Let's take a look at what MarkMail shows us:

[Screenshot: MarkMail search for Lucene]

I can immediately see that there's a lot of growth in mailing list traffic for Lucene. Sounds promising. And I can see who the most prolific posters are. Possible authors? Well, Erik Hatcher, the top poster, is the author of that Manning book I already mentioned. But a few drill-down clicks show whether other top posters are still involved or not. (For an example where someone dropped out, search on Struts and then view the drill-down histogram for Craig McClanahan.) And of course, I can drill into the messages themselves to see who expresses ideas concisely and powerfully. (Yes, we do troll mailing lists for authors and conference speakers!)
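The trend views described above boil down to a few simple aggregations over message metadata: messages per year that mention a term, the authors with the most such messages, and one author's activity over time. The sketch below illustrates that idea on a tiny in-memory list. It is not how MarkMail works internally; the `Message` type, the helper names, and the sample data are all hypothetical, made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Message:
    # Hypothetical record type: just the metadata the trend views need.
    author: str
    sent: date
    body: str


def yearly_histogram(messages, term):
    """Count messages mentioning `term` (case-insensitive), grouped by year."""
    term = term.lower()
    counts = Counter(m.sent.year for m in messages if term in m.body.lower())
    return dict(sorted(counts.items()))


def top_posters(messages, term, n=3):
    """Return up to n (author, count) pairs for messages mentioning `term`."""
    term = term.lower()
    counts = Counter(m.author for m in messages if term in m.body.lower())
    return counts.most_common(n)


def author_histogram(messages, author):
    """Per-year message counts for one author: is this person still active?"""
    counts = Counter(m.sent.year for m in messages if m.author == author)
    return dict(sorted(counts.items()))


# Made-up sample data, for illustration only.
sample = [
    Message("alice", date(2005, 3, 1), "Lucene index question"),
    Message("bob", date(2006, 7, 9), "lucene scoring"),
    Message("alice", date(2007, 1, 2), "Re: Lucene scoring"),
    Message("carol", date(2007, 5, 5), "Struts config"),
]

print(yearly_histogram(sample, "Lucene"))   # {2005: 1, 2006: 1, 2007: 1}
print(top_posters(sample, "Lucene"))        # [('alice', 2), ('bob', 1)]
print(author_histogram(sample, "alice"))    # {2005: 1, 2007: 1}
```

A gap in an author's per-year histogram is exactly the "dropped out" signal mentioned above; a system working over millions of messages would precompute these counts in an index rather than scanning a list.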

And in a feature that old command-line junkies will love, once you want to drill into actual messages, just type "n" to pop into message viewing mode, with "n" and "p" moving you forward and back through the message stream. It's a really slick mail reading interface. As Jason Hunter from MarkLogic put it, their UI model was:

1. Search with a minimal constraint
2. Refine interactively until you've narrowed things sufficiently
3. Hit "n" to peruse the results

OK, so maybe most of you wouldn't use this tool for trend analysis. But just imagine being able to use a tool like this to search your own mail. I love the way MarkMail gives me a bunch of drill-down choices in the UI and, as I choose them, rewrites the command line in the search box. I'd love to see features like this in my other mail packages. With Apple's Mail on Mac OS X, for example, it's impossible to do a complex search. You can search for a text string in the From field, the subject line, or the entire message, but what if you want to say "I want a message on x, from Joe, to Mary, sent between April and June of 2006"? Even on Gmail, where I can do this kind of search with Search Options, I have to go to a whole other screen, out of the search flow, to do it. You can construct that kind of search in MarkMail just by navigating around. Yumm. How long before regular mail vendors start doing this kind of thing? This is a really sweet search interface.
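The compound search above ("a message on x, from Joe, to Mary, sent between April and June of 2006") can be modeled as a set of structured constraints that each drill-down click fills in, with the search-box text re-rendered from those constraints after every click. The sketch below shows that pattern. MarkMail's actual query syntax is not described in this post: the `from:`, `to:` and `date:` tokens, the `Email` and `Query` types, and the sample messages are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Email:
    # Hypothetical record type for illustration.
    sender: str
    recipients: Tuple[str, ...]
    sent: date
    text: str


@dataclass
class Query:
    """Drill-down constraints accumulated as the user clicks through facets."""
    terms: List[str] = field(default_factory=list)
    sender: Optional[str] = None
    recipient: Optional[str] = None
    start: Optional[date] = None
    end: Optional[date] = None

    def to_string(self) -> str:
        # Re-render the search box text; this token syntax is hypothetical.
        parts = list(self.terms)
        if self.sender:
            parts.append(f"from:{self.sender}")
        if self.recipient:
            parts.append(f"to:{self.recipient}")
        if self.start or self.end:
            lo = self.start.isoformat() if self.start else ""
            hi = self.end.isoformat() if self.end else ""
            parts.append(f"date:{lo}..{hi}")
        return " ".join(parts)

    def matches(self, msg: Email) -> bool:
        # Every constraint that has been set must hold (logical AND).
        text = msg.text.lower()
        if any(t.lower() not in text for t in self.terms):
            return False
        if self.sender is not None and msg.sender != self.sender:
            return False
        if self.recipient is not None and self.recipient not in msg.recipients:
            return False
        if self.start is not None and msg.sent < self.start:
            return False
        if self.end is not None and msg.sent > self.end:
            return False
        return True


# Each "click" narrows the query and rewrites the search box.
q = Query(terms=["budget"])
q.sender = "joe"
q.recipient = "mary"
q.start, q.end = date(2006, 4, 1), date(2006, 6, 30)
print(q.to_string())  # budget from:joe to:mary date:2006-04-01..2006-06-30
```

Keeping the constraints as structured fields, rather than parsing the search-box text back on every click, is only one possible design. It makes the rendered query a by-product of navigation, which is the behavior that keeps the user in the search flow.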

Where MarkMail really shines is in managing large mail archives. And that's why, of course, MarkLogic has put up MarkMail for free. They know that there are potential corporate clients who have huge mail archives that they want to mine. And the performance of their existing systems (not to mention their interfaces) just won't cut it.

tags: email, MarkLogic  | comments: 3



Silicon Valley   [01.07.08 09:40 AM]

This is interesting and helpful for seeking trends from the early days of the Web.

Look at these results for 'Google', starting from early 1999.

One flaw is that it appears to go back only to early 1995.

But used in conjunction with Google Groups' search tool, which goes back to 1981, the two can help researchers sort through valuable archives.

Jason Hunter   [01.07.08 01:06 PM]

As someone working on MarkMail, I thought I'd mention that the reason the archives go back "only to 1995" is that, among the mailing lists presently loaded, all were started on or after 1995. As we expand communities and load more lists, we'll likely load some lists that were initiated earlier.

If people have suggestions for new lists, we welcome them. And if people want to stay up to date with the site and the new lists and features, we have a blog at

Jason   [01.09.08 03:16 PM]

This is an awesome site. It would be cool if they could build a similar site for web search!
