Non-Obvious Relationship Awareness

The most interesting person I met this year at PC Forum was Jeff Jonas, founder of Systems Research & Development (SRD), the data mining company that made its name in Las Vegas with a technology called NORA (Non-Obvious Relationship Awareness): software that would alert casino security, for instance, that the dealer at table 11 once shared a phone number with the guy who is winning big at that same table.

As you can imagine, the government came knocking after 9/11. SRD got funding from In-Q-Tel, the CIA’s venture fund, and was acquired by IBM earlier this year. Jeff is now an IBM distinguished engineer and chief scientist of IBM’s Entity Analytics division.

His current focus is “anonymous entity resolution”: the ability to share sensitive data without actually revealing it. That is, by using one-way hashes, you can look across various databases for a match without actually pooling all the data and making it available to all. Solving this problem is fairly critical to the government if it wants “total information awareness” while maintaining citizen privacy and some semblance of civil liberties.
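To make the one-way hash idea concrete, here is a minimal sketch in Python. It is not SRD's or IBM's actual method, just an illustration under stated assumptions: both parties agree on a secret key ahead of time (`SHARED_KEY` is made up for the example), they normalize identifiers the same crude way, and they exchange only the resulting digests. The function names `normalize`, `blind`, and `blind_all` are hypothetical.

```python
import hashlib
import hmac

# Assumption: both parties agreed on this secret out of band.
SHARED_KEY = b"demo-key-not-for-real-use"

def normalize(value: str) -> str:
    """Crude normalization: lowercase and keep only letters and digits."""
    return "".join(ch for ch in value.lower() if ch.isalnum())

def blind(value: str, key: bytes = SHARED_KEY) -> str:
    """Keyed one-way hash (HMAC-SHA256) of a normalized identifier."""
    return hmac.new(key, normalize(value).encode("utf-8"), hashlib.sha256).hexdigest()

def blind_all(values, key: bytes = SHARED_KEY) -> set:
    """Blind every identifier in a list; only these digests leave the building."""
    return {blind(v, key) for v in values}

# Each side hashes its own records locally.
casino_digests = blind_all(["702-555-0111", "702-555-0199"])
agency_digests = blind_all(["(702) 555-0111", "212-555-0142"])

# Matching happens on digests alone; neither side sees the other's raw list.
common = casino_digests & agency_digests
print(len(common))  # one phone number appears in both databases
```

A keyed hash is used here rather than a bare SHA-256 because identifiers like phone numbers come from a small space, and an unkeyed digest of one could be reversed simply by hashing every possible number. A real deployment would need far more than this sketch provides.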

I also find this idea fascinating with regard to social networking. As I’ve noted in my talks for the past couple of years, social networking as currently practiced by services like Friendster, Orkut, and LinkedIn is really a “hack.” (This is a good thing.) Much as screen scraping was a hack that showed the way to web services, current social networking apps point us towards a future in which we’ve truly reinvented the address book for the age of the internet. Why should we have to ask people if they will be our friends, and refer dates or jobs to us? Our true social networking applications — our email, our IM, and our phones — already know who our friends are. Microsoft Research’s Wallop project is a step in the right direction — a tool that lets us visualize and manage our communications web — but it only extends to first degree connections. What anonymous entity resolution would allow is an application that extends the Wallop idea to a full six degrees by comparing data across address books without actually sharing the addresses themselves unless the owner was willing.
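As a rough illustration of what "extending to a full six degrees" might look like, here is a sketch in Python that walks across address books holding only hashed addresses. Everything in it is an assumption made for the example: the shared `KEY`, the `blind_book` and `degrees` helpers, and the idea that each owner publishes a digest of their own address alongside digests of their contacts. It is not how Wallop or any existing service works.

```python
from collections import deque
import hashlib
import hmac

KEY = b"demo-key-not-for-real-use"  # assumption: agreed on by all participants

def blind(address: str) -> str:
    """Keyed one-way hash of a lowercased, trimmed address."""
    cleaned = address.strip().lower().encode("utf-8")
    return hmac.new(KEY, cleaned, hashlib.sha256).hexdigest()

def blind_book(owner: str, contacts):
    """Return (hashed owner, set of hashed contacts) for one address book."""
    return blind(owner), {blind(c) for c in contacts}

def degrees(books: dict, start: str, target: str):
    """Breadth-first search over hashed address books.

    Returns the number of hops from start to target, or None if unreachable.
    """
    start_id, target_id = blind(start), blind(target)
    seen = {start_id}
    queue = deque([(start_id, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target_id:
            return dist
        for neighbor in books.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

books = dict([
    blind_book("alice@example.com", ["bob@example.com"]),
    blind_book("bob@example.com", ["carol@example.com"]),
])
print(degrees(books, "alice@example.com", "carol@example.com"))  # 2
```

The search only ever sees digests, so it can report that a path exists and how long it is without disclosing who sits in the middle. Revealing the intermediate people would be a separate step that each owner would have to agree to.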

Of course, this could be bad for highly connected people. I already know that Linda Stone is my shortest path to almost anybody, but once all my contacts know that as well, Linda might just have to go hide under a rock. Still, just as Napster unleashed a music revolution by choosing an unorthodox default (if you download, you make your computer available as a server as well), I believe that “opt out” rather than “opt in” is the trigger that will allow social networking to achieve its full potential as one of the core “Web 2.0” applications.

But back to NORA. A lot of what we do at O’Reilly is driven by pattern recognition, watching emerging trends, and deciding on the right point where adding a strong dose of information to the mix (books, conferences, advocacy) will help some important new idea reach a wider audience and hopefully reach its full potential. Mostly we do this pattern recognition by talking to cool people (“alpha geeks”), but we also do some data mining ourselves. As Jeff points out, though, most current data mining efforts are rather like a game of Go Fish. (For example, in the intelligence context, “Do you have an Osama? No. Well, then, do you have a Saddam?”) Instead, he says, we need “fire and forget” queries that return whenever they have data. (I also believe strongly in visualization tools like the ones we’re building in our own research group, tools that let you see aggregate patterns and trends.)
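The difference between Go Fish and "fire and forget" can be sketched in a few lines of Python. This is only one reading of Jeff's phrase, not his design: a query is registered once, checked against whatever data is already there, and then kept around so that it fires again whenever a matching record arrives later. The `StandingQueries` class and its method names are invented for the illustration.

```python
class StandingQueries:
    """Toy store in which queries persist and fire as matching data arrives."""

    def __init__(self):
        self._records = []
        self._queries = []  # list of (predicate, callback) pairs

    def register(self, predicate, callback):
        """Ask once: report existing matches now, and future matches later."""
        for record in self._records:
            if predicate(record):
                callback(record)
        self._queries.append((predicate, callback))

    def ingest(self, record):
        """Store a new record and notify every standing query it satisfies."""
        self._records.append(record)
        for predicate, callback in self._queries:
            if predicate(record):
                callback(record)


hits = []
store = StandingQueries()
store.ingest({"name": "alice", "phone": "702-555-0111"})

# Fire and forget: nobody has to ask this question again.
store.register(lambda r: r["phone"] == "702-555-0199", hits.append)

store.ingest({"name": "bob", "phone": "702-555-0199"})
print(hits)  # the query "returned" only once matching data showed up
```

In the Go Fish version, the asker would have to repeat the question after every new record; here the data finds the query instead.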

At any rate, Jeff is definitely one of the movers and shakers in an area that I believe is going to have a huge impact going forward. He’s also an O’Reilly kind of guy: a high school dropout and self-taught hacker who developed software that a lot of PhDs told him couldn’t be done.

  • Scott Hamilton

    This was an interesting article.

    Data mining and NORA are both completely new to me. I may be stating the obvious, but it strikes me that one of the most difficult aspects of searching for relationships is determining what criteria to use. I assume that, given an amount of data that would make the IRS databases look like a flash drive, you would need a set of incredibly powerful computers to churn through the data just to get in the game.

    There’s sooo much data out there. You would first need to decide which Mt. Everest(s) of data to start looking at. Then you need to decide what information (data elements?) you’ll use for establishing potential relationships. Once you get beyond a certain, RELATIVELY obvious chunk of choices, wouldn’t the factors that a system uses to establish relationships be very much reflective of the personality of the system designer(s)? In other words, as an individual, you might see relationships where I don’t, and vice versa. Or are the sophisticated systems able to “learn” and start to decide what type of information might be used to look for relationships?