The Spock Entity Resolution Challenge

I wrote in my previous entry about Spock that entity resolution is one of the key aspects of people search. Spock already does a pretty good job of this, but they want to get even better. As a result, they’ve offered the $50,000 Spock Entity Resolution Challenge.

From the challenge site:

To improve our technology and to create a better user experience, we decided to share the fun! We have selected one of our most interesting problems, namely Entity Resolution, to share with the community, allowing other leading computer scientists and engineers to compete in an open contest. The winners of this global competition will reap a handsome reward, and perhaps even employment at Spock.

You can work individually and in teams. The competition will last 4 months and the winning team will win a Grand Prize of $50,000! Most importantly you’ll be working on a very important and widely applicable problem. We will also be issuing prizes for 2nd and 3rd place.

A common problem that we face is that there are many people with the same name. Given that, how do we distinguish a document about Michael Jackson the singer from Michael Jackson the football player?

With billions of documents and people on the web, we need to identify and cluster web documents accurately to the people they are related to. Mapping these named entities from documents to the correct person is the essence of the Spock Challenge.

In order to constrain the problem so that it can be successfully solved by an individual or a small team, we provide you with real world data with ground truth. This data contains 100,000 documents about people, and the challenge is to determine all the distinct people described in the data set. This data can be your training set. Once you’ve got your basic algorithm working against the training set, we let you further tune your code by running it against a second test data set.

We give you instant accuracy feedback in the form of a percentage rank score. The score depends on how many correct unique people you can identify in the data. This way you can continue to refine your work and see how well you are holding up against your competitors.
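To make the problem statement quoted above concrete, here is a toy sketch of the clustering task in Python. It is not Spock's method or the challenge's data format. The `resolve` function, the `(name, text)` document shape, the Jaccard word-overlap measure, and the 0.2 threshold are all assumptions made up for illustration. The idea is simply that two documents mentioning the same name are treated as the same person only if their surrounding words overlap enough.

```python
from itertools import combinations


def tokens(text):
    """Lowercased bag of words for a document (hypothetical, very crude)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resolve(docs, threshold=0.2):
    """Group documents into people.

    docs is a list of (name, text) pairs; the result is a sorted list of
    clusters, each a list of document indices. Documents are merged when
    they share a name and their word overlap reaches the threshold
    (an arbitrary value chosen for this toy example).
    """
    parent = list(range(len(docs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(docs)), 2):
        (name_i, text_i), (name_j, text_j) = docs[i], docs[j]
        if name_i == name_j and jaccard(tokens(text_i), tokens(text_j)) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Made-up documents: two about the singer, two about the football player.
docs = [
    ("Michael Jackson", "singer pop album thriller"),
    ("Michael Jackson", "pop singer album tour"),
    ("Michael Jackson", "football linebacker team season"),
    ("Michael Jackson", "football team linebacker draft"),
]

print(resolve(docs))  # [[0, 1], [2, 3]]
```

A real entry would need far richer signals than word overlap (locations, employers, dates, links between pages) and something smarter than comparing every pair of documents, which will not scale to 100,000 of them.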

(Jeff Jonas: you don’t need the money, but I’ll bet that Jaideep Singh, Jay Bhatti, and the other folks at Spock would love to get to know you! For those of you who don’t know what I’m talking about, here’s a collection of links to stories about Jeff Jonas’ work on entity resolution.)
