Researchers at Princeton University have been working to develop an algorithm that mines data by its importance, rather than by indicators such as links. The Princeton search algorithm parses language, identifying important words or phrases across a set of data, such as a group of documents or articles. It then measures the influence of those words across the data set to identify the most relevant information. It also tracks how the language changes over time, which might make it applicable to real-time search applications.
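The article does not describe the Princeton algorithm's internals, but the general idea of scoring terms by how unevenly they concentrate across a document set can be sketched with a simple TF-IDF-style measure. This is an illustrative stand-in, not the researchers' actual method; the function name and scoring formula are assumptions.

```python
from collections import Counter
import math

def term_influence(documents):
    """Score terms by how unevenly they concentrate across a corpus.

    A rough TF-IDF-style proxy for the 'influence' idea: terms that
    occur often but in few documents score highest, while terms that
    appear everywhere score zero. Illustrative only -- not the
    Princeton algorithm described in the article.
    """
    n_docs = len(documents)
    doc_freq = Counter()   # number of documents containing each term
    term_freq = Counter()  # total occurrences of each term
    for doc in documents:
        words = doc.lower().split()
        term_freq.update(words)
        doc_freq.update(set(words))
    scores = {}
    for term, tf in term_freq.items():
        idf = math.log(n_docs / doc_freq[term])
        scores[term] = tf * idf
    return scores

docs = [
    "search engines rank pages by links",
    "the new algorithm ranks language by influence",
    "influence of language changes over time",
]
scores = term_influence(docs)
```

Extending this toward the real-time angle mentioned above would mean recomputing such scores over a sliding window of recent documents and watching how a term's score trends.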
Measuring information flow to determine influence has a lot of potential, says Jure Leskovec, an assistant professor of computer science at Stanford University. The most obvious application, he says, is personalization: software could look at what sort of articles a person is reading and point her to articles or websites that contain relevant material.
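The personalization idea Leskovec describes can be sketched as a simple content-based recommender: build a term profile from a reader's history and rank candidate articles by similarity to it. The bag-of-words vectors and cosine similarity here are assumptions for illustration; a production system would use far richer features.

```python
from collections import Counter
import math

def vectorize(text):
    # Bag-of-words term counts; a real system would use richer features.
    return Counter(text.lower().split())

def cosine(u, v):
    # Cosine similarity between two sparse count vectors.
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(c * c for c in u.values())) *
            math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def recommend(history, candidates):
    """Rank candidate articles by similarity to a reader's history."""
    profile = Counter()
    for article in history:
        profile.update(vectorize(article))
    return sorted(candidates,
                  key=lambda a: cosine(profile, vectorize(a)),
                  reverse=True)
```

For example, a reader whose history is about search ranking would see a candidate article on that topic ranked ahead of an unrelated one.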
Such applications could be the answer to what journalist Mathew Ingram calls the “Holy Grail for News”: recommendations. In a recent post, Ingram argued that none of the aggregation or news apps has achieved any impressive level of success in this area so far. He suggested that social media, such as Facebook and Twitter, might provide the solution:
… if you want recommendations about what to read, your Twitter stream and Facebook graph are probably the best solution — and anyone who wants to do better is going to have to leverage both of them to do it.
The Princeton researchers might be able to add to that conversation, as well as to the future of search in general. Stephan Spencer, founder of Netconcepts, pointed to such possible search improvements in a 2009 post on the future of search:
Somehow I don’t think Google will, for too much longer, be basing its importance, authority and trust algorithms on the link graph. I think they will develop an artificial intelligence “expert system” that can use its own judgment in determining whether a web page or website is spammy.