Sep 29
2006

Nikolaj Nyholm

### Life 3.0 - The Hutter Prize

The Hutter Prize, established earlier this year, is fashioned after the Methuselah Mouse Prize mentioned yesterday -- but where the M-Prize is set up in the quest for eternal life, the Hutter Prize is set up in search of artificial intelligence.

On first looks, all the Hutter Prize asks contestants is to come up with best possible means of compressing the text of Wikipedia. However, in a highly readable motivation for the prize, Matt Mahoney sketches the connections between the prize, artificial intelligence and, surprisingly, the old heuristic Occam's Razor: the simplest answer is usually the correct answer.
The prize's founder, Marcus Hutter has given a mathematical proof that Occam's Razor is in fact possible: The best algorithm for a decision problem is the shortest. The idea with the prize is to capture as much meaning as possible from Wikipedia by finding the best/shortest algorithm to write Wikipedia: A very dense compression of the text. In his motivation, Mahoney goes on to sketch how this optimal algorithm can be used to ultimately beat the Turing test. Like the M-Prize, this is truly a competition that feeds the imagination (while saving you from ripping up carcasses or scavenging graveyards in quest of eternal life).

Results are beginning to appear: In August Alexander Ratushnyak submitted compressions that were almost 7% shorter than the previous best attempt.
Applications to artificial intelligence, however, remain science fiction for now.

tags:   | comments: 2   | Sphere It
submit:

[10.02.06 08:32 AM]

Just a little correction: Marcus' second name is
Hutter, i.e. "u" instead of the german
umlaut "ü". Nevertheless,
nice to see something like that on oreilly radar!

[10.03.06 12:14 PM]

Here is a simple solution to the Wikipedia storage problem.

First, consider that an xor operation on a 2state Boolean set of strings will give you a maximum compression of the image of the difference between the strings.

The compression issue in Wikipedia is an expansion of this basic problem. Instead of a 2 state boolean, you have, say, an 128* state (dimension) "space" which requires a similar operation between successive vectors...an 128 dimensional xor operation. That general concept defines the limit of compression that any actual scheme is going to be capable of.

*This number is arbitrary, representing an old ascii code extent. Unicode compressions would be appropriately larger in the address space required, although the actual data matrices wouldn't be much denser.

Type the characters you see in the picture above.