


Tim O'Reilly


Google Image Labeler, the ESP Game, and Human-Computer Symbiosis

In comments on my previous entry, Stephen Rondeau pointed out that Google Image Labeler appears to be based on the ESP Game developed by Professor Luis von Ahn of CMU. An anonymous commenter pointed to the video of Luis von Ahn's tech talk at Google on July 26, 2006.

The tech talk was fascinating, both because there was no hint in it that Google was about to announce something based on von Ahn's work -- the talk is all about his previously published games -- and for the actual thought-provoking content, which gives a lot of background on the design of this kind of game, and in general, the idea of harnessing humans to work as program components via games.

Luis started out his talk by looking at Captchas. Most of this was familiar material, although he has a great definition of a Captcha: "A program that can generate and grade tests that most humans can pass, but that current computer programs cannot pass." (This is an interesting variation on the Turing test, in which humans generate and grade tests that most humans can pass, but current computer programs cannot pass. Is there another variation in the future, in which computers generate and grade tests that computers can pass, but humans cannot pass?)

At about 7 minutes into the talk, he makes a staggering assertion about the amount of time spent on casual games: in 2003, 9 billion hours were spent playing solitaire. By comparison, it took only 7 million human hours (6.8 hours of solitaire) to build the Empire State Building, and only 20 million human hours (less than a day of solitaire) to build the Panama Canal. He then states his thesis, "human computation":

"We're going to consider the human brain as an extremely advanced processor that can solve problems that computers cannot yet solve. Even more, we're going to consider all humanity as an extremely advanced and large scale distributed processing unit that can solve large scale problems that computers cannot yet solve. I claim that the relationship between humans and computers is extremely parasitic.... What I want to advocate for in this talk is a symbiotic relationship, a symbiosis in which humans solve some problems, computers solve others, and together we work to create a better world."

At that point, the audience dissolves in laughter, as the image he's projecting on the screen is the symbolic waterfall of the Matrix.
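The solitaire comparison can be checked with a little arithmetic. The sketch below assumes (my assumption, not something stated in the talk) that the 9 billion hours are spread evenly across one 365-day year, which gives a worldwide "solitaire rate" in human-hours per clock hour. The names are made up for illustration.

```python
# Assumption: 9 billion solitaire hours spread evenly over a 365-day year.
SOLITAIRE_HOURS_PER_YEAR = 9e9
CLOCK_HOURS_PER_YEAR = 365 * 24  # 8760

# Human-hours of solitaire played worldwide per hour of wall-clock time.
solitaire_rate = SOLITAIRE_HOURS_PER_YEAR / CLOCK_HOURS_PER_YEAR


def clock_hours_of_solitaire(project_human_hours):
    """Wall-clock hours of worldwide solitaire that equal a project's labor."""
    return project_human_hours / solitaire_rate


empire_state = clock_hours_of_solitaire(7e6)   # Empire State Building
panama_canal = clock_hours_of_solitaire(20e6)  # Panama Canal

print(round(empire_state, 1), round(panama_canal, 1))  # prints: 6.8 19.5
```

Under that assumption the figures in the talk hold up: about 6.8 hours for the Empire State Building and about 19.5 hours, less than a day, for the Panama Canal.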

Getting serious again, he talks about how games can be used to harness some of this unused processing power, and goes specifically into how they can be used for labeling images. First, he describes the ESP Game. Two players are paired up, and given the same image to label. When they agree on a label, they are given a new image. Their goal is to type as many labels as possible, as quickly as possible, until the two players find a match.
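The matching rule can be sketched in a few lines. This is only an illustration of the idea as described in the talk, not von Ahn's or Google's implementation: the function name is hypothetical, and interleaving the two guess lists turn by turn is a stand-in for real-time typing.

```python
def first_agreement(guesses_a, guesses_b):
    """Return the first label that both players have typed, or None.

    Guesses are processed in alternating turn order (a simplification of
    two people typing at the same time). Case and padding are ignored.
    """
    seen_a, seen_b = set(), set()
    for i in range(max(len(guesses_a), len(guesses_b))):
        if i < len(guesses_a):
            label = guesses_a[i].strip().lower()
            seen_a.add(label)
            if label in seen_b:
                return label
        if i < len(guesses_b):
            label = guesses_b[i].strip().lower()
            seen_b.add(label)
            if label in seen_a:
                return label
    return None


match = first_agreement(["dog", "grass", "puppy"], ["animal", "puppy", "dog"])
no_match = first_agreement(["dog"], ["cat"])
```

Neither player sees the other's guesses, so the only way to score is to type something the image actually suggests to both of them. That is what makes an agreed label trustworthy.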

Evidence shows that people find this activity to be fun. He says that people have played as long as 15 hours at a stretch, that many people play over 20 hours a week, and that over three years 75,000 players have agreed on over 15 million labels for images. But so far, what he's done is fairly small-scale. (When I logged in to the ESP Game this morning, there were 179 simultaneous players.) With 5000 simultaneous players -- a number found on many popular online gaming sites -- he claims it would take about 2 months to label all the images on Google Images.

He talks also about some of the things he does to make the game fun, a lot of it eerily reminiscent of Kathy Sierra's insights on Creating Passionate Users. And I have to say, comparing the ESP Game against Google Image Labeler, I wonder if Google hasn't missed some important points in von Ahn's work. While his implementation is MUCH slower, it's more fun, with sound effects, scoring feedback, and other game features making it much more engaging. He even identifies the partner by login name. (He talks about how people love to play because they feel a really good connection to their partner when they agree on a word. He shows some amazing quotes describing how passionate players are about the game.)

He also talks about how to prevent cheating: they give people test images that have already been labeled, and only store their guesses when it's clear that they are playing fair. And once you have enough images labeled, you can get additional labels by prohibiting labels that have already been used. And of course, you can even implement one-player versions of the game, and zero-player versions (in which you get additional matches by re-running the sessions from different player combinations that used the same image).
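Two of these safeguards, the prohibited-label list and the pre-labeled test images, are easy to sketch. Everything here is hypothetical: the function names, the data layout, and especially the 50% hit-rate threshold are my own assumptions for illustration, since the talk doesn't give those details.

```python
def allowed_guesses(guesses, taboo):
    """Drop guesses that are on the image's prohibited (already-used) list."""
    taboo_set = {t.lower() for t in taboo}
    return [g for g in guesses if g.lower() not in taboo_set]


def plays_fair(test_results, min_hit_rate=0.5):
    """Decide whether a player's labels should be stored.

    test_results maps a test image id to (player_guesses, known_labels).
    A test image counts as a hit if any guess matches a known label.
    min_hit_rate is an assumed threshold, not a figure from the talk.
    """
    if not test_results:
        return False  # no evidence yet, so don't store this player's labels
    hits = sum(
        1
        for guesses, known in test_results.values()
        if {g.lower() for g in guesses} & {k.lower() for k in known}
    )
    return hits / len(test_results) >= min_hit_rate


remaining = allowed_guesses(["Man", "hat", "suit"], taboo=["man"])
honest = plays_fair({"img1": (["dog"], ["dog", "puppy"]),
                     "img2": (["tree"], ["car"])})
spammer = plays_fair({"img1": (["asdf"], ["dog", "puppy"])})
```

The prohibited list pushes players past the obvious label toward more specific ones, while the test images give the system a way to discard sessions from players who are typing junk or colluding.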

But the ESP Game is only one instance of a general class of game that Luis calls "games with a purpose", i.e. games that "run a computation in people's brains rather than in silicon processors." Some of the other games he talked about include:

  • peekaboom - a game for locating objects in images. One player is given an image and a word (the pairing is output from the ESP game); the player clicks on the image, and highlights a small bit to be shown to the other player, who has to guess the word. In the first 4 months, 27 players identified objects in 2.1 million images. The top player has 3.3 million points. (From what I can tell, players get about 100 points per correct answer, plus various bonuses.) Luis points out that once this game generates a large enough data set, it could perhaps be used for training computer vision programs.
  • verbosity (pdf). This game collects common-sense facts ("Milk is white", "Milk goes with cereal.") A player is given a word, and fills in a phrase to describe it. The other player sees only the phrase, and has to guess the word. Luis points out that there have been many AI research projects attempting to build common sense databases like this, but that they haven't figured out how to make them fun. (He doesn't mention Cyc by name, but that's clearly what he has in mind.)

Luis describes peekaboom and verbosity as "asymmetric verification games": Input is given to player 1, whose output is sent to player 2, who has to guess the input given only player 1's output. Once player 2 guesses correctly, this verifies the connection between the output and the input. This is in contrast to the ESP Game, which is a symmetric verification game. But symmetric games work only when there is a constrained list of possible outputs.
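For contrast with the symmetric ESP Game, an asymmetric round might look like the sketch below. The function name and return shape are hypothetical; the point is only that a pair is recorded as verified when player 2 recovers player 1's input from player 1's output alone.

```python
def asymmetric_round(secret_input, describer_output, guesser_guesses):
    """Return a verified (input, output) pair, or None if never guessed.

    secret_input     -- what player 1 was given (e.g. a word)
    describer_output -- what player 1 produced (e.g. a hint or a revealed
                        image region), which is all player 2 gets to see
    guesser_guesses  -- player 2's attempts, in order
    """
    target = secret_input.strip().lower()
    for guess in guesser_guesses:
        if guess.strip().lower() == target:
            return (secret_input, describer_output)
    return None


verified = asymmetric_round("milk", "it is white and goes with cereal",
                            ["snow", "Milk"])
unverified = asymmetric_round("milk", "it is a liquid", ["water", "juice"])
```

Here player 1's output can be free-form, because the check is whether it leads player 2 back to a known input rather than whether two open-ended outputs happen to coincide.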

All in all, a fascinating talk on a trend that I believe is going to be one of the most important any of us will face. As the symbiosis between humans and computers becomes deeper, and at a larger scale, we're going to see problems that were formerly construed as "hard AI" suddenly broken, not because computers themselves have become intelligent, but because humans and computers have gotten better at working together. We're only at the early stages of harnessing collective intelligence, and we're going to see more and more breakthroughs as creative computer scientists find new areas that they can tackle with bionic software.




Comments: 24

Anonymous   [09.02.06 11:16 AM]

So Tim, tell us how you're going to trick
us all into writing O'Reilly technical
books (maybe the Make model?); if two or
three of us write substantially the same
thing, it must be publishable, no?

adam   [09.02.06 11:45 AM]

interesting... a group of museums have actually been doing this for a while:

Thomas Lord   [09.02.06 05:58 PM]

Here's one for your scenario planning thingie, Tim. Within one year, Google will be sending out checks to the top 1,000 or so players.


Stephen Rondeau   [09.02.06 10:16 PM]

This is the result of Google listening to and/or licensing the idea behind Luis von Ahn's work at CMU, as can be seen at

He has generalized his clever research on getting people to freely label and locate items within images, as well as contribute to building a machine representation of common sense knowledge and other projects, as "Human Computation" -- see:

SEO и деньги   [09.03.06 04:29 AM]

I'm afraid that "bionic software" can be easily exploited. Do you remember "Google bombing" for example?

Tim O'Reilly   [09.03.06 07:36 AM]

Stephen -- thanks for the Luis von Ahn links -- and to anonymous, just above, who pointed to Luis' fascinating tech talk on the subject at Google (that's the video link above.)

Luis also talks about some of the other games he's developed. It was interesting enough that I'm going to follow up with a separate post.

SEO -- if you actually look at the structure of the game, and Luis' talk about it, you'll see that it's more difficult than you think, especially in a pairwise game like this. It isn't the wisdom of crowds, it's a setup in which two people have to guess the same term before the image is considered correctly labeled. And those people are randomly selected, so unless there are a huge number of colluding players, it will be tough to game, especially since Luis at least also runs a 1-player version, in which a new player is "tested" against already-identified images. Someone gaming the system could thus be detected and the results thrown out.

Ted   [09.03.06 10:14 AM]

Well, it's useful to recall that McLuhan
cast the computer as brain-amplifier, back
in the day: the computer is to the brain
as the wheel is to the foot, i.e., all
technologies are extensions of human
beings. This claim presumably holds of
human society, group interactions,
collective, symbolic interaction (which
can be construed as computation).
So I'm not so taken with the novelty
of "bionic software". I guess what is
novel is what folks might consider to
be fun (Tom Sawyer's fencepainting

adamsj   [09.03.06 01:53 PM]

Is there another variation in the future, in which computers generate and grade tests that computers can pass, but humans cannot pass?

Wouldn't you say lots of brute force computation fits that bill?

Tom   [09.03.06 02:11 PM]

Looks a lot like the ESP game Carnegie Mellon University came up with!

I attended a lecture put on by the guy behind the ESP game. He had some very creative insights into how to leverage the processing of the human mind by creating reward systems that turned it into a game.

Tim O'Reilly   [09.03.06 03:34 PM]

Ted -- I think what is unique about what I'm calling "bionic software" is the extent to which these applications make people work for the computer, rather than the other way around. With previous adaptations, the wheel, to take your example, the extensions empower the individual.

I suppose that constructs like the corporation, which take on a life of their own, could certainly be good analogues.

But I don't think that the examples you give are parallel, or at the least, they leave out a lot. The computer as an extension of human capability (as a faster calculation engine, for example) is exactly parallel, but writing a computer program that depends for its operation on getting a bunch of humans to do a difficult operation for it seems very different to me.

John Adams -- re the reverse Turing test -- while it's true that brute force computation is something that computers can do better than we can (people like Ramanujan aside), it is not the case (yet) that computers are designing tests to find out whether or not we are computers, which was my tongue-in-cheek speculation.

Ian Danforth   [09.03.06 09:36 PM]

Danny Sullivan at Search Engine Watch has this:

Postscript: I heard back from Luis von Ahn, who sent me this:

Yes, Image Labeler is based on my ESP Game, which Google licensed. I'm not employed by Google, however, since I'm a full-time faculty member at Carnegie Mellon.

Also the assertion that "there was no hint in [the tech talk] that Google was about to announce something based on von Ahn's work" is not true.

1. They invited him. Someone thought his stuff was cool enough to bring him in.

2. The question session is edited. NDA type questions are removed from the tech-talks, or segmented into an entirely separate section. Obviously some Google-tech related questions were asked.

Alternately it could be true that they had done no work prior to the talk on GIL. I don't think it would be that hard to set up. July 26th to Sept 1 would have been sufficient time even in a couple people's 20%.


You Mon Tsang   [09.03.06 11:36 PM]

Your specific point that hard AI problems will be solved "because humans and computers have gotten better at working together" is extremely important.

In developing our own systems at Boxxet over the last few months, we were experimenting with techniques to discover great content. In one case, we compared results using humans in Amazon's Mechanical Turk vs our own computer algorithms (which start with some human input). For this one particular problem, we decided to use the computer algorithms (it turned out to be cheaper with similar quality). But it was interesting that we could have easily chosen computer-assisted human answers or human-assisted computer answers.

Having such an option available to us is only possible when "humans and computers have gotten better at working together."

Ramesh Jain   [09.04.06 07:08 AM]

There is a strong impedance mismatch resulting in serious difficulties in use of current information systems. As stated in a Communications of the ACM article:

"Most conventional information environments actually work against human-machine synergy. The human mind is very efficient at conceptual and perceptual analysis and relatively weak at mathematical and logical analysis; computers are the opposite. The process of designing information environments involves logical and mathematical principles; humans eventually interact with these systems using a logical approach. Even the interfaces are designed to expect users to formulate anything but the most obvious searches using logical combinations, thus discouraging many users."

Clearly today's computer vision systems are at a level below a toddler, and hence the use of human help in solving the image search problem sounds like a good idea. I am not sure that this is such a good approach, however. Tags are very subjective and the ESP Game does not remove that. Remember that keywords (or tags) assigned to articles -- even by authors and editors -- were relatively ineffective in comparison to the exhaustive text processing approaches that made current search engines what they are.

I am not saying that tags as proposed in Flickr or now in Google are not useful -- they clearly are. But don't expect that a ladder -- even a really tall one -- will take us to the moon.

Tim O'Reilly   [09.04.06 08:50 AM]

Ramesh -- I totally agree with your comment about the impedance mismatch. In fact, I published an entire book, Steve Talbott's wonderful The Future Does Not Compute, which has as its premise that computers will become intelligent only to the extent that we define our own intelligence to be more like theirs, and remove from consideration all the things that make up intelligence that computers can never do.

But wonderful and thought-provoking as Steve's book is, I do think that there are qualitative differences between what's happening with computer networks at scale today, and the way that humans are involved in the process, and former efforts.

Part of it has to do with scale. Tagging by a couple of experts isn't that great. But tagging by millions of people proves surprisingly useful.

I do have some concerns about the image labeler/ESP Game, though. It's easy to get players to label a photo "man," but much harder to get random players to label "Ramesh Jain." Even in one of the examples that Luis gave in his talk, they showed a picture of Walter Matthau from the period of Grumpy Old Men, and the photo was labeled "Saddam" -- and I have to confess that was my first thought on seeing the image as well. So identifying individual people is harder. It will be interesting to see how far Google pushes that.

I would imagine a different approach would be necessary for proper name identification of people and things, since you'll need to find the right players who will actually know those things, or run the image by enough people that the right answer emerges. I would guess that they couldn't use the pairwise sets, but they could use the 1-player version, or the zero-player version, rerunning individual answers against each other to find repeated matches.

Scott Carpenter   [09.04.06 10:40 AM]

This image labeling game is very interesting and thanks, Tim, for your first post pointing it out.

I think having the point incentive and time limit causes obsessives -- not *me*, certainly -- to want to label things with the lowest common denominator. Man. Table. Sky. Water. But maybe this is ok -- once a few million people have agreed that a picture is a "man," they can put that label in the off-limits list and look for something else. (Although I was presented with one picture of four drinking glasses and prohibited from using glasses and was stumped.)

On one run I was paired with someone where we seemed to both be pursuing the LCD strategy but I was amused afterward to find on a picture of a man and another of a couple that he tried "dork" and "dorks" before going with man and people.

On the computer intelligence subject, I'm more with Kurzweil in thinking we'll have truly intelligent machines. (I think that's how he views it.) Stephen Hawking also said (or wrote) something that made a big impression on me about this topic, that: "It seems to me that if very complicated chemical molecules can operate in humans to make them intelligent, then equally complicated electronic circuits can also make computers act in an intelligent way." Intelligence is a big subject, of course, and much of it depends on definitions that will change over time, as you said. But I think we will have machines every bit as intelligent -- and much more so -- than you or me, as disturbing as that might sound.

Kevin Kelly   [09.05.06 10:58 AM]


There is another game that is almost "with a purpose" related to this topic, one that I feel is very important. It's based on the classic 20 Questions, a binary yes-no game to guess what you are thinking. The online version of this game has been played 10 million times, and each time someone plays, it teaches 20Q something about the facts of our world (like CYC). What's cool is that if most people believe a book has a spine, then that's what 20Q believes. It is amazingly smart, and what is cool is that all the programming is done simply by random people playing the game.

Gordon Mohr   [09.05.06 10:57 PM]

I've wondered about Riya: is it really algorithms inside -- or people? And if not people, why not? Software could help cue up the info -- for example, clipping images to likely-face regions -- but then the right UI would let human labellers group and map faces at a very rapid rate. And they wouldn't even need to know the language/culture of the photo subjects.

It was seeing a computer implement the 'Animal' yes-no guessing game, and learn new animals when it guessed wrong, that first piqued my interest in programming almost 30 years ago. Seeing advanced game-trained systems -- like 20Q or Google ImageLabeller -- still impresses me today.

William Kelley   [09.06.06 12:02 AM]

I have implemented several common sense learning games that play through "normal" conversation, via Instant Messenger or web-based bots. The first was a word association game (milk, white, clouds, sky, blue, etc) which had pretty amazing results over the several months that it ran, played by several thousand players. As it started with an empty database, it had the ability to learn any language or any topic. It also had the unexpected property of completely eliminating user typos. I also developed (and am reworking) a common sense learning bot, much like Open Mind, though again, it differs from that and from games like Verbosity in that it plays one on one with a user in a straight conversational mode. No fill in the blanks, no partners, though of course it uses predetermined templates to gather the common sense facts.

Thomas Lord   [09.06.06 05:53 PM]

We ought to worry about the conjunction of (a) a return to the days of exchanging labor for goods and services by barter; (b) a very, very, very, (very) low price for labor; (c) the direction of said cheap labor to the creation and improvement of privately held databases of non-trivial utility in population manipulation; (d) the difficulty of even experts to grasp the significance of these databases, never mind the average laborer who contributes to them. George Dyson just keeps sounding more and more right.


adamsj   [09.19.06 08:18 AM]

And among the winners is...



Hera   [03.11.07 05:28 PM]

Maybe I should be telling this to Google, but I would point to the fact that some of these games require a 'Basic Human Keenship' to play; that is, the labeling of an image would require a similar mentality, which differs across the globe. At the same time, most gaming services I know, like Blizzard's, have several servers [around one per continent or so] for reasons I could only guess. This means that the games might be played only by certain 'mental groups' at any given time -- I'd explain more but I am out of space - think about it..

