Mon, Jan 28, 2008

Brady Forrest

Comments Back On; reCAPTCHA Wasn't At Fault

On Friday, we experienced a huge spike in comment spam on Radar. We turned off auto-publish for comments (it has since been turned back on). We incorrectly assumed that reCAPTCHA, one of the tools that we use to block spam, had been exploited. We were wrong (Sorry!).

After my post, the reCAPTCHA team contacted us and helped us debug the issue. From their server logs they determined that it was definitely a human-driven attack (based on the kinds of mistakes recorded in the logs) and that all of the traffic was coming from Turkish Telekom. The reCAPTCHA team was kind enough to send the following email summarizing the attack and their service.

There are a few key points about the people spamming you:
Based on log information, it's very clear that this was based on humans solving the CAPTCHA -- the types of errors they make are common human-being mistakes (such as accidentally hitting nearby keys on the keyboard).
We at reCAPTCHA realize that some spammers may want to resort to "CAPTCHA outsourcing," where they get humans to solve the CAPTCHAs. In general, it is relatively difficult to organize this outsourcing, and it can only be done in small scales. Also, we're forcing spammers to put half of this outsourcing cost into digitizing books :)
Your "attack" was launched using the TurkTelekom network. This network is known to harbor spammers. See for example:

http://www.uceprotect.net/en/rblcheck.php (enter AS9121)
http://www.spamhaus.org/sbl/sbl.lasso?query=SBL59440
http://www.spamhaus.org/sbl/listings.lasso?isp=ttnet.net.tr
http://www.securityzone.org/?p=26
http://www.joewein.net/fraud/host-abdallah-internet.htm
Some stats about reCAPTCHA in general:
reCAPTCHA generates the equivalent of over 2,000 people working 8 hours per day, 5 days per week on digitizing books.
reCAPTCHA is currently used by over 20,000 websites
Because reCAPTCHA is a web service, we are able to quickly adapt to trends in abuse.
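
As an illustration of the first point only, and not reCAPTCHA's actual log analysis, here is a minimal Python sketch of how a wrong answer could be flagged as a likely human slip: it checks whether the typed answer differs from the expected word by exactly one character, and whether that character sits next to the correct key. The function names, the simplified QWERTY grid (which ignores the real keyboard's row stagger), and the one-substitution rule are all assumptions made for this example.

```python
# Hypothetical sketch: flag a wrong CAPTCHA answer as a likely adjacent-key typo.
# The layout below is a simplified QWERTY grid (row stagger is ignored), and the
# "exactly one substituted character" rule is an assumption for illustration.

QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def build_neighbors(rows):
    """Map each key to the set of keys within one row and one column of it."""
    positions = {
        ch: (r, c) for r, row in enumerate(rows) for c, ch in enumerate(row)
    }
    neighbors = {}
    for ch, (r, c) in positions.items():
        neighbors[ch] = {
            other
            for other, (r2, c2) in positions.items()
            if other != ch and abs(r - r2) <= 1 and abs(c - c2) <= 1
        }
    return neighbors


NEIGHBORS = build_neighbors(QWERTY_ROWS)


def looks_like_adjacent_key_typo(expected, typed):
    """Return True if `typed` differs from `expected` by a single character
    and the wrong character is a keyboard neighbor of the right one."""
    expected, typed = expected.lower(), typed.lower()
    if len(expected) != len(typed):
        return False
    diffs = [(e, t) for e, t in zip(expected, typed) if e != t]
    if len(diffs) != 1:
        return False
    right_key, wrong_key = diffs[0]
    return wrong_key in NEIGHBORS.get(right_key, set())
```

With this sketch, an answer of "mornibg" for the word "morning" would count as a probable slip of the finger (b sits next to n), while "mornizg" would not. A high share of such near misses among failed attempts is the kind of signal that would suggest people, rather than software, were typing the answers.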

If you want to use reCAPTCHA's free service, they have a page to help you get started. Previously I focused mainly on the fact that we were helping older books get digitized. After this incident I am glad that we are using them, and I recommend them wholeheartedly.

tags: web 2.0

Comments: 6

  Sachin [01.28.08 03:25 AM]

I recently attended a presentation, "Fighting Online Crime," given at my university by a big antivirus company. The presenter, their chief research officer, said that one of the problems these companies face is that spammers have matured over time: to bypass spam filters, they now use images instead of text for the advertisement and spam message.

He went on to say that spam filters responded by running OCR to detect spam keywords.

The spammers' response was to render the advertisement as a 3D view in a 2D image, which is then sent as an email. This is still a problem for spam filter companies, because these images are not detected even by their OCR engines.

I believe things like reCAPTCHA could have one more interesting use here. If the antivirus companies start supplying these images as reCAPTCHAs, people will solve them and we will save ourselves from being spammed!!!

  Andre [01.28.08 04:27 AM]

I guess it has to happen. CAPTCHAs will not kill comment spam, but reduce it considerably. Also, with reCAPTCHA, the work to solve the CAPTCHA is put to good use. In fact I think the idea is so great that I just wrote this comment to solve the reCAPTCHA at the end.

  Search◊ Engines Web [01.28.08 05:12 AM]

There is a way to get some degree of justice against those attempting to abuse the blog.

If in fact they were promoting their Websites, you can get the URLs and report them to their domain registrar, and to their host (if it is reputable).

Sometimes free sites are used; they will have their accounts terminated.

Perhaps Captcha should offer this as an optional service with the request of the user.

It ultimately would have long term benefits - as the abusers are now LOSING something as a result of their efforts

  Eeebook [01.28.08 08:43 AM]

Why do you say "difficult to organize CAPTCHA outsourcing"? Just make somebody believe it's your CAPTCHA (and not an outsourced CAPTCHA).

  bowerbird [01.28.08 10:08 AM]

> reCAPTCHA generates the equivalent of
> over 2,000 people working 8 hours per day,
> 5 days per week on digitizing books.

first, luis von ahn deserves his genius grant;
his work has been imaginative and brilliant...

but i'd sure love an explanation how recaptcha
helps with "digitizing books". the info that's
returned is so slight that -- even if combined
over tens of millions of people -- i don't see
how it can be productively used to clean o.c.r.

and i've done a lot of research in that arena...

i'm not saying recaptcha _cannot_ work, just that
i'd like to see some reasoning and some evidence.

-bowerbird

p.s. further, the o.c.r. coming from the o.c.a.
-- after any not-that-extraordinary clean-up --
is _phenomenally_accurate_ already...

  bex [01.28.08 11:25 AM]

@bowerbird

What kind of info do you want? The reCAPTCHA project is based on using humans to read words that (some) OCR technology cannot.

The point is to outsource the cleanup while also fighting SPAM.

Although, I STILL think reCAPTCHA should have some kind of embedded SETI@HOME widget... that way, determined human spammers will spend a lot of money on electricity to send a SPAM message...
