Mon, Jun 19, 2006

Tim O'Reilly

Spam Filtering Statistics from oreilly.com

I thought readers might enjoy this message that O'Reilly sys admin chief Bob Amen just wrote on our internal mailing list: "Below is a summary of the incoming email to our gateway mail servers for all domains that we accept email for (there are 57 domains). This summary is for the last 7 days:

Our mail servers accepted 1,438,909 connections, attempting to deliver 1,677,649 messages. We rejected 1,629,900 messages and accepted only 47,749 messages. That's a ratio of 1:34 accepted to rejected messages! Here is how the message rejections break down:
Bad HELO syntax: 393284
Sending mail server masquerades as our mail server: 126513
Rejected dictionary attacks: 22567
Rejected by SORBS black list: 262967
Rejected by SpamHaus black list: 342495
Rejected by local block list: 5717
Sender verify failed: 4525
Recipient verify failed (bad To: address): 287457
Attempted to relay: 5857
No subject: 176
Bad header syntax: 0
Spam rejected (score >= 10): 42069
Viruses/malware rejected: 2575
Bad attachments rejected: 1594
The order that the rules are listed above is the order in which the rules are tested on each message.
I hope you find this interesting. Consider also that this is all done with open source software running on two Linux machines. The MTA is Exim with SpamAssassin used for spam analysis and ClamAV for virus analysis. I spend less than an hour a week maintaining these two systems. That's a pretty good ROI."
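
The figures in Bob's message can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative and not part of the original message. It recomputes the accepted-to-rejected ratio and sums the per-rule counts. The listed rules add up to fewer rejections than the stated total, and the message does not say how the remainder is accounted for.

```python
# Figures copied from the message above; the names are illustrative.
attempted = 1_677_649
rejected = 1_629_900
accepted = 47_749

# Per-rule rejection counts, in the order the rules are tested.
rejections_by_rule = {
    "Bad HELO syntax": 393_284,
    "Masquerades as our mail server": 126_513,
    "Dictionary attacks": 22_567,
    "SORBS black list": 262_967,
    "SpamHaus black list": 342_495,
    "Local block list": 5_717,
    "Sender verify failed": 4_525,
    "Recipient verify failed": 287_457,
    "Attempted to relay": 5_857,
    "No subject": 176,
    "Bad header syntax": 0,
    "Spam (score >= 10)": 42_069,
    "Viruses/malware": 2_575,
    "Bad attachments": 1_594,
}

# Accepted plus rejected should account for every attempted message.
assert accepted + rejected == attempted

ratio = rejected / accepted
listed = sum(rejections_by_rule.values())
unlisted = rejected - listed

print(f"rejected per accepted message: {ratio:.1f}")
print(f"share of attempted mail accepted: {accepted / attempted:.1%}")
print(f"rejections covered by the listed rules: {listed:,}")
print(f"rejections not attributed to a listed rule: {unlisted:,}")
```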

I'm sure you have your own similar (or worse) statistics. What a waste! (And thanks to all the developers and administrators who've made this problem much less intrusive to ordinary users, leaving it to people like Bob to shovel out the sh*t. Ordinary users still think spam is bad, but they don't really know just how bad....)




Comments: 18

  Claus [06.19.06 06:58 AM]

If you are rejecting messages based on SORBS you most likely have lots of false positives in your spam test and you're throwing out tons of ham with your spam.
The SORBS blacklisters are very, very eager to block, have obscure testing criteria and have a policy of demanding a ransom to delist IPs.
Among other things they often (if not always?) block all mail coming from GMail.

  Ned Baldessin [06.19.06 07:06 AM]

Blocking email that has no subject seems harsh. There are situations (quickly sending a link while on the phone for example) where a subject isn't necessary.

  adamsj [06.19.06 07:25 AM]

Something is very, very wrong with how this post displays. Tim, you might want to check how you coded your post.

  Marc Hedlund [06.19.06 07:58 AM]

adamsj,

Boy, were you right. Thanks for the comments; I've made some fixes.

  Bob Aman [06.19.06 08:16 AM]

I've had a lot of trouble with both SORBS and SpamCop blocking my perfectly legit mail as well.

  Justin Mason [06.19.06 09:28 AM]

ick; I'll double up on those "watch out for SORBS FPs" comments. IMO, it's not something that should be used for front-line binary accept/reject decisions; instead, leave that to a scoring-based fuzzy filter like our own SpamAssassin -- good to see you're using it. ;)

The Spamhaus lists are always reliable, however...
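
The difference between a front-line binary decision and a scoring filter can be sketched as follows. This is a toy illustration, not SpamAssassin's actual rule set: the signal names and weights are hypothetical, and only the threshold of 10 comes from the post.

```python
THRESHOLD = 10.0  # the post rejects spam at a score of 10 or more

# Hypothetical weights for illustration; not SpamAssassin's real rule scores.
WEIGHTS = {
    "listed_on_dnsbl": 3.0,
    "bad_header": 2.5,
    "spammy_body": 6.0,
}


def binary_decision(signals: set[str]) -> str:
    """Front-line style: a single DNSBL hit rejects the message outright."""
    return "reject" if "listed_on_dnsbl" in signals else "accept"


def scored_decision(signals: set[str]) -> str:
    """Scoring style: each signal adds weight; reject only at the threshold."""
    score = sum(WEIGHTS.get(name, 0.0) for name in signals)
    return "reject" if score >= THRESHOLD else "accept"


legit_but_listed = {"listed_on_dnsbl"}
spam = {"listed_on_dnsbl", "bad_header", "spammy_body"}

for label, signals in [("legit but listed", legit_but_listed), ("spam", spam)]:
    print(label, "->", binary_decision(signals), "/", scored_decision(signals))
```

With these made-up weights, a legitimate message whose sender happens to be listed is rejected by the binary check but accepted by the scoring check, because the listing alone does not reach the threshold.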

  Matt Riffle [06.19.06 10:15 AM]

I'll go ahead and pile on -- I've seen enough problems with SORBS that I couldn't recommend actually rejecting mail at SMTP time because of it. Using it as part of SpamAssassin's suite of such checks is about as far as I'd go.

-Matt

  Martin [06.19.06 10:23 AM]

A quick me too on SORBS. As far as I know several large German free mail providers are blocked in SORBS, for no reason they can fix.

  Bob Amen [06.19.06 02:49 PM]

So far in the three or so years we've been using SORBS I've only had one false positive. I know a lot of people don't like SORBS but I've had very good results with their black list. Every time I've investigated a listing, it's always been right on. Also, we don't use their more controversial lists. We only use the dul, zombie and nomail lists:

dul.dnsbl.sorbs.net - Dynamic IP Address ranges (NOT a Dial Up list!)
nomail.rhsbl.sorbs.net - List of domain names where the owners have indicated no email should ever originate from these domains.
zombie.dnsbl.sorbs.net - List of networks hijacked from their original owners, some of which have already been used for spamming.


There are a lot of other SORBS lists that may have questionable value, such as the spam DNSBL. I believe the commenters are referring to those lists and I agree with their assessment. No email from gmail is ever blocked by our servers...unless it has a high SpamAssassin score or contains a virus.

Cheers,
Bob
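
For readers unfamiliar with how such lists are consulted: by general DNSBL convention, an IP-based list is queried by reversing the IPv4 octets and appending the list's zone, while a domain-based (RHSBL) list is queried by prepending the domain name to the zone. The sketch below only builds the query names and makes no network calls. The function names and sample inputs are assumptions for illustration; the zone names are the ones Bob lists above.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in an IP-based list."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def rhsbl_query_name(domain: str, zone: str) -> str:
    """Build the DNS name to look up for a sender domain in a domain-based list."""
    return f"{domain.rstrip('.').lower()}.{zone}"


# Sample inputs use documentation-reserved values, not real senders.
print(dnsbl_query_name("192.0.2.25", "dul.dnsbl.sorbs.net"))
print(rhsbl_query_name("example.com", "nomail.rhsbl.sorbs.net"))
```

A mail server would then resolve the resulting name: an answer typically means the address or domain is listed, and a "no such name" response means it is not.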

  Geoff Butterfield [06.19.06 04:04 PM]

High spam levels for sure, but how does this report compare to previous weeks? Our email server has been hit by a massive directory harvest attack (it's lasted about 8 days, all from distributed IP addresses) and other spam sources seem to be more active as well.

  Bob Aman [06.19.06 06:05 PM]

Note to self: Do not attempt to get a job at O'Reilly. The name confusion would be horrible.

  GRex [06.20.06 07:15 AM]

Really, are we witnessing the impending death of email? With emails getting less and less credible (having your legit mail reaching spam folder), will email still be relied on the way it is now 5 years later?

  Claus [06.20.06 08:11 AM]

Good point on the differences between the different SORBS lists. However, the 'ransom' policy and the lack of free retests make them less attractive policy-wise, even if the lists actually work statistically.

  Bob Amen [06.20.06 11:12 AM]

Geoff:

I don't have detailed statistics for previous weeks but I do have a fairly long term view of the CPU usage and it hasn't jumped recently. Just a slow increase over the last year.


If you're having problems with a large dictionary attack you could try what we do. If the sending mail server attempts to send to three invalid addresses, we drop the connection. They may try again, but I haven't seen that happen. Usually they just go away.
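
The policy Bob describes can be sketched as a per-connection counter. This is a minimal Python model, not Exim configuration: the class, the recipient set, and the response texts are hypothetical, and only the limit of three invalid addresses comes from his comment.

```python
MAX_INVALID_RCPTS = 3  # limit taken from the comment above

# Hypothetical set of valid mailboxes for the illustration.
VALID_RECIPIENTS = {"alice@example.com", "bob@example.com"}


class Connection:
    """Tracks invalid recipients for a single SMTP connection."""

    def __init__(self) -> None:
        self.invalid_rcpts = 0
        self.dropped = False

    def rcpt_to(self, address: str) -> str:
        """Return an illustrative response for one RCPT TO command."""
        if self.dropped:
            return "connection closed"
        if address.lower() in VALID_RECIPIENTS:
            return "250 accepted"
        self.invalid_rcpts += 1
        if self.invalid_rcpts >= MAX_INVALID_RCPTS:
            # Third bad address: stop talking to this sender.
            self.dropped = True
            return "421 too many invalid recipients, closing connection"
        return "550 unknown user"


conn = Connection()
for guess in ["aaron@example.com", "abby@example.com", "abe@example.com"]:
    print(guess, "->", conn.rcpt_to(guess))
```

Because the counter lives on the connection, it limits how many addresses one sender can probe per session; it does not by itself correlate guesses arriving from many different hosts.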

  Geoff Butterfield [06.20.06 12:46 PM]

Bob:

Good tip regarding directory harvest attacks. I use a similar policy, which is what makes this current attack so interesting. It's coming from multiple IP addresses - a DDHA (Distributed Directory Harvest Attack)? Maybe it's just us getting more spam these days...

  dave cormier [06.21.06 08:37 AM]

well... their spam filter has successfully blocked every email i tried to send to o'reilly during the first run of the web20 trademark issue. Seems like success to me!

  Brian [08.22.06 07:07 PM]

We too are getting slammed with DHAs. We have a good spam filter, except that it does not touch DHAs. I wonder if I could put Exim between my spam filter and email server and just use it to drop DHAs?

  Steve Hersker [10.31.06 08:18 AM]

We just cut over to Exim/SpamAssassin/ClamAV. The stats Bob posted are excellent - pardon the newbie question, but how did he gather that info? I've played with Eximstats, but am I missing something?
Thanks!
