Thanks for visiting! Remember that nowadays, (most) blocklists don't really govern deliverability and inbox placement. Want to learn more about email marketing best practices, email technology, and deliverability troubleshooting? Then you'll want to check out my other site, Spam Resource.

APEWS News and Commentary Roundup

APEWS, the Anonymous Postmasters Early Warning System, is an “anonymous” blocking list that claims to run in the style of SPEWS. That is to say, its goal is to be an “early warning system,” catching and stopping spam before other lists or filters have the opportunity to do so.

The APEWS blocking list was first announced by way of an anonymous posting to the newsgroup news.admin.net-abuse.blocklisting on January 12, 2007. Though this newsgroup post originated from the IP address 149.9.0.57 (registered to US provider PSI/Cogent), the list is widely believed to be run from Germany.

If you are listed on APEWS and wondering what to do, visit this page for my suggestions.

Accuracy

A quick review of the past thirteen weeks of my own stats.dnsbl.com data shows that the list has been ramping up in aggressiveness the entire time I've been tracking it. What was barely a 20% effectiveness rate against spam eleven weeks ago is now over 80% on a week-over-week basis. However, false positives have risen similarly.

The rising spam match rate is based on what I would characterize as the “stopped clock is right twice a day” principle. List enough IP addresses, and eventually you're going to stop some spam. The side effect is that you're going to block legitimate mail (and lots of it) at the same time. Against my personal hamtrap data, APEWS blocks two out of every ten legitimate pieces of newsletter or list mail that I've signed up for.

I'm not kidding about "listing enough IP addresses," either. As of today (August 11, 2007), APEWS lists just about 1.8 billion IP addresses. By the raw numbers alone, that is 42% of the entire IPv4 address space. Much of the listed space isn't even routable, suggesting that little attention is being paid to which IP addresses are actually able to transmit traffic (email or otherwise). APEWS is also growing at a very fast rate: from July 20th through today, it has added another 7.5 million IP addresses. These are data points that, in my opinion, suggest the list is bloated, questionably targeted, and inaccurate.
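If you want to sanity-check those numbers, the arithmetic is simple. Here's a quick back-of-the-envelope calculation (a sketch in Python, using only the figures quoted above):

    # Back-of-the-envelope check on the APEWS figures quoted above.
    TOTAL_IPV4 = 2 ** 32            # 4,294,967,296 possible IPv4 addresses
    apews_listed = 1_800_000_000    # approximate listing count, August 11, 2007

    print(f"{apews_listed / TOTAL_IPV4:.0%} of IPv4 space listed")   # -> 42%

    # Growth: 7.5 million addresses added between July 20 and August 11 (22 days).
    print(f"~{7_500_000 / 22:,.0f} addresses added per day")         # -> ~340,909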

09/30/2007 update: Click here to read about how I can similarly block around 60% of spam just by arbitrarily listing 42% of the internet.

Based on this data, and the recommendations of other trusted blocklist operators and anti-abuse folks, I personally would not use APEWS to filter incoming mail.

Controversy and Commentary

The blocklist is considered controversial by many other blocklist operators, ISP abuse staff, and anti-spam advocates.

  • Matthew Sullivan, SORBS maintainer, indicates that as of August 9, 2007, SORBS will no longer be publishing the APEWS blocklist zones via DNS.

  • Claus V. Wolfhausen, maintainer of UCEPROTECT, another German-run blocklist, indicates that UCEPROTECT will no longer publish the APEWS blocklist zones. (Previously: Claus warned that unless APEWS were to make immediate, significant changes to its policies, UCEPROTECT would no longer publish the APEWS blocklist zones.)

  • Suresh Ramasubramanian, respected anti-abuse manager for large mailbox provider Outblaze, categorizes APEWS as “meant to be used by fools.”

  • Steve Linford, Spamhaus maintainer, has suggested numerous times on newsgroups and elsewhere that APEWS is poorly run and is not widely used.

  • Kevin Liston and others from the Internet Storm Center have indicated that APEWS is using the ISC "top source" data to support blocklist entries, in violation of the data's license and against the wishes of those who provide this data. ISC says that the data "is not supposed to be used as a blocklist as it is bound to include false positives" and that "APEWS may be a useful 'anti-spam' list if you do not mind losing a lot of valid e-mail as well."

Misplaced Newsgroup Discussion

If you read either of the two popular anti-spam newsgroups (news.admin.net-abuse.blocklisting and news.admin.net-abuse.email), you already know that both groups are often overrun with requests (example) from people who find that they are listed by APEWS. I count over 2,000 messages on these groups relating to APEWS removal requests, which is a high number considering that the blocklist is less than a year old. The blocklist is run “anonymously.” Question 41 of the APEWS FAQ asks how one contacts APEWS. The answer includes the following: “One does not. APEWS does not accept removal request by email, fax, voicemail or letters.” [...] “General blocklist related issues can be discussed in the public forums mentioned above. The newsgroups news.admin.net-abuse.blocklisting (NANABL) and news.admin.net-abuse.email (NANAE) are good choices.”

This is likely why many administrators post to these newsgroups asking for assistance when they find their IP addresses listed. The FAQ does warn that “abusing these newsgroups & lists by posting removal request you will make a fool of yourself,” but that doesn't seem to be a deterrent. I would theorize that this is because many of the people on the wrong side of a listing do not understand why they are listed and do not know how to “fix” whatever issue led to the listing, as the listings are often broad and vague.

ISP Perspective

Vincent Schönau, an ISP abuse administrator, has related his APEWS experiences to me in email, and has given me permission to share them here.

Other blacklists have employed the 'escalations' strategy in the past, but APEWS has taken it to a whole new level; a few spams from a provider's IP ranges will cause all or most of the provider's IP space to be listed in APEWS, with comments such as 'unprofessional / negligent provider'. What this means is that if your provider is a noticeable source of e-mail, sooner or later it's going to get listed. Several providers of 'blacklist checks', 'blacklist comparisons', and 'e-mail reputation checks' include APEWS data. Apparently this is causing systems administrators who are desperate to reduce the amount of spam they're receiving to think that using it might work - perhaps because not all of those sources include the data on false positives for the blacklists. In practice, this means that several times a week, I'm spending time explaining to my users how they should work around the e-mail delivery problems they're seeing, which may or may not be related to APEWS. I could be spending this time taking action against compromised hosts in our network instead. This hurts providers who do take action against the abuse from their network more than providers who didn't care in the first place.

Others have related similar stories to me of listings that persist long after the spammers were booted. In one instance, a provider had a compromised machine, which was identified and disconnected within two hours of its sending spam. Three days later, APEWS listed it, and six weeks later the listing persists, even though the issue has long since been addressed.

If you are listed on APEWS and wondering what to do, visit this page for my suggestions.

SORBS: Accuracy Rates and False Positives

The blocking list SORBS (aka the “Spam and Open Relay Blocking System”) was created in 2002 by Australian Matthew Sullivan. SORBS publishes a main “aggregate zone” (dnsbl.sorbs.net) containing listings that meet a multitude of criteria beyond open mail relays. SORBS also publishes multiple other zones, each with its own listing criteria.
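For those who haven't worked with one of these zones directly: a DNSBL is queried over ordinary DNS. You reverse the octets of the IP address you're checking, append the zone name, and look up an A record; an answer (conventionally in 127.0.0.x) means the IP is listed, while NXDOMAIN means it is not. Here's a minimal sketch in Python. The default zone is the SORBS aggregate zone discussed above, and 127.0.0.2 is the test address that most DNSBLs list by convention:

    import socket

    def check_dnsbl(ip, zone="dnsbl.sorbs.net"):
        """Standard DNSBL lookup: reversed octets + zone, A record = listed."""
        name = ".".join(reversed(ip.split("."))) + "." + zone
        try:
            return socket.gethostbyname(name)   # e.g. "127.0.0.2" if listed
        except socket.gaierror:
            return None                         # NXDOMAIN: not listed

    print(check_dnsbl("127.0.0.2"))  # conventional "always listed" test entry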

As related previously, SORBS appears to be undergoing changes. Some of these changes appear to relate to the fact that the SORBS maintainer has repeatedly taken issue with the methodology used by DNSBL.com to measure accuracy rates and false positive rates.

SORBS has indicated that they have the ability to feed false or different data in response to queries from DNSBL.com. As such, it's unclear if recent query results are indicative of results seen by other users. Because of concerns that SORBS may be attempting to sway the data reported, it's important to share current data and information, so that system administrators can make an educated determination as to whether or not it would be wise to use this DNSBL.

Historical Information

I've been tracking data on the main SORBS zone, dnsbl.sorbs.net, since March 2007. Here's what I've found.

  • For most of the past fifteen weeks, the DNSBL had an effectiveness rate varying between fifty and fifty-six percent, week over week. This means that SORBS correctly blocked a piece of spam in my spamtrap about five to six times out of ten.

  • For many weeks, I believe SORBS clearly suffered from significant false positive issues. As measured by my own calculations (see here and here for more info), the false positive rate is in the 7.9% - 11.1% range. This means that if your users sign up for the same kind of mail that I did, then for every one hundred pieces of solicited mail your users signed up for and expected to receive, SORBS is likely to block eight to eleven of them. (The sketch below shows how these rates are computed.)
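To be explicit about the arithmetic behind these two rates: effectiveness is DNSBL hits against the spam feed divided by total spam received, and the false positive rate is hits against the hamtrap divided by total solicited mail received. A small illustration in Python, using made-up weekly counts (not my actual feed data) that happen to fall within the ranges above:

    # Hypothetical weekly counts, for illustration only -- not my actual feed data.
    spam_total, spam_hits = 20_000, 11_000   # spamtrap messages / DNSBL matches
    ham_total, ham_hits = 1_000, 92          # hamtrap messages / DNSBL matches

    effectiveness = spam_hits / spam_total        # fraction of spam caught
    false_positive_rate = ham_hits / ham_total    # fraction of wanted mail blocked

    print(f"effectiveness: {effectiveness:.1%}")              # 55.0%
    print(f"false positive rate: {false_positive_rate:.1%}")  # 9.2%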

Recent Data Changes

  • On July 9, 2007, changes were made to SORBS. As you can see from the chart above, around this time (near the start of week 12), both the effectiveness rate and the false positive rate declined significantly.

  • Since July 9, 2007, I have not noted any additional false positives from the main SORBS zone. Because SORBS has indicated that they are able to feed false data, it is unclear whether the results I am seeing are accurate.

  • The effectiveness rate of the main SORBS zone seems to have greatly declined as well: since July 9, 2007, it has been hovering around 18%.

There are two possible conclusions to make here:

  • SORBS is somehow able to feed different blocklist data to DNSBL.com than to others. If so, then the historical data I have summarized above is likely to be the most accurate view of SORBS. Or,

  • SORBS has gutted its lists and the poor effectiveness rates I'm now seeing are reflective of how it would likely work for others.

It's hard to say which scenario is the more accurate one, and what future testing will reveal. I'll certainly continue to collect data, but right now, there's an open question of SORBS' effectiveness and false positives.

As of Thursday, July 19, 2007, SORBS changed the default zone mentioned in its configuration guidance pages from dnsbl.sorbs.net to a domain not owned by SORBS. As a result, if a SORBS user copies and pastes a configuration snippet from one of the SORBS configuration pages verbatim, the result is that 100% of that site's inbound email will be blocked. My recommendation is to proceed with caution: if you are not sure what you're doing with DNSBL use and mail server configuration, a misstep here will have significant consequences.
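To make the failure mode concrete, here's roughly what a DNSBL hookup looks like in Postfix. This is my own abbreviated sketch, not text copied from the SORBS pages:

    # /etc/postfix/main.cf (abbreviated sketch)
    smtpd_recipient_restrictions =
        permit_mynetworks,
        reject_unauth_destination,
        reject_rbl_client dnsbl.sorbs.net

If the zone name in that last line is replaced with a domain whose wildcard DNS answers every query, then every lookup comes back as "listed," and every piece of inbound mail is rejected. That, presumably, is the mechanism behind the 100% blocking described above.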

SORBS has leveled the following criticisms, presumably as justification for the results published on DNSBL.com. Below is an overview of, and response to, the points raised:

  • SORBS claims that the DNSBL.com email feed data is US-centric. This is true. The domains involved in these hamtraps and spamtraps are "dot com" domains, and have always been hosted in the US. If this means that SORBS is inaccurate as a result, it suggests that SORBS is Australia-centric, and likely will not work as well for those in other countries.

  • SORBS claims that a false positive as defined on DNSBL.com is not what everyone calls a false positive. This is true. I consider a false positive to be a requested message that was blocked. Others have different definitions. I believe the definition used on DNSBL.com to be accurate. I further believe that the most common definition of a false positive as used by regular end users or system administrators is most likely to align with my own.

  • SORBS complains that it is unable to verify false positive hits, as DNSBL.com does not provide the IP addresses correlating to those hits. This is true. If data were provided to any blocklist operator regarding false positives, this would enable the DNSBL to whitewash over the issues by removing the IP addresses reported (and no others). This is similar to why blocklist groups do not provide spamtrap information – they do not want their spamtraps “compromised,” which would allow a bad sender to simply stop sending to spamtraps, but continue spamming elsewhere. Therefore, this information is not provided to any blocklist. (Other list operators have been more understanding.)

  • SORBS claims that the zone “dnsbl.sorbs.net” being queried by DNSBL.com is not the zone used by most users or recommended by SORBS as the main or default zone. This is untrue. It has or had clearly been positioned as the default zone or default recommended configuration choice, and remains the zone first listed, positioned as the “aggregate zone” as of July 20, 2007.

  • SORBS claims that Spamhaus volunteers have (or had) access to the SORBS database and have entered listings in the past to drive significant false positive issues. I am not associated with either SORBS or Spamhaus so I can't speak to this accusation.

  • SORBS claims that the methodology of checking mail against DNSBLs within 15 minutes of receipt is inaccurate. This is untrue. Anyone who uses a DNSBL is enabling their mail server or spam filter to check the mail against the DNSBL within seconds to minutes of receipt. If, as SORBS states, their DNSBL distribution model is such that it suffers from this methodology, then it suggests that it may be slow to respond to real spam trends. (10/29/2007 update: At a recent conference, over a beer with a colleague who builds tools to block spam for a living, I was gently chided over this bit of methodology. I was told that I was letting mail get far too old. 15 minutes is a hundred years as far as spam vector measurement is concerned; the vendor in question uses a 60-second interval at maximum. By this logic, I was being too forgiving as far as slowly updating anti-spam blocklists are concerned. This is further at odds with the criticism from SORBS.)
  • SORBS has picked a specific sender as the source for the SORBS false positive rates I report, saying that this sender is a "habitual source of spam." I have no financial interest in, or any other connection to, the sender in question, except that I ordered pillows from them in December 2006 and was happy with the product and service they provided. As a result, I signed up to receive mail from them, and happily do so. If I used SORBS to reject mail, that mail would not reach me. Additionally, this sender is far from the only source of false positives I found when utilizing the SORBS blocklist. (11/09/2007 update: The specific sender is/was Overstock.com. SORBS categorizes Overstock as a spammer. Matthew Sullivan (now known as Michelle Sullivan) indicated, in fact, that there are "1000's of people who receive unsolicited commercial/bulk email from them." There are two additional problems with these characterizations. First, Overstock.com is not listed on ANY OTHER of the approximately 47 blocklists I check, except FIVETEN (which lists many hundreds of potentially legitimate senders, and is therefore not very useful as a second opinion here). It's not on any of the lists that commonly do list supposedly-legitimate senders who may have run afoul of spamtraps. Second, the last mail I had received from Overstock.com was on May 25, 2007. This is significantly before the July 9th cutoff of my data, and measured false positives were on the rise even with no further mail from Overstock.com in the data set. Incidentally, I have no idea why I've received no mail since. I didn't unsubscribe.)

Additionally, SORBS has made numerous statements questioning the accuracy of data published here, and characterizing this project as something other than honest and transparent. Here's how it works: I have a feed of mail, and I check all mail received for DNSBL hits. I give internet users a live, rolling snapshot of how various lists intersect with my mail streams. That's all there is to it. I leave it to you, the reader, to decide if I've been honest and clear at every step of this process, and as always, I welcome your feedback.

(11/18/2007 Update: Added the phrase "that if your users sign up for the same kind of mail that I did" above to clarify false positive comments.)

Spamcop BL: Take Another Look (It’s Accurate!)

If you know me, you know that in the past, I’ve made no secret of my disdain for the Spamcop DNSBL, aka the SCBL. I’ve worked in spam prevention, deliverability, and the email realm for a long time, in various capacities. I’ve created and run at least two blocking lists that you know about. Later, I helped to design and create a system that processed thousands of confirmed opt-in/double opt-in newsletter signups a day. Combine those two details and that’s what led me to take issue with Spamcop: my employer’s COI/DOI signup servers kept getting listed by Spamcop, based on some really bad math used to measure email volume against thresholds and determine what to list.

I was trying to do the right thing. I was implementing what Spamcop (and other anti-spam groups) want: confirmed opt-in/double opt-in. Yet Spamcop was listing the servers and subsequent mailings regardless. It made me really frustrated, and I was very disappointed. See, it’s not really fighting spam. It’s just blocking mail you don’t like, or don’t care about. While perfectly allowed, I am of the opinion that it’s lame to do so under the banner of “fighting the good fight” to stop spam. I’ve shared my thoughts on this topic in just about every available forum—websites, blogs, discussion lists. I know I’ve personally guided many sysadmins away from using the SCBL in the past, because it was easily, demonstrably, listing things that were obviously not spam.

In February 2007, I found that Microsoft was using the SCBL to filter/reject inbound corporate email. (Note that I said corporate email: mail sent to users at microsoft.com, not users of MSN or Hotmail. I don’t know whether or not SCBL data is used in MSN Hotmail delivery determination, but from what I’ve observed, it doesn’t seem to be.) This started me off on another rant about how ill-advised I felt this was, based on my prior experiences with Spamcop. Some kindly folks (and some less kindly) suggested that I needed to revisit my opinion of the Spamcop blocking list, because things have changed.

After a lot of measuring and discussion, I’m here to tell you: Spamcop’s DNSBL has changed, and for the better. It works very well nowadays, as personally measured by me. The open question on Spamcop was what drove me to dive into my massive DNSBL tracking project. I started that back on March 10th. Ever since then I’ve been compiling data on Spamcop blocking list matches against both spam and non-spam. Here’s what I see:

DNSBL          Spam hits   Acc %    Ham hits   Failure Rate
Spamcop SCBL   156194      49.37%   0          0.00%
Spamhaus ZEN   255521      80.77%   5          0.10%
Spamcop+ZEN    267795      84.65%   5          0.10%

Range: ~74 days. Total spam: 316348. Total ham: 4999.

As you can see, Spamcop helps you attack nearly 50% of spam received, while affecting no legitimate senders. Very few lists do better. Spamhaus ZEN (which combines multiple lists) does better, but will occasionally have a false positive, based on some reputational issue perceived with a given sender.

My recommendation: Spamcop’s blocking list is safe to use, and will effectively help you reduce the amount of spam you have to deal with. Where I find it particularly useful is as an addition to Spamhaus ZEN: if you block mail from entities on either list, you get a nearly 4% boost in effectiveness. Meaning, just under four percent of my spam hits are found on the Spamcop list, but not on Spamhaus.
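The arithmetic behind that figure, using the numbers from the table above:

    # Incremental benefit of adding Spamcop to Spamhaus ZEN (table data above).
    total_spam = 316348
    zen_hits = 255521         # caught by Spamhaus ZEN alone
    combined_hits = 267795    # caught by Spamcop SCBL, ZEN, or both

    extra = combined_hits - zen_hits      # 12274 messages that ZEN alone missed
    print(f"{extra / total_spam:.2%}")    # 3.88% of all spam, Spamcop-only hits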

For historical reasons only, here are links to my previous articles on Spamcop:

  • Spamcop Roundup: http://www.dnsbl.com/2007/03/spamcop-roundup.html
  • Spamcop BL: A blocklist with a hair trigger: http://www.dnsbl.com/2007/02/spamcop-bl-list-with-hair-trigger.html
  • Microsoft using Spamcop BL: http://www.spamresource.com/2007/02/microsoft-using-spamcop-and-spamhaus.html
  • My Problems with Spamcop: http://www.spamresource.com/2003/03/problems-with-spamcop.html

Which DNSBLs work well?

This is a question I get quite often and it’s a tough one to answer. I don’t really bother with running my own mail system any more, as I’m tired of the headache and happy to leave the server-level spam prevention to somebody else.

And I'm tired of taking other people's word for it that a certain blocklist works well or doesn't work well -- I've been burned a number of times by people listing stuff on a blocklist outside of that list's defined charter. It's very frustrating. And lots of people publish stats on how much mail they block with a given list, which is an incomplete measure of whether or not a list is any good. Think about it: if you block all mail, you're going to block all spam. But you're going to block all the rest of your inbound mail, too. And when you block mail with a DNSBL, you don't always have an easy way to tell whether that mail was actually wanted or not.

So, I decided to tackle it a bit differently than other folks have. See, I have my own very large spamtrap, and the ability to compare lots of data on the fly.

For this project, I've created two feeds. One is a spam feed, composed of mail received by my many spamtrap addresses, with lots of questionable mail and obvious non-spam weeded out. I then created a non-spam feed: into this “hamtrap” I have directed solicited mail that I signed up for from over 400 senders, big and small. Now, I just have to sit back, watch the mail roll in, and watch the data roll up.

For the past week or so, I’ve been checking every piece of mail received at either the spamtrap or hamtrap against a bunch of different blocklists. I wrote software to ensure that the message is checked within a few minutes of receipt, a necessary step to gather accurate blocklist “hit” data.
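For the curious, the core of that software is conceptually simple. Here's a minimal sketch in Python of the checking-and-tallying loop; the zone names are examples standing in for my full list, and wiring it up to an actual mail feed is left out:

    import socket
    from collections import defaultdict

    # Example zones only; my real list covers many more DNSBLs.
    ZONES = ["zen.spamhaus.org", "bl.spamcop.net", "dnsbl.sorbs.net"]

    hits = defaultdict(lambda: {"spam": 0, "ham": 0})
    totals = {"spam": 0, "ham": 0}

    def listed(ip, zone):
        """Standard DNSBL query: reversed octets + zone, A record = listed."""
        name = ".".join(reversed(ip.split("."))) + "." + zone
        try:
            socket.gethostbyname(name)
            return True
        except socket.gaierror:
            return False

    def tally(source_ip, feed):
        """Record one message from the spamtrap ('spam') or hamtrap ('ham') feed.
        Run within minutes of receipt, per the methodology above."""
        totals[feed] += 1
        for zone in ZONES:
            if listed(source_ip, zone):
                hits[zone][feed] += 1

    # At week's end: hits[z]["spam"] / totals["spam"] is effectiveness, and
    # hits[z]["ham"] / totals["ham"] is the false positive rate, per zone z.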

After that first week, here’s what I’ve found. It might be obvious to you, or it might not: Spamhaus is a very accurate blocklist, and some others...aren't. Spamhaus’s “ZEN” blocklist correctly tagged about two-thirds of my spam, and tagged no desired mail incorrectly. Fairly impressive, especially when compared to some other blocklists. SORBS correctly tagged 55% of my spam mail, but got it wrong on the non-spam side of things ten percent of the time. If you think throwing away ten percent of the mail you want is troublesome, how about rejecting a third of desired mail? That’s what happens if you use the Fiveten blocklist. It would correctly have blocked 58% of my spam during the test period, but with a false positive rate of 34%, making it an unacceptable blocklist to use in any corporate environment where you actually want the mail your users asked to receive.

One fairly surprising revelation is that Spamcop’s blocklist is nowhere near as bad as I had previously believed it to be. I’ve complained periodically here about how Spamcop’s math is often wrong, how it too often lists confirmed opt-in senders, and how it is too aggressive against wanted mail, but...my data (so far) shows a complete lack of false positives. This is a nice change, and it makes me very happy to see. Assuming this trend keeps up, I think you'll see me rewriting and putting disclaimers in front of some of my previous rants on that topic.