Lunacy Unleashed

Notes from the field in the War on Spam

Automattic Spam Stopper

Recently, Matt Mullenweg, creator of WordPress, had a bright idea on how to stop blog spam. He wrote up some code, distributed his new WordPress plugin to a small group of testers, and so was born the so-called Automattic Spam Stopper, or ASS.

I was able to obtain a copy of Automattic Spam Stopper for review and made quite a disturbing discovery: namely, how it works.

Whenever a user makes a comment to your WordPress blog, ASS forwards a copy of the entire comment, the metadata such as username, email address and URI, as well as your blog address and Web server environment variables, to a central server for analysis. The server then returns the response “true” if the comment is judged to be spam.
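To make the data flow concrete, here is a rough sketch in Python of what such a check might look like from the blog's side. It is not the plugin's actual code (the plugin is a WordPress plugin, and I am not reproducing it here); the field names, the `env_` prefix, and both function names are my own assumptions for illustration. The only parts taken from the description above are which data gets sent and that the server answers "true" for spam.

```python
from urllib.parse import urlencode


def build_check_payload(comment, blog_url, server_env):
    """Bundle everything that is forwarded to the central server for one comment.

    All field names here are hypothetical; the real wire format is not documented.
    """
    payload = {
        "blog": blog_url,
        "comment_author": comment["author"],
        "comment_author_email": comment["email"],
        "comment_author_url": comment["url"],
        "comment_content": comment["content"],  # the entire comment text
    }
    # Web server environment variables travel along with the comment.
    # The "env_" prefix is an assumption, used to avoid clobbering the fields above.
    for name, value in server_env.items():
        payload["env_" + name] = value
    return urlencode(payload)


def is_spam(response_body):
    """The server returns the literal response "true" when it judges the comment spam."""
    return response_body.strip() == "true"
```

Note how little of this is under the blogger's control: the full comment, the commenter's email address, and the server environment all end up in a single request body bound for someone else's machine.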

Mullenweg isn’t saying what the “secret sauce” is for the server, so as to frustrate the spammers. “By the time we’re done spammers around the world will quiver in their boots,” said Mullenweg.

So how does the server determine what’s spam? Users of the plugin submit copies of any spam they receive by marking them as spam in the WordPress administration panel. ASS then forwards copies of these to the server for analysis.

The submitted spam, however, remains in your database, hidden from view. Over time this could cause disk space problems, and backup/restore problems, for many users. WordPress does not automatically remove spam from its database, and does not provide any method for removing it. A third-party plugin, however, does provide this function.

Right now Mullenweg inspects all comments submitted this way manually, before the server considers them to be spam. If he judges them to actually be spam, then they are added to the server’s corpus, or database of submitted spam.

He has not said, however, whether legitimate comments are kept on the server, or whether anyone else looks at the submissions. Thus, ASS may not be a good anti-spam choice for private blogs, or for blogs which frequently use password protection to limit access to their contents. In a very real sense it comes down to whether you trust Matt Mullenweg with your readers’ comments. Some people will, and others won’t.

Mullenweg envisions ASS as a service which is free for personal use, and paid for business use. “I would be more comfortable with something where it was free for regular people, and only businesses or enterprises paid (enough to support everybody),” he said.

“There may be ‘keys’ or accounts at some point to prevent abuse,” he said. “However the plugin and API are designed to be pretty easy to recreate, so if someone wanted to run their own spam [prevention] service they could easily.”

That much is true. I could create a server to do this in rather short order. And I almost did. It’s an idea that’s been discussed before among WordPress anti-spam gurus, and ultimately rejected.

To date no one has been able to provide a centralized server solution which ensures the integrity of the database, for instance. Mullenweg ensures the integrity of his database by inspecting all comments manually, but this “solution” doesn’t scale very well, and is untenable once ASS is released to a wider audience. He has proposed that users be registered and receive keys in order to use the service, but even this doesn’t prevent spammers themselves from registering and submitting garbage to the database.

In addition, no one has been able to provide a centralized server solution which ensures the privacy of users whose comments are subject to this sort of analysis, especially with respect to private blogs and password-protected posts, where users expect their comments to be private. I’ve come up with an idea or two on how this might be done, but I’m not sharing until I’m certain it really can be done; if it were really that easy, it seems that someone would have done it already.

Now if Mullenweg can solve the problems of privacy, integrity, scalability, and those gigabytes of spam clogging up his users’ databases, he may be on to something. But everyone else who’s had this idea ultimately scaled it back or dropped it entirely. I fail to see how Matt’s ASS is any different.

In the meantime, if you’re looking to stop spam without compromising your users’ privacy, consider Bad Behavior, which is shockingly effective despite not looking at the content of comments at all, and Spam Karma, which does, but doesn’t send the whole comment, and much of your server information, off to who knows where.

Update: Some other reviews of Automattic Spam Stopper:

October 10, 2005 - Posted by | Bad Behavior, Blog Spam, WordPress, WordPress 1.6, WordPress.com

13 Comments

  1. This sounds similar in principle to Razor (the OSS side of Cloudmark), Pyzor and DCC… except all three of those services work by generating hash values locally and sending the hash values to the central database. What’s good about that approach is that it avoids the privacy implications. The message itself doesn’t leave your own server. What’s bad about it is that there’s no way to verify the report. DCC gets around this by declaring “bulkiness” to be the target and recommending that people whitelist everything they subscribe to, and Razor gets around this by requiring people to register an ID to submit samples, allowing registered users to revoke reports, and generating a trust rating for each user based in part on how many of that user’s reports get revoked.

    Comment by Kelson | October 10, 2005

  2. I would never propose to manually review everything, I’m just doing that in the extremely limited alpha test to iron out bugs in the algorithms. The final version will include human review only for reported problems. I’m fairly confident that the database will remain pretty high-quality because I’ve dealt with some of the craftiest spam poisoning already on Ping-O-Matic, which gets millions of spam pings a day.

    Your observations on the plugin are very valid, most of my time has been spent on the web service side and architecture. Would you be interested in working on the plugin? An auto-delete for old spam would be a great feature, as would a way to review marked spam and submit any false positives.

    Comment by Matt | October 11, 2005

  3. io_error, you and your cheap wording.

    I fail to see how Matt’s ASS is any different.

    HURR HURR HURR HURR

    Comment by VxJasonxV | October 11, 2005

  4. Matt, I’m still a big fan of eliminating as much spam as possible on the front end, before it even needs to hit a server or be saved in a blogger’s database. It’s much easier to go through three spams and find one potential false positive than it is to go through 485 spams.

    I of course understand your need to get as many samples as possible during this test, but in production I’d recommend bloggers using this plugin have some front end protection as well, otherwise spam management is still going to be something of a nightmare.

    In other words, in the grand scheme of things, this plugin makes Spam Karma and similar solutions redundant, not Bad Behavior, which is still going to be a vital piece of spam protection for most WordPress users.

    That said, an interface to allow users to manage the spam in the database is something WordPress has needed since 1.5 was originally released. I’d suggest that this should go in core, rather than in a plugin. (The plugin, of course, can go talk to your server when a comment gets re-moderated not spam.)

    I wouldn’t mind helping with the plugin, but I don’t think there’s a whole lot left to do. 🙂

    Comment by Michael Hampton | October 11, 2005

  5. I’ve found Chris’ Spam Nuker to be extremely helpful.

    Comment by skippy | October 11, 2005

  6. Well I was originally using Bad Behavior in several very high profile environments but the false positive problem was what made me stop. I think with any solution you are going to have the potential for false positives, the kicker is how you deal with them. I actually really like SK’s triggered captcha for borderline comments in that regard.

    BTW, I think I’m going to scrap ASS as the name, I’m just not comfortable with having that much out there. 🙂

    Comment by Matt | October 11, 2005

  7. Matt, I talked a little bit about your false positive problem and some ideas on solving it a while back. It’s most certainly not something that will affect most people using WordPress. 🙂

    Comment by Michael Hampton | October 11, 2005

  8. Spam Karma 2 is for me. Used for 2 weeks and it’s caught 700+ comment and trackback spam. I haven’t had to get involved with it since I started using it.

    I love it.

    Comment by tyler | October 11, 2005

  9. BTW, I think I’m going to scrap ASS as the name, I’m just not comfortable with having that much out there. 🙂

    Yeah… I wouldn’t be either.
    I’m just not that kind of guy.

    (This entry and its comments are just too much. XD)

    Comment by VxJasonxV | October 12, 2005

  10. As of now I’ve got Bad Behavior and SK2 installed and it does keep my blog pretty spam free.

    I occasionally add IPs to my .htaccess for blocking based on what SK2 reports (usually open proxies), which would block them even before BB 🙂

    Comment by Ajay | October 12, 2005

  11. […] To make a significant dent in this type of spam, blog software needs to ship out-of-the-box with better spam controls. WordPress author Matt Mullenweg attended the second annual web spam summit and now has something cooking. I’ve reviewed Matt’s solution, the Automattic Spam Stopper. […]

    Pingback by Lunacy Unleashed » Wall Street Journal interview on blog spam | October 12, 2005

  12. […] Installed Matt’s Automattic Spam Stopper […]

    Pingback by karmasaya.info blog » Blog Archive » Automattic Spam Stopper | October 17, 2005

  13. […] Last week I told you all about Automattic Spam Stopper, the new anti-spam solution for WordPress from Matt Mullenweg. There’s been some new news, and you’re going to hear it here first. […]

    Pingback by Lunacy Unleashed » Automattic Kismet | October 26, 2005

