Lunacy Unleashed

Notes from the field in the War on Spam

Wall Street Journal interview on blog spam

Last week a reporter from the Wall Street Journal emailed me to set up an interview for an upcoming article on the problem of blog spam. Now the first thing that came to mind is, “What does the Wall Street Journal care about blog spam?”

You may as well ask, what do they care about the Internet? Jeff Jarvis, who’s been around this particular block a few times, will tell you that “old media” needs to start caring about the Internet.

Virtually every company around now has an Internet presence, but these days that isn’t enough. Customers expect to be able to have an actual back-and-forth conversation with the companies they do business with online, and that’s where blogs come in. Companies that set up blogs now, and actually engage their customers online, are going to be the ones who survive the next big Internet shakeout. (See also “Web 2.0.”)

Anyway, the reason the Wall Street Journal cares is that its readers, typically corporate executives, investors, etc., either already care, or desperately need to start caring, about blogs and blogging.

Which brings me to the interview, which, thanks to some problems with my cell phone, didn’t actually take place. As of this writing the article hasn’t been published either, so maybe I’ll hear from him. Or maybe not.


Since you’re reading a blog, you probably already know something about blogging. I won’t bore you describing that. But you might not have seen blog spam.

Most blogs allow readers to leave comments and feedback on each individual entry, as well as trackbacks, which are automated notifications from other blogs sent when one blog references another. The comments and the trackbacks are then posted to the original posting. In this way is the blogosphere built.

At some point, spammers noticed this, and began developing automated methods of posting links to their own websites to blogs, using both the comment and trackback mechanisms. They have two goals in doing so: first and foremost, to drive traffic to their sites and income to their pockets, and second, to increase the search engine rank of their sites.

Early this year Google introduced a new standard called nofollow which blogs could apply to deny spammers search engine rank. Google, Yahoo and MSN all implemented the standard, as did most major blog, wiki and CMS platforms. But nofollow only addressed the secondary purpose, not the primary purpose, of blog spam, so nofollow hasn’t delivered on its promise to stop spam.
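For the curious, here is roughly what applying nofollow looks like. This is only a minimal sketch in Python, not the code any particular blog platform uses: the function name add_nofollow and the regex-based approach are my own assumptions for illustration, and a real platform would more likely use a proper HTML parser. The one part that comes from the standard itself is the rel="nofollow" attribute on the link.

```python
import re

# Matches an opening <a ...> tag and captures its attribute text.
_ANCHOR = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# Matches an existing quoted rel="..." attribute inside that text.
_REL = re.compile(r"""\brel\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def add_nofollow(comment_html: str) -> str:
    """Return comment_html with rel="nofollow" on every link.

    Hypothetical helper for illustration; assumes reasonably
    well-formed HTML with quoted rel attributes.
    """

    def fix_tag(match):
        attrs = match.group(1)
        rel = _REL.search(attrs)
        if rel is None:
            # No rel attribute yet: append one.
            return "<a" + attrs + ' rel="nofollow">'
        values = rel.group(2).split()
        if "nofollow" in (v.lower() for v in values):
            # Already marked: leave the tag untouched.
            return match.group(0)
        # Keep the existing rel values and add nofollow to them.
        joined = " ".join(values + ["nofollow"])
        new_rel = 'rel="' + joined + '"'
        return "<a" + attrs[:rel.start()] + new_rel + attrs[rel.end():] + ">"

    return _ANCHOR.sub(fix_tag, comment_html)


if __name__ == "__main__":
    print(add_nofollow('Nice post! <a href="http://example.com/">cheap pills</a>'))
```

Search engines that honor the attribute give the linked site no ranking credit for that link. The link itself is still there and still clickable, though, and that is exactly the gap.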

For WordPress, in the beginning, was Spam Karma. Spam Karma is a most excellent piece of software that does indeed block just about every piece of spam a blog might ever receive. It has one significant drawback, though: the spam sticks around, and you, the blogger, still have to deal with it. Spam Karma mails out digest e-mails with a summary of the spam caught, at least once per day. But get 50 or more, and you’ll get more than one e-mail. I’ve spoken to a blogger who received dozens of these e-mails daily, representing hundreds of spams being caught. Every day.

That’s a lot of work for anybody to do, let alone a blogger who just wants to write.

So, back in April, after the announcement of a WordPress plugin competition, I decided to do something that would stop spam. I had a completely novel idea which, as far as I could find, had never been tried before, and within a few days, had some working code. I tried it on a few guinea pigs, and it seemed good. And on the 24th April, the first release candidate of Bad Behavior went out the door.

It was a huge success, even far beyond my expectations.

Going in, I decided that it would be sufficient to stop most, if not all, spam, as long as there were absolutely no false positives, i.e. real people being blocked out. To this day Bad Behavior has kept this primary design goal.

It blocks between 90% and 99% of incoming spam before the blogger even has to think about spam, and on a very popular site, this is a lot of spam. Bad Behavior is running on sites which receive thousands of spam attempts daily, and blocks virtually all of them. The few messages which do get through are easy enough to deal with. On most sites, this may be one or two messages a week; on the busiest sites, five to ten a day. Compare that to 200 to 15,000 attempts a day, and you see the difference.

And, it doesn’t bother anybody with digest e-mails, or summaries, or even how many spammers it’s blocked. “I love how it is completely automated. No user involvement needed,” said Mark Jaquith. But some people want to know what it’s doing. This led one blogger to build a statistics plugin for Bad Behavior.

When I started this, I really had no idea what would happen. But it’s become a long-term project. I don’t see any end to blog spam anytime soon, and so I don’t see any end to Bad Behavior anytime soon.

The simple truth of the matter is, there are too many unprotected blogs out there. Technorati reports that, at any given time, only 55% of blogs have had a post made in the last three months, a statistic they say is “consistent throughout the last year.” That means 45% of the 14 million weblogs out there have been virtually abandoned by their authors.

It’s primarily these blogs that spammers target.

To make a significant dent in this type of spam, blog software needs to ship out-of-the-box with better spam controls. WordPress author Matt Mullenweg attended the second annual web spam summit and now has something cooking. I’ve reviewed Matt’s solution, the Automattic Spam Stopper.

I’m just pissed off that I didn’t hear about the summit until it was over; I could easily have attended.


October 12, 2005 - Posted by | Bad Behavior, Blog Spam, WordPress



    Comment by UNKNOWN | June 19, 2006

  2. The Süddeutsche writes in an article that the operator of the blog search engine Technorati estimates that there are 70,000 new spam blogs daily! That is 25,550,000 a year!

    Comment by Hans | June 26, 2006

