Lunacy Unleashed

Notes from the field in the War on Spam

Bad Behavior 2 Beta 1

First I want to say thank you to everyone who tried out an alpha version of Bad Behavior 2. Your valuable feedback and comments have resulted in a tool which eliminates some 99% of spam long before you would ever have to see it. And that means much less time spent cleaning out comments and reverting edits.

Based on your feedback, and on my own experience getting slashdotted last week, I’ve changed the pre-release quite a bit from previous pre-releases and it’s now ready for a wider audience. Here’s a quick rundown of the changes:

  • Trackback spam is pretty much dead. If you see a trackback spam get past Bad Behavior, I want to know about it.
  • Bad Behavior is stopping 99% or more of comment spam and an unknown amount of automated wiki vandalism. (I have no way to measure it.)
  • A check which required waiting five seconds before submitting POST requests has been removed. While it showed some benefit in stopping spam, it was unduly interfering with legitimate activity.
  • A check for misconfigured proxy servers has been disabled. While it blocked quite a bit of spam, it also blocked many corporate and government users, not to mention the entire country of Singapore. This appears to be a Microsoft ISA Server bug or misconfiguration, and when someone tells me how to fix it, this check will be re-enabled.
  • Several additional checks for spam and malicious activity have been added.
  • Database logging has been revamped, and the verbose option reinstated. When verbose is off, only blocked requests and some suspicious requests will be logged. On most requests, with the verbose option off, Bad Behavior will make only one database query (to retrieve its settings).
  • On WordPress, the administrative screen has been expanded. You can now turn verbose mode logging on or off from this screen.
  • Once again, strangely enough, it seems to be even faster than previous versions.

Some issues remain. I plan to implement a special page for MediaWiki, but I need some help from someone who is familiar with MediaWiki internals on implementing both the special page and the ability to save options. Please e-mail me if you have this knowledge.

I also plan to complete a technical support page within both WordPress and MediaWiki so that administrators can look up both missed spam and false positives. This should be complete prior to final release.

As always, I still need people to run the code, make sure it’s letting everyone through, and stopping spam. If it fails to catch spam, or blocks someone without good reason, then I need a report.

Now, on to installing it! Since people got confused last time, I’m going to break this into separate sections for WordPress and MediaWiki. But there is something common to both:

You will need to REMOVE all prior versions of both Bad Behavior 1 and Bad Behavior 2 BEFORE installing this release, because those versions may interfere with this one if left in place.

Then you need to DROP the *bad_behavior table (for instance, wp_bad_behavior or mw1_bad_behavior) from your database BEFORE installing this release, because the table format has changed. You can do this from within phpMyAdmin, for instance.
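If you would rather paste a statement into a SQL prompt than click through phpMyAdmin, the following is a minimal sketch of how the statement could be built from your table prefix. The helper name `drop_statement` is hypothetical and not part of Bad Behavior. It assumes your table is named with your prefix followed by `bad_behavior`, as in the examples above.

```python
import re


def drop_statement(table_prefix: str) -> str:
    """Build a DROP TABLE statement for the Bad Behavior table.

    Hypothetical helper for illustration only; it assumes the table is
    named <prefix>bad_behavior, e.g. wp_bad_behavior.
    """
    # Accept only letters, digits and underscores so that the prefix
    # cannot smuggle extra SQL into the statement.
    if not re.fullmatch(r"[A-Za-z0-9_]*", table_prefix):
        raise ValueError(f"unexpected characters in prefix: {table_prefix!r}")
    return f"DROP TABLE IF EXISTS `{table_prefix}bad_behavior`;"


if __name__ == "__main__":
    # The two prefixes used as examples in this post.
    for prefix in ("wp_", "mw1_"):
        print(drop_statement(prefix))
```

Check the printed table name against what your database actually contains before running the statement, since dropping a table cannot be undone.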

Then you’re ready to install Bad Behavior 2 Beta 1. Follow the directions for your platform.

WordPress: The plugin installs just like any other plugin. Unzip it and you’ll have a Bad-Behavior folder. Upload the ENTIRE folder and its contents into your wp-content/plugins folder. Then activate the plugin from the Plugins administrative page. Once activated, you can edit its settings from the Options » Bad Behavior page.

MediaWiki: The extension installs just like any other extension. Unzip it and you’ll have a Bad-Behavior folder. If you want to edit the settings, edit the Bad-Behavior/bad-behavior-mediawiki.php file, find the text “Manually adjust settings here” and you can change them on the next line.

Upload the ENTIRE folder and its contents into your extensions folder. Then add the following to the end of LocalSettings.php:

include( 'extensions/Bad-Behavior/bad-behavior-mediawiki.php' );

And you’re done.

The to-do list is pretty short, though it’s possible I’ve forgotten something. If I did, please leave a comment below.

WordPress: Implement the database search facility on the Options » Bad Behavior admin screen.

MediaWiki: Implement the special page. Implement the ability to save options.

ExpressionEngine: Targeted for next alpha/beta release.

Generic/Third Party Ports: Should be possible now, but I don’t have a generic template ready yet.

And as always, if you find Bad Behavior valuable, please consider making a financial contribution. I develop Bad Behavior in my spare time, and every little bit counts.

Download Bad Behavior Now!

And don’t forget to subscribe to the RSS feed or the mailing list. (They’re the same content.)

June 7, 2006 Posted by | Bad Behavior, Blog Spam, MediaWiki, Spam, WordPress | 47 Comments

I got slashdotted!

In case some of you weren’t aware, I was /.ed last night. For almost two hours after the posting, my site was unavailable, intermittently available, or very slow. Now that 24 hours have passed, this post offers a brief analysis of what went wrong and the remedial steps I took, while the oncoming hordes were banging at the gate, to get my server back up and running. I also note some lessons learned for those of you wanting to drive lots of traffic to your sites, get popular fast, or just experience the Slashdot effect for yourself, along with some implications this experience has for Bad Behavior 2 development.

First, a few raw numbers: In the first 24 hours, Apache recorded 30,787 visits from slashdot.org, 4,882 more visits from people with their referrer blocked, and 5,582 visits from other places. Of those visits, only 45 came prior to the post going live. (Subscribers to /. can see articles shortly prior to their publication time.) I had 27 minutes from the first referral from /. at 1:24 am to the time the post went live there at 1:51 am. (All times are UTC.)

Of those pageviews, 544 came between 1:51 and 2:01. There were 634 pageviews between 2:01 and 2:11, 658 between 2:11 and 2:21, and so on. Clearly the server should have been able to handle much more than one pageview per second. And this only counts pageviews served successfully; it doesn’t count images, CSS, MP3 files, etc. It also doesn’t count the innumerable requests dropped on the floor, or people who just gave up waiting.
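To put the "one pageview per second" remark in concrete terms, here is a small sketch that turns the figures quoted above into per-second rates. The numbers come straight from this post; the variable and function names are just for illustration.

```python
# Visits recorded in the first 24 hours, by source (figures from the post).
visits = {"slashdot.org": 30_787, "referrer blocked": 4_882, "other": 5_582}
total_visits = sum(visits.values())

# Pageviews served successfully in each ten-minute window after going live.
windows = {"1:51-2:01": 544, "2:01-2:11": 634, "2:11-2:21": 658}
WINDOW_SECONDS = 10 * 60


def pageviews_per_second(count: int, seconds: int = WINDOW_SECONDS) -> float:
    """Average pageviews per second over one window."""
    return count / seconds


rates = {label: pageviews_per_second(n) for label, n in windows.items()}

if __name__ == "__main__":
    print(f"total visits in 24 hours: {total_visits}")
    for label, rate in rates.items():
        print(f"{label}: {rate:.2f} pageviews/second")
```

Each window works out to roughly one pageview per second, which is the rate the server was struggling to sustain.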

The server started showing signs of trouble fast. By 1:55 am the load average had passed 35. By 2:00 it had passed 50. At one point I saw the load average as high as 112, and I was over 500MB into swap on a box with 1GB of RAM. I noted that accesses were going VERY slowly and realized that neither Apache nor MySQL had had much performance tuning, and could do a lot better than this.

Unfortunately, I am an idiot, and neglected to take any measures BEFORE the barbarians showed up at the gate! I’m doubly an idiot, because I called them by submitting the story to /. in the first place! So here’s what went wrong, and how I made it right and actually managed to get the box serving requests again.

First off, the server platform is Fedora Core 5. It includes Apache 2.2.0, MySQL 5.0.21 and PHP 5.1.4. What can I say, I like the bleeding edge. Anyway, take the distro wars elsewhere. The point is, Apache and MySQL on this platform aren’t particularly well tuned for high performance, high traffic sites. So off to Google I went to try to get things under control.

I quickly determined that MySQL was spending way too much time creating threads to serve incoming requests. It also wasn’t dropping old connections quickly enough. So I set the following two variables:

set global thread_cache_size = 150;
set global wait_timeout = 10;

That got MySQL behaving mostly okay. Then I turned to Apache. It turned out to be a bit more of a problem, as at first glance, it already looked like it was fairly well tuned:

StartServers 8
MinSpareServers 5
MaxSpareServers 20
ServerLimit 256
MaxClients 256
MaxRequestsPerChild 4000
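One rough way to sanity-check a MaxClients value against available memory is sketched below. This is not something I measured during the incident: the 1GB of RAM comes from this post, but the per-process size and the memory reserved for MySQL and the OS are assumed placeholder values that you would replace with numbers from your own box.

```python
def max_clients_that_fit(total_ram_mb: int, reserved_mb: int, per_process_mb: int) -> int:
    """Estimate how many Apache processes fit in RAM without swapping.

    All inputs are in megabytes. Returns 0 if nothing is left over
    after the reservation or if the per-process size is not positive.
    """
    usable_mb = total_ram_mb - reserved_mb
    if usable_mb <= 0 or per_process_mb <= 0:
        return 0
    return usable_mb // per_process_mb


TOTAL_RAM_MB = 1024    # the box in this post has 1GB of RAM
RESERVED_MB = 256      # ASSUMPTION: memory set aside for MySQL and the OS
PER_PROCESS_MB = 20    # ASSUMPTION: resident size of one Apache/PHP process

if __name__ == "__main__":
    fit = max_clients_that_fit(TOTAL_RAM_MB, RESERVED_MB, PER_PROCESS_MB)
    print(f"processes that fit without swapping: {fit}")
    print("configured MaxClients: 256")
```

Under these assumed figures only a few dozen processes fit in memory, so a limit of 256 leaves plenty of room to be pushed into swap under load.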

So I twiddled the values for a while, with little result, all the while getting hammered with load averages hovering between 50 and 60, but at least I wasn’t swapping anymore. Then after over an hour and a half, I finally realized I had a big problem:

KeepAlive Off
MaxKeepAliveRequests 100
KeepAliveTimeout 15

Wait, OFF? That means Apache’s getting hit about 10 times as hard as it needs to be, as it will have to accept a brand new connection for every inline image, the CSS file, etc. So I changed it fast:

KeepAlive On
MaxKeepAliveRequests 1000
KeepAliveTimeout 10

And when I restarted Apache, now around 3:30 am, the load average quickly dropped from 45 to about 10, and requests started coming through at something approaching tolerable speed again.
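The "about 10 times" estimate can be illustrated with a little arithmetic. This sketch assumes a page made up of one HTML document plus nine inline resources, and a browser that reuses a single connection when keep-alive is available. Both are simplifications (real browsers open a few connections in parallel), and the function name is made up for this example.

```python
def connections_needed(requests_per_page: int, keepalive: bool,
                       max_keepalive_requests: int = 1000) -> int:
    """Count the TCP connections one page view costs the server.

    With keep-alive off, every request needs its own connection. With it
    on, requests share a connection until MaxKeepAliveRequests is reached.
    """
    if not keepalive:
        return requests_per_page
    # Ceiling division: a new connection is needed each time the limit is hit.
    return -(-requests_per_page // max_keepalive_requests)


# ASSUMPTION: one HTML page plus nine images, stylesheets, and so on.
REQUESTS_PER_PAGE = 10

if __name__ == "__main__":
    off = connections_needed(REQUESTS_PER_PAGE, keepalive=False)
    on = connections_needed(REQUESTS_PER_PAGE, keepalive=True)
    print(f"KeepAlive Off: {off} connections per page view")
    print(f"KeepAlive On:  {on} connection per page view")
```

The trade-off is that each kept-alive connection holds an Apache process for up to KeepAliveTimeout seconds, which is why the timeout was lowered at the same time.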

Then I installed PHP eAccelerator, which has a nice Fedora package (php-eaccelerator) and works out of the box. When I restarted Apache after installing it, the load average dropped by half again, and the site was as fast as it usually is with nobody on it.

I’ve since installed WP-Cache 2 with the required WordPress 2.0/PHP 5 fix and Mark Jaquith’s gzip compression patch. I tested it to make sure it works, but I’m leaving it off until the next time the barbarians show up at the gate, as it screws with some dynamic code I have and I haven’t figured out how to get it to execute the code every time. Yet.

For those of you on shared hosting providers, you won’t be able to make most of these changes yourself. Only WP-Cache 2 is user-installable. If you use a VPS or dedicated server, though, you can do all of this.

There’s plenty of further performance tuning I can do, especially with MySQL, and I plan to do it in the very near future, just in case one of my posts actually gets greenlit on Fark.com or something.

One of the things I did while the server was being hammered was to disable Bad Behavior, to determine if it was putting too much load on the system while it was being hit with 50-100 requests a second. I’ve determined that at those levels it does hit the database pretty hard, and I plan to redesign all of Bad Behavior’s database usage to try to accommodate this sort of situation.

P.S. I can tell you that many /. users really do click on ads. Yesterday’s take on Google AdSense was $40.91, and so far today I’m above $66. Not bad. I get paid to learn fast about tuning my server. 🙂

There’s much more work to be done, though, so I’ll most likely have a follow-up to this.

June 3, 2006 Posted by | AdSense, Apache, Bad Behavior, MySQL, Slashdot, WordPress | 11 Comments