Lunacy Unleashed

Notes from the field in the War on Spam

I got slashdotted!

In case some of you weren’t aware, I was /.ed last night. For almost two hours after the posting, my site was unavailable, intermittently available, or very slow. Now that 24 hours have passed, this post is a brief analysis of what went wrong and the remedial steps I took, while the oncoming hordes were banging at the gate, to get my server back up and running. I also note some lessons learned for those of you wanting to drive lots of traffic to your sites, get popular fast, or just experience the Slashdot effect for yourself, along with some implications this experience has for Bad Behavior 2 development.

First a few raw numbers: In the first 24 hours, Apache recorded 30,787 visits from slashdot.org, 4,882 more visits from people with their referrer blocked, and 5,582 visits from other places. Of those hits, only 45 came prior to the post going live. (Subscribers to /. can see articles shortly prior to their publication time.) I had 27 minutes from the first referral from /. at 1:24 am to the time the post went live there at 1:51 am. (All times are UTC.)

Of those pageviews, 544 came between 1:51 and 2:01. There were 634 pageviews between 2:01 and 2:11, 658 between 2:11 and 2:21, and so on. Clearly the server should have been able to handle much more than one pageview per second. And this only counts pageviews served successfully; it doesn’t count images, CSS, MP3 files, etc. It also doesn’t count the innumerable requests dropped on the floor, or people who just gave up waiting.
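To put those figures in perspective, here is a quick back-of-the-envelope calculation in Python. It uses only the numbers quoted above; the variable names are just for illustration.

```python
# Referrer breakdown for the first 24 hours, from the Apache figures above.
visits = {"slashdot.org": 30_787, "referrer blocked": 4_882, "other": 5_582}
total = sum(visits.values())
slashdot_share = visits["slashdot.org"] / total

# Successful pageviews in the first three 10-minute windows after 1:51 UTC.
windows = {"1:51-2:01": 544, "2:01-2:11": 634, "2:11-2:21": 658}
WINDOW_SECONDS = 10 * 60
rates = {label: count / WINDOW_SECONDS for label, count in windows.items()}

print(f"total visits: {total}, from slashdot.org: {slashdot_share:.1%}")
for label, rate in rates.items():
    print(f"{label}: {rate:.2f} pageviews/second")
```

Each window works out to roughly one successfully served pageview per second, which is the throughput the server was stuck at while it was overloaded.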

The server started showing signs of trouble fast. By 1:55 am the load average had passed 35. By 2:00 it had passed 50. At one point I saw the load average as high as 112, and I was over 500MB into swap on a box with 1GB of RAM. I noted that accesses were going VERY slowly and realized that neither Apache nor MySQL had had much performance tuning, and could do a lot better than this.

Unfortunately, I am an idiot, and neglected to take any measures BEFORE the barbarians showed up at the gate! I’m doubly an idiot, because I called them by submitting the story to /. in the first place! So, here’s what went wrong, how I made it right, and how I actually managed to get the box serving requests again.

First off, the server platform is Fedora Core 5. It includes Apache 2.2.0, MySQL 5.0.21 and PHP 5.1.4. What can I say, I like the bleeding edge. Anyway, take the distro wars elsewhere. The point is, Apache and MySQL on this platform aren’t particularly well tuned for high performance, high traffic sites. So off to Google I went to try to get things under control.

I quickly determined that MySQL was spending way too much time creating threads to serve incoming requests. It also wasn’t dropping old connections quickly enough. So I set the following two variables:

set global thread_cache_size = 150;
set global wait_timeout = 10;

That got MySQL behaving mostly okay. Then I turned to Apache. It turned out to be a bit more of a problem, as at first glance, it already looked like it was fairly well tuned:

StartServers 8
MinSpareServers 5
MaxSpareServers 20
ServerLimit 256
MaxClients 256
MaxRequestsPerChild 4000
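One thing worth checking with settings like these is whether MaxClients worth of Apache children can fit in RAM at all, since the box was already swapping. The following is only a rough sizing sketch: the 1GB figure is from the post, but the reserved memory and per-process size are pure assumptions, not measurements from this server, and `max_clients_that_fit` is a hypothetical helper.

```python
def max_clients_that_fit(ram_mb: int, reserved_mb: int, per_process_mb: int) -> int:
    """Largest number of Apache children that fit in RAM without swapping."""
    return max(0, (ram_mb - reserved_mb) // per_process_mb)

RAM_MB = 1024        # from the post: a box with 1GB of RAM
RESERVED_MB = 256    # ASSUMPTION: MySQL, the OS and everything else
PER_PROCESS_MB = 20  # ASSUMPTION: resident size of one Apache+PHP child

fit = max_clients_that_fit(RAM_MB, RESERVED_MB, PER_PROCESS_MB)
print(f"children that fit in RAM: {fit}; configured MaxClients: 256")
```

Under these assumed sizes, far fewer than 256 children fit in memory. The real per-process figure would have to be measured on the server (for example with ps or top) before drawing any conclusion.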

So I twiddled the values for a while, with little result, all the while getting hammered with load averages hovering between 50 and 60, but at least I wasn’t swapping anymore. Then after over an hour and a half, I finally realized I had a big problem:

KeepAlive Off
MaxKeepAliveRequests 100
KeepAliveTimeout 15

Wait, OFF? That means Apache’s getting hit about 10 times as hard as it needs to be, as every browser has to open a brand new connection for every inline image, the CSS file, etc., instead of reusing the one it already has. So I changed it fast:

KeepAlive On
MaxKeepAliveRequests 1000
KeepAliveTimeout 10

And when I restarted Apache, now around 3:30 am, the load average quickly dropped from 45 to about 10, and requests started coming through at something approaching tolerable speed again.
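Here is a rough sketch of why that one setting mattered so much. The numbers are assumptions, not taken from the logs: a pageview needing 10 requests (matching the "about 10 times" estimate above) and a browser holding at most 2 persistent connections. `connections_per_pageview` is a hypothetical helper.

```python
def connections_per_pageview(requests_per_page: int, keepalive: bool,
                             parallel_connections: int = 2) -> int:
    """Estimate how many TCP connections a browser opens to load one page."""
    if not keepalive:
        # KeepAlive Off: the server closes the connection after every
        # response, so each request needs a connection of its own.
        return requests_per_page
    # KeepAlive On: all requests share a few persistent connections.
    return min(requests_per_page, parallel_connections)

REQUESTS_PER_PAGE = 10  # ASSUMPTION: the HTML page plus images, CSS, etc.

off = connections_per_pageview(REQUESTS_PER_PAGE, keepalive=False)
on = connections_per_pageview(REQUESTS_PER_PAGE, keepalive=True)
print(f"KeepAlive Off: {off} connections per pageview")
print(f"KeepAlive On:  {on} connections per pageview")
```

The trade-off is that, in a process-per-connection setup like the one configured above, an idle persistent connection keeps its Apache child busy until KeepAliveTimeout expires, so a shorter timeout frees children sooner under heavy load.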

Then I installed PHP eAccelerator, which has a nice Fedora package (php-eaccelerator) and works out of the box. When I restarted Apache after installing it, the load average dropped by half again, and the site was as fast as it usually is with nobody on it.

I’ve since installed WP-Cache 2 with the required WordPress 2.0/PHP 5 fix and Mark Jaquith’s gzip compression patch. I tested it to make sure it works, but I’m leaving it off until the next time the barbarians show up at the gate, as it screws with some dynamic code I have and I haven’t figured out how to get it to execute the code every time. Yet.

For those of you on shared hosting providers, you won’t be able to make most of these changes yourself. Only WP-Cache 2 is user-installable. If you use a VPS or dedicated server, though, you can do all of this.

There’s plenty of further performance tuning I can do, especially with MySQL, and I plan to do it in the very near future, just in case one of my posts actually gets greenlit on Fark.com or something.

One of the things I did while the server was being hammered was to disable Bad Behavior, to determine if it was putting too much load on the system while it was being hit with 50-100 requests a second. I’ve determined that at those levels it does hit the database pretty hard, and I plan to redesign all of Bad Behavior’s database usage to try to accommodate this sort of situation.

P.S. I can tell you that many /. users really do click on ads. Yesterday’s take on Google AdSense was $40.91, and so far today I’m above $66. Not bad. I get paid to learn fast about tuning my server. :)

There’s much more work to be done, though, so I’ll most likely have a follow-up to this.

June 3, 2006 - Posted by | AdSense, Apache, Bad Behavior, MySQL, Slashdot, WordPress

11 Comments

  1. Haha, damn!

    Comment by Viper007Bond | June 3, 2006

  2. Er, hit submit by accident.

    Anyway, I should really implement some of this stuff. I run a site that often gets nearly 100k views a day from around 25k uniques and the load just hits the ceiling every time. :/

    Comment by Viper007Bond | June 3, 2006

  3. [...] Michael Hampton, creator of the excellent PHP-scripted Bad Behavior spam-fighting software (which, along with Spam Karma, has saved me from junk-comment migraines of a modest but repetitive nature), was slash-dotted a couple of days ago. Being that he had submitted the article himself, he admits to having a hand in his server’s subsequently pummelling: The server started showing signs of trouble fast. By 1:55 am the load average had passed 35. By 2:00 it had passed 50. At one point I saw the load average as high as 112, and I was over 500MB into swap on a box with 1GB of RAM. I noted that accesses were going VERY slowly and realized that neither Apache nor MySQL had had much performance tuning, and could do a lot better than this. [...]

    Pingback by Dead Reckoning » Archive » Courting the Masses? | June 3, 2006

  4. Isn’t it always the way that these things happen in the middle of the night, never at a convenient time?

    Comment by Greg | June 4, 2006

  5. Remember times are in UTC. It would have been early afternoon there in Australia. :)

    Comment by Michael Hampton | June 4, 2006

  6. Oh doh – so it was just me repeatedly clicking on the link then … ooops sorry about that :P

    Comment by Greg | June 6, 2006

  7. [...] Lunacy Unleashed » Blog Archive » I got slashdotted! (tags: apache performance wordpress mysql slashdotted) [...]

    Pingback by willkoca » Archive » links for 2006-06-16 | June 17, 2006

  8. [...] Lunacy Unleashed » Blog Archive » I got slashdotted! (tags: apache performance wordpress mysql slashdotted) [...]

    Pingback by willkoca » Archive » links for 2006-06-17 | June 17, 2006

  9. Congrats Michael on the slashdot and new release of BB

    Comment by Ajay | July 2, 2006

  10. [...] Bad Behavior 2 is faster than Bad Behavior 1, whether you use database logging or not. It has been completely redesigned from the ground up to be as fast as possible and provide protection on very high traffic sites, such as when you find yourself on the front page of slashdot.org, or you’re the sysop of Wikipedia. For most requests, Bad Behavior 2 issues at most one fast database query, and in many cases, no database queries. Bad Behavior’s run time on fast servers is measured in single milliseconds. [...]

    Pingback by Bloggers Buzz | July 5, 2006

  11. [...] Bad Behavior 2 is faster than Bad Behavior 1, whether you use database logging or not. It has been completely redesigned from the ground up to be as fast as possible and provide protection on very high traffic sites, such as when you find yourself on the front page of slashdot.org, or you’re the sysop of Wikipedia. For most requests, Bad Behavior 2 issues at most one fast database query, and in many cases, no database queries. Bad Behavior’s run time on fast servers is measured in single milliseconds. [...]

    Pingback by Lunacy Unleashed » Blog Archive » Bad Behavior 2 | July 9, 2006

