In case some of you weren’t aware, I was /.ed last night. For almost two hours after the posting, my site was unavailable, intermittently available, or very slow. Now, 24 hours later, this post is a brief analysis of what went wrong and the remedial steps I took, while the oncoming hordes were banging at the gate, to get my server back up and running. I also note some lessons learned for those of you wanting to drive lots of traffic to your sites, get popular fast, or just experience the Slashdot effect for yourself, along with some implications this experience has for Bad Behavior 2 development.
First a few raw numbers: In the first 24 hours, Apache recorded 30,787 visits from slashdot.org, 4,882 more visits from people with their referrer blocked, and 5,582 visits from other places. Of those hits, only 45 came prior to the post going live. (Subscribers to /. can see articles shortly prior to their publication time.) I had 27 minutes from the first referral from /. at 1:24 am to the time the post went live there at 1:51 am. (All times are UTC.)
Of those pageviews, 544 came between 1:51 and 2:01. There were 634 pageviews between 2:01 and 2:11, 658 between 2:11 and 2:21, and so on. Clearly the server should have been able to handle much more than one pageview per second. And this only counts pageviews served successfully; it doesn’t count images, CSS, MP3 files, etc. It also doesn’t count the innumerable requests dropped on the floor, or people who just gave up waiting.
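To put those numbers in perspective, here is a quick back-of-the-envelope sketch that turns the three ten-minute counts above into average rates. The counts are the ones quoted in this post; the variable and function names are just made up for the illustration.

```python
# Pageview counts for the first three 10-minute windows after the post went live,
# taken from the figures quoted above (times are UTC).
windows = {
    "1:51-2:01": 544,
    "2:01-2:11": 634,
    "2:11-2:21": 658,
}

WINDOW_SECONDS = 10 * 60


def pageviews_per_second(count, seconds=WINDOW_SECONDS):
    """Average successfully served pageviews per second over one window."""
    return count / seconds


rates = {label: pageviews_per_second(count) for label, count in windows.items()}

for label, rate in rates.items():
    print(f"{label}: {rate:.2f} pageviews/second")
```

All three windows work out to roughly one successfully served pageview per second, which is the figure the paragraph above complains about.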
The server started showing signs of trouble fast. By 1:55 am the load average had passed 35. By 2:00 it had passed 50. At one point I saw the load average as high as 112, and I was over 500MB into swap on a box with 1GB of RAM. I noted that accesses were going VERY slowly and realized that neither Apache nor MySQL had had much performance tuning, and could do a lot better than this.
Unfortunately, I am an idiot, and neglected to take any measures BEFORE the barbarians showed up at the gate! I’m doubly an idiot, because I called them by submitting the story to /. in the first place! So, here’s what went wrong, how I made it right, and how I actually managed to get the box serving requests again.
First off, the server platform is Fedora Core 5. It includes Apache 2.2.0, MySQL 5.0.21 and PHP 5.1.4. What can I say, I like the bleeding edge. Anyway, take the distro wars elsewhere. The point is, Apache and MySQL on this platform aren’t particularly well tuned for high performance, high traffic sites. So off to Google I went to try to get things under control.
I quickly determined that MySQL was spending way too much time creating threads to serve incoming requests. It also wasn’t dropping old connections quickly enough. So I set the following two variables:
set global thread_cache_size = 150;
set global wait_timeout = 10;
That got MySQL behaving mostly okay. Then I turned to Apache. It turned out to be a bit more of a problem, as at first glance, it already looked like it was fairly well tuned:
So I twiddled the values for a while, with little result, all the while getting hammered with load averages hovering between 50 and 60, but at least I wasn’t swapping anymore. Then after over an hour and a half, I finally realized I had a big problem:
Wait, OFF? That means Apache’s getting hit about 10 times as hard as it needs to be, as it will have to spawn a new process for every inline image, the CSS file, etc. So I changed it fast:
And when I restarted Apache, now around 3:30 am, the load average quickly dropped from 45 to about 10, and requests started coming through at something approaching tolerable speed again.
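The setting that was OFF isn’t named in the text above. The description matches Apache’s KeepAlive directive (persistent connections), so take that as an assumption. Under that assumption, here is a rough sketch of why the "10 times as hard" estimate is plausible. The asset count and the number of parallel browser connections are hypothetical figures, not measurements from this server.

```python
def connections_per_pageview(asset_count, keep_alive, parallel_connections=2):
    """Estimate how many TCP connections a browser opens to load one page.

    asset_count: images, CSS files, etc. referenced by the page (hypothetical).
    keep_alive: whether the server allows persistent connections.
    parallel_connections: assumed number of connections a browser keeps open.
    """
    requests = 1 + asset_count  # the HTML page itself plus each asset
    if not keep_alive:
        # Every single request needs its own connection.
        return requests
    # Requests are shared across a small pool of reused connections.
    return min(requests, parallel_connections)


ASSETS = 9  # assumed: nine inline assets, i.e. ten requests per pageview

without_keep_alive = connections_per_pageview(ASSETS, keep_alive=False)
with_keep_alive = connections_per_pageview(ASSETS, keep_alive=True)
print(without_keep_alive, with_keep_alive)
```

This only counts connections. It says nothing about how Apache schedules its worker processes, so treat it as an illustration of the order of magnitude rather than a model of the server.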
Then I installed PHP eAccelerator, which has a nice Fedora package (php-eaccelerator) and works out of the box. When I restarted Apache after installing it, the load average dropped by half again, and the site was as fast as it usually is with nobody on it.
I’ve since installed WP-Cache 2 with the required WordPress 2.0/PHP 5 fix and Mark Jaquith’s gzip compression patch. I tested it to make sure it works, but I’m leaving it off until the next time the barbarians show up at the gate, as it screws with some dynamic code I have and I haven’t figured out how to get it to execute the code every time. Yet.
For those of you on shared hosting providers, you won’t be able to make most of these changes yourself. Only WP-Cache 2 is user-installable. If you use a VPS or dedicated server, though, you can do all of this.
There’s plenty of further performance tuning I can do, especially with MySQL, and I plan to do it in the very near future, just in case one of my posts actually gets greenlit on Fark.com or something.
One of the things I did while the server was being hammered was to disable Bad Behavior, to determine if it was putting too much load on the system while it was being hit with 50-100 requests a second. I’ve determined that at those levels it does hit the database pretty hard, and I plan to redesign all of Bad Behavior’s database usage to try to accommodate this sort of situation.
P.S. I can tell you that many /. users really do click on ads. Yesterday’s take on Google AdSense was $40.91, and so far today I’m above $66. Not bad. I get paid to learn fast about tuning my server.
There’s much more work to be done, though, so I’ll most likely have a follow-up to this.
People ask me about making money from AdSense all the time. While I usually will offer little tips and tricks that I’ve learned along the way, one thing I want to make sure that new AdSense publishers know is what NOT to do.
The number one thing that you should NOT do is STEAL OTHER PEOPLE’S CONTENT. Yes, I know the guy who sold you the video or the eBook said it was okay. Guess what, he has your $97, and you’re about five minutes away from being up shit creek without a paddle, as you lose your web hosting, your domain names, and most importantly, your AdSense account, all because you ripped someone off.
If I catch you stealing my content, your ass is grass. (This obviously doesn’t apply if I gave you permission to use it.)
This content was stolen from Michael Hampton.
Copyright © 2006 Michael Hampton. All rights reserved. This material may not be published, broadcast, rewritten or redistributed.
“Your lack of faith is disturbing.”
Bad Behavior 2 Alpha 4 has been available for MediaWiki for a couple of weeks now, and yet I have received only one trouble report relevant to it: having to wait five seconds between edits. As I’m aware of at least one other serious problem with it that affects, as far as I can tell, every MediaWiki installation, I’m forced to conclude there is a lack of interest in combating automated wikispam.
Until someone shows me otherwise, work on the MediaWiki port will be suspended indefinitely.
Now if I’m wrong, and you ARE interested in further work on the MediaWiki port of Bad Behavior 2, then I need the following test results:
- Install Bad Behavior 2 Alpha 4.
- Browse to at least one page.
- Attempt to log in.
- Attempt to edit a page.
- Attempt to preview your changes.
- Attempt to commit your edits.
Wait at least 5 seconds between each test. Then mail me the test results, whether you succeeded or not. Be sure to include the version of MediaWiki you used. If some test failed, include any messages you saw in the browser, and anything that was logged in the server’s error_log.
If I actually get some test results mailed in, then work on the MediaWiki port will continue. Until then I am going to put my effort into the core and getting it ready for final release.
Bad Behavior 2 Alpha 2 is now available for wide testing. If you’ve used Bad Behavior in the past, or if you currently use Akismet or Spam Karma 2 and those spam numbers just keep going up, it’s time to learn what Bad Behavior 2 can do for you.
Bad Behavior 2 is a ground-up rewrite of Bad Behavior, the only Web spam killer which stops spammers before they even have a chance to get started. It does this by focusing not on the content of the messages, but on the delivery method. As such, for maximum effect, you should use it in conjunction with another content-based plugin, such as Spam Karma 2 or Akismet. But even on its own, Bad Behavior is once again shockingly effective at stopping spam.
When Bad Behavior was first introduced a year ago (holy crap it HAS been that long!), it was the first tool of its kind targeting malicious activity on a wide variety of Web sites and platforms. While a few other similar solutions exist, such as mod_security for Apache, they can’t be installed by the user, and they don’t specifically target blog and forum spam, wiki vandalism and the like.
By contrast, Bad Behavior is a set of PHP scripts which pre-screen every request to your PHP-based Web site. The first major version of Bad Behavior was ported to nearly a dozen different blogs, wikis, forums and guestbooks, and many more generic ports were reported whose authors kept them private and never released them. Bad Behavior 2 intends to keep the tradition of being portable to any PHP-based platform and expand on it by providing a more comprehensive and structured general API which can be wrapped into virtually anything.
Unfortunately, this wasn’t possible with the previous major version of Bad Behavior, owing to its design, thus the ground-up rewrite. Much to my surprise, Bad Behavior 2 is actually smaller than its predecessor, and catches virtually all spam with virtually no false positives. As of the time of this writing, it allowed only one spam to escape, and on investigation I found that spam had been manually posted by a very bored spammer. (In the final release, he too will be blocked.)
Now, down to business. As I said in the previous post, I haven’t completed the MediaWiki and ExpressionEngine ports yet, primarily due to time constraints, and the constraints of having thousands of people being hit by millions of spams and crying out for a solution now. So for now, this test release only runs on WordPress. It requires WP 1.5 or any later version.
Because this is a test release, there are some special installation instructions. First, if you installed 2.0 Alpha 1, delete it before uploading this version.
This version can be installed alongside Bad Behavior 1, and in fact I recommend it. Upload the files in the usual way for any plugin. Then go to Manage Plugins. You’ll see both versions listed. Deactivate Bad Behavior 1, then activate Bad Behavior 2. To switch back, deactivate Bad Behavior 2, then activate Bad Behavior 1. Do not allow both version 1 and 2 to be active at the same time.
There are no show-stopping bugs that I’m aware of in this release; it’s stable enough for everyday use. However, it is not feature-complete; several items on the roadmap remain unfinished. For instance, a screener for requests which are suspicious but not certainly spam is only partially implemented. (Which is how that manual spammer got through.) The administrative screen located under Options > Bad Behavior is also not yet implemented.
Even so, I believe that this release will cut your spam flow on your WordPress blog to virtually nothing, without any false positives. However, in the extremely rare event that there is a false positive, the user will receive a technical support key and a brief explanation of what he can do to fix the problem (e.g. scan for spyware). Collect this key from the user and then mail it to me and I’ll get back to you with further information. The error page also provides a link the user can click for extended information; this part is also partially implemented and will be what I work on next.
And as always, if you find Bad Behavior valuable, please consider making a financial contribution. I develop Bad Behavior in my spare time, and every little bit counts.
I’ve said before that the time would probably come when I would ask for brave volunteers to help run test code in order to help me build the next generation of Bad Behavior. One of those times has just arrived.
In developing Bad Behavior, I need access to a much larger body (corpus) of spam than I currently have, and I need your help to collect it. So this test code will automatically send a copy of any spam you receive to me.
There are some qualifications for this test, however, and you will want to pay close attention.
First, the plugin compatibility requirements. You must already be running both Bad Behavior and Akismet, and NOT be running Spam Karma. (The test code just won’t work with Spam Karma, and it currently requires Akismet for screening missed comments.) You must have WordPress 1.5 or higher to play.
Second, the data privacy issue. In some countries you may need to disclose this to your readers, so I’m disclosing it to you. This bit of code leverages Akismet to determine what bits of spam Bad Behavior is missing, and when Akismet determines that a comment is spam, it sends me a copy of the spammy request. The problem is that like everything else, Akismet is not 100% perfect, and it is possible that I’ll receive a legitimate comment. When this happens, I will delete the copy I received.
Finally, the installation. This is just a repackaged copy of Bad Behavior 1.2.4 with the code in question enabled. Replace your existing copy of Bad Behavior with this copy, reactivate the plugin if necessary, and you’re done.
In all other respects it operates exactly as Bad Behavior 1.2.4, the current version, except that it sends me a copy of any comment/ping submitted that Akismet (and possibly other plugins, but not Spam Karma) marks as spam. With this body of information I will be better able to develop more advanced techniques to combat comment spam, reduce the need for other plugins, and possibly even eliminate the very few false positives. I’ve got a few other ideas in mind, but I don’t want to share them too early and allow the spammers any advantages.
Sorry, MediaWiki users; I don’t have something ready for you just yet. But stay tuned. I run MediaWiki also, and I’m very interested in helping you eliminate wikispam as well.
There’s too much stuff on your blog.
It’s okay, though. I’m not mad at you.
In fact, not only is there too much stuff on your blog, it’s poorly organized, difficult to see, and a real pain in the ass just to look at. And it’s not doing me any good when I visit your blog.
This rant came about as I was viewing one of my blogs on my new Palm T|X handheld, and trying to cut its download time down. This threw me into a whole new world: that of mobile computing. You see, on a mobile device, there’s very limited screen space, and anything more than minimal user input is a real pain in the ass. So the more stuff that appears on your blog, the worse off you are. And sidebars are the kiss of death.
But even without the constraints of the mobile devices, blog clutter and bad design are serious problems. Let’s take an example:
Now this blog has excellent content. Unfortunately, the blog’s design has several problems, all of which compound the others to make it very difficult to deal with.
First off, it has a color scheme with poor contrast. It uses a dark blue background, light blue links, and black borders. The effect of the color choices leads people to look not at the content, but at the borders! It takes an amazing amount of will to actually focus on the content, and to focus on links takes even more concentration. So the choice of colors does not naturally lead a reader to where the blogger presumably wants the reader.
Second, it uses a three-column layout. A three-column layout can be done well, but it rarely is. Instead, people usually use three-column layouts so that they can get many more links to many more places onto every page. That’s what this blog does.
What the hell is this crap? — Butt-head
What’s so wrong with lots of links to lots of places? Too much clutter. This blog contains no fewer than six blogrolls with literally hundreds of links to other blogs in its two sidebars, and in the format and colors used, they are all but indistinguishable. Who is really going to wade through all of those links in all of those blogrolls? It’s certainly important to promote one’s blog, and to help promote others, but at a certain point it becomes excessive, and nobody pays attention to it.
Or they do what I did the first several times I saw this blog, and others with similar problems: they leave without reading anything.
And then there are the ads. In the right-hand sidebar, one can see ads from Amazon and Google, but the ads are very poorly integrated into the site. So they are almost certainly getting much less attention than they otherwise would. This has a direct negative impact on the income this blogger makes from his blog.
Oh, and I have one more bone to pick, and that’s with those chicklets. You know, the little buttons inviting you to subscribe to every feed aggregator service you’ve ever heard of, and a few dozen you’ve never heard of. It’s been my experience that almost nobody ever clicks on them. As you can see, this person doesn’t seem to have had much luck getting people to subscribe to his RSS feed, despite being very well linked to. (You don’t get to be a Large Mammal in TTLB unless you’re fairly decent sized.) (And they could also be subscribed to his Atom feed, and not showing in that count, a side effect of using Blogger.) But the buttons, when all thrown together, are just plain ugly. I’ve theorized that one would get better results with just one or two buttons, and that seems to be playing out fairly well for me. Even if it doesn’t, my site looks a lot better for not having the chicklets.
After studying real users in the real world, I’ve found that they have a much better time with simpler, cleaner looking sites. So I’ve tried to keep the clutter and extra features to a minimum. Of course, with a blog, you have extensive navigational controls which are going to take up quite a bit of space. But all the rest can go, as I discovered. Or almost all.
Now pick up your PDA or smartphone and use its built-in Web browser to visit http://www.ioerror.us/ . Hopefully you do this after viewing it in your Web browser. If all goes well, you’ll see a radically different site; it’s been stripped of almost everything, is about five times smaller, downloads much faster, and dare I say it, I think I like it stripped down.
Perfection is achieved, not when there is nothing more to add, but when there is nothing more to take away. — Antoine de Saint-Exupéry
What’s cluttering up your blog theme? Is it easy to read? What can you get rid of to improve your blog’s appearance and usability?
I had originally intended to have a second alpha release of Bad Behavior 2, the next generation of the Web’s only non-content-based link spam killer, ready by now. Actually by last week. So I wanted to give you all an update on why it’s delayed and when you can expect to see some code.
As I posted back in February, I wanted to have the next alpha release out by mid-March. That didn’t happen, and it’s starting to look like early April before I’ll have something out. The reasons for this are as follows:
First off, you all should understand that I don’t work a regular 9-to-5 job like most people. In fact, I haven’t since last summer. I live solely on the income that I make blogging and from performing WordPress and other programming work for various clients. And while Bad Behavior has many generous donors, one of whom helped me obtain a computer when I needed it most, it isn’t enough to live on. Because of this, the work which generates the income that I live on must always come first. Unless Bad Behavior becomes a lot more popular than it already is, it will likely always take a back seat to the other work I must do in order to pay the rent and buy the groceries.
This means blogging and slinging code for anyone willing to pay for it. Almost. I did tell a splogger to go to hell the other day, and probably lost a couple hundred bucks. But some things just aren’t worth it. I’m trying to eliminate these guys, not help them.
Anyway, enough of that. For the past few weeks, I’ve had several clients engage me for various things, and actually been able to pick up a halfway decent desktop computer as well. And I’ll be working for at least the next week on a couple of other projects. And then there’s whoever else comes along.
Once I’ve gotten all this paid work off my plate, and have enough money to live on for a couple of months, then I’ll return to Bad Behavior with a vengeance. I’ve seen the spammers who have managed to evade Bad Behavior. They’ve hit me as well. And they’ve hit hard. For the first time I can remember, Bad Behavior is less than 80% effective, and that just won’t stand. I’ll be back on the case shortly, just as soon as I’m reasonably sure that I can stop taking paid clients for a short while and still have enough money to live on.
If you have suggestions for Bad Behavior 2, please leave a comment.
(By the way, if Bad Behavior 1 has blocked you, your friends, or a robot you want to crawl your site, read this.)
Updated with new information and — by request — a tentative timeline.
As many of you are no doubt aware, there’s a new class of automated spambots out there which Bad Behavior and other spam tools don’t yet handle. Spammers have indeed adapted their techniques to get past tools such as Bad Behavior, Spam Karma, and Akismet, and are actually succeeding. I first caught wind of this new generation a few months ago, and began working on Bad Behavior 2, my attempt to deal with the new generation of malicious spambots.
I had hoped to have a final release long before now, but various problems cropped up and prevented me from completing the project. To date I’ve only been able to release a very early alpha so that other software authors who write code that depends on Bad Behavior can begin to update their tools. The alpha, while it’s functional, not only requires WordPress 2.0, but also provides no more protection than the current release code. In fact, it provides a bit less because one of the checks in Bad Behavior 1 is not present (right now).
As I’ve heard from every such author and they have either updated their code or have plans to do so, it’s time to move this forward.
A representative from a major open source project informed me that the project would be willing to contribute financially to Bad Behavior, but wanted to ensure that it would get something in return, and have a better idea of the timeframe of development. Thus I’m updating the previously posted roadmap.
I have the basic structure of Bad Behavior laid out to allow it to drop in much more easily into packages such as DotClear and Geeklog, where the plugin architecture is quite different from everything else. This will also allow Bad Behavior to be ported to even more software packages. It consists of two components: a core consisting of the test suite itself, and a glue component for each host platform. I’m also planning an administrative interface that will hook into each host platform, though I am not sure if this will be ready for all platforms at the time of release. Finally, you’ll be able to configure Bad Behavior and view its activity within WordPress, MediaWiki, or whatever platform, using the native administrative interface provided by the host platform. This is the largest design change in version 2. (Estimated timeframe: 8 development hours per platform.)
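For the curious, the core/glue split looks roughly like the sketch below. It’s written in Python purely for illustration (Bad Behavior itself is PHP), and every name in it is hypothetical: `Request`, `core_screen` and `ExamplePlatformGlue` are stand-ins, and the one rule in the core is a placeholder, not one of the actual tests.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Platform-neutral view of an incoming request (hypothetical fields)."""
    ip: str
    user_agent: str


def core_screen(request):
    """Platform-independent core: return a denial reason, or None to allow.

    The single rule here is a placeholder, not one of Bad Behavior's tests.
    """
    if not request.user_agent.strip():
        return "missing user agent"
    return None


class ExamplePlatformGlue:
    """Hypothetical glue layer for one host platform.

    It translates the host's request data into the core's Request type
    and keeps track of what the core decided.
    """

    def __init__(self):
        self.denials = []

    def handle(self, server_vars):
        request = Request(
            ip=server_vars.get("REMOTE_ADDR", ""),
            user_agent=server_vars.get("HTTP_USER_AGENT", ""),
        )
        reason = core_screen(request)
        if reason is not None:
            self.denials.append((request.ip, reason))
            return False  # the host platform should refuse the request
        return True


glue = ExamplePlatformGlue()
allowed = glue.handle({"REMOTE_ADDR": "192.0.2.1", "HTTP_USER_AGENT": "Mozilla/5.0"})
blocked = glue.handle({"REMOTE_ADDR": "192.0.2.9"})
print(allowed, blocked, glue.denials)
```

The point of the split is that porting to another platform should only mean writing another small glue class, while the core stays untouched.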
Bad Behavior’s API needs improvement. It started as a simple generic interface, and has already outgrown that interface. Version 2 will feature a completely redesigned API for integration into the host PHP program, offering more flexibility, and hopefully the ability for the host program to provide services to Bad Behavior, such as statistics and log viewing.
Bad Behavior needs to deal with the database more intelligently. In version 1, I kept a log of requests which had been denied, expanded it to optionally include all requests, and expanded it again to include the reasons for denial. Then I started using the information in the log to make decisions. Version 2 will feature a complete redesign of the database table, and expansion into two tables, one strictly for logging (for you to stare at), one strictly for making decisions. I expect to gain significant performance improvements thereby, as well as being able to make more intelligent decisions on which requests should be allowed and which should not be. (Estimated timeframe: Complete.)
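Here is a minimal sketch of the two-table idea, using Python’s built-in sqlite3 module so it runs anywhere. The table names, columns and functions are all hypothetical and are not the actual version 2 schema. The only point is that decisions are read from a small table, while the big log is only ever appended to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table names and columns, for illustration only.
    CREATE TABLE request_log (
        id INTEGER PRIMARY KEY,
        ip TEXT NOT NULL,
        user_agent TEXT,
        denied INTEGER NOT NULL,
        reason TEXT
    );
    CREATE TABLE decisions (
        ip TEXT PRIMARY KEY,
        denied_count INTEGER NOT NULL DEFAULT 0
    );
""")


def record_request(ip, user_agent, denied, reason=None):
    """Append to the log; touch the small decision table only on denials."""
    conn.execute(
        "INSERT INTO request_log (ip, user_agent, denied, reason) "
        "VALUES (?, ?, ?, ?)",
        (ip, user_agent, int(denied), reason),
    )
    if denied:
        conn.execute(
            "INSERT OR IGNORE INTO decisions (ip, denied_count) VALUES (?, 0)",
            (ip,),
        )
        conn.execute(
            "UPDATE decisions SET denied_count = denied_count + 1 WHERE ip = ?",
            (ip,),
        )


def previously_denied(ip):
    """Decision lookup that never reads the (potentially large) log table."""
    row = conn.execute(
        "SELECT denied_count FROM decisions WHERE ip = ?", (ip,)
    ).fetchone()
    return row is not None and row[0] > 0


record_request("192.0.2.1", "ExampleBot/1.0", denied=True, reason="example reason")
record_request("192.0.2.2", "Mozilla/5.0", denied=False)
print(previously_denied("192.0.2.1"), previously_denied("192.0.2.2"))
```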
Most legitimate users unfortunate enough to see the Bad Behavior error page have no idea what to do, even though the page does provide suggestions. It needs to be shortened and clarified, and to contain links to expanded information sources so that users can solve the problem on their own whenever possible. It should also customize the message based on the specific reasons for denial. Ideally, though, Bad Behavior should never present the page to a legitimate user, only to spammers. The architecture is in place for Bad Behavior to show more informative error messages, each one including a unique key which either the user or the blog admin can look up to determine what went wrong and how to fix it. While all of the keys have been set, the documentation for each remains to be written. Bad Behavior will now serve errors such as 400 and 403, depending on the request, rather than 412. (Estimated timeframe: 12 development hours.)
Bad Behavior needs to provide better tools for site administrators to search for and eliminate any false positives that may arise. While version 1 contains whitelisting capability, it’s not easy for a site owner to determine why a particular request was blocked, due to being unable to find it in the logs. As I mentioned above, Version 2 will provide a unique key to each denied request which the site owner can use to immediately find the problem, if any, and take any necessary corrective action.
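The unique-key idea can be sketched in a few lines. This is only an illustration under my own assumptions: the key format (eight hex characters), the in-memory dictionary standing in for the log table, and the function names are all hypothetical.

```python
import secrets

# Support key -> details of the denied request; stands in for the log table.
denied_requests = {}


def deny(ip, reason):
    """Record a denied request and return the key shown on the error page."""
    key = secrets.token_hex(4)  # hypothetical key format: 8 hex characters
    while key in denied_requests:  # regenerate on the rare collision
        key = secrets.token_hex(4)
    denied_requests[key] = {"ip": ip, "reason": reason}
    return key


def look_up(key):
    """What a site owner does with a key sent in by a blocked visitor."""
    return denied_requests.get(key)


support_key = deny("192.0.2.7", "example reason")
print(support_key, look_up(support_key))
```

With a key in hand, the site owner goes straight to the one record that matters instead of trawling the whole log.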
In addition, in certain circumstances when the access is suspicious but it can’t be conclusively shown that the access is malicious, Bad Behavior 2 will attempt to clear the access using as-yet unreleased methods, only providing an error message if the access can’t be confirmed as a human being through multiple techniques. (I would say more on these, but I don’t want to give the spammers a head start on me!) (Estimated timeframe: 6 development hours.)
And I’m experimenting with automated methods of detecting spam attack runs which may originate from dozens of different IP addresses and have somewhat different signatures. I may call for some assistance with this in the near future, and this isn’t likely to make it into 2.0, but it is in the works. (Estimated timeframe: 75-100 development hours.)
Finally, Bad Behavior must continue to keep up with spammers as they attempt to adapt and find new ways to post their automated garbage. To date, this has been at most a minor issue, as there is only so much the spammers can do, while maintaining their high rates of spamming (10,000 or more posts in a single run is not unusual). Bad Behavior attempts to drive up the cost of link spamming, by blocking as many of those spammy requests as possible, forcing the spammers to resort to MUCH slower manual methods, or ideally, give up and find more honest work.
This is my vision for Bad Behavior 2.
Bad Behavior is open source software, released under the GNU General Public License, which you can find copies of all over the Internet, or included with the program. You don’t have to pay a cent to download or use it. However, developing it still costs me time and money, which is why it can go so long between minor releases. Unless (until) some cash comes in, it doesn’t get updated except in cases of dire emergency. Which only happens if I ship code with a typo in it, or Microsoft changes their search engine, or something like that.
Complicating the issue is the fact that a month ago, my laptop, which was my main development platform, died a horrible death. So I’ve had to suspend development of Bad Behavior as well as some of my blogging and other work, until I can get a replacement laptop and get it up and running. As of now I’m about $250 short of where I need to be to have a laptop which is inexpensive and yet capable of serving as a development platform. The computer I’ve been borrowing for the last month or so is simply not sufficient to do much with, unfortunately, and all of my paying endeavors have suffered for it.
If you think this roadmap looks good, and want to accelerate the development of Bad Behavior, contribute financially and I’ll be able to devote more time to it, meaning version 2 comes closer to reality sooner.
And for those of you who are concerned about actually getting something back, my commitment to you is for every $25 contributed, whether from one person or multiple people, to complete one hour of development work within the seven days following the contribution, even if I have to put something else on the back burner temporarily. (Since I generally charge roughly double this amount — or more — for most paid development work, you’re getting something of a bargain.) I hope to have a feature-complete beta working on WordPress, MediaWiki and ExpressionEngine within 45 days, and a final 2.0 release within 75 days, and with your help, this may actually come to pass.
And by all means, if you think I left something out that should be in version 2, please let me know. And yes, I know a lot of you are flat broke, so even if you are unable to contribute financially, leave a comment. Say hi, or suggest changes, or something, just so that I know you’re there and you think I should continue this project.