I’ve been doing some thinking. This means you might want to get to the nearest fallout shelter immediately.
Yesterday a fairly well known person who works for a very well known company contacted me about Bad Behavior. What’s not well known is that many of this very well known company’s customer-facing Web sites run WordPress. And, no, their sites don’t at all look like blogs. They must be the most complex, and most non-blog-like, themes ever designed for WordPress.
Anyway, who they are and what they do aren’t relevant. A few of you know, and it’s not a very well kept secret, but it’s irrelevant to the topic at hand, so I won’t be mentioning either the person’s name or the company’s name. (The person in question, who doubtless is reading this, can feel free to disclose it if he or she wishes, however.)
The topic of the day is Bad Behavior on very large sites. This person’s test of Bad Behavior yesterday on this very large site is, to date, the largest installation of any Web software I’ve ever written. So I’m proud of that. But the test was not without its problems, and Bad Behavior probably won’t be running on that site for a while.
The main problem that came up during the test is that one entire office of the company was blocked from access to the sites. Presumably for the company’s internal security or some such reason, I haven’t received any raw data from the test which would help me diagnose the problem, but I have been able to make some educated guesses.
Bad Behavior is known to be intolerant of some brands of Web content filtering software. These particular bits of software, if you’re stuck with one, will read a Web page before you do, and make an immediate decision on whether to allow you access to it. The problem arises because they feed the Web server a false user agent. Because Bad Behavior looks deeper than the user agent, it is easily able to tell that the request isn’t really coming from Internet Explorer, and does what it’s designed to do: it blocks the request.
Needless to say, if you block a Web content filtering program, it’s typically going to get annoyed, and block the user who’s stuck behind the filter. This is my current best guess as to what happened in yesterday’s test.
The problem for me, as the author of this spam killing software, is that I can’t easily tell the difference between a Web content filter which pretends (poorly) to be Internet Explorer, and a spambot, which also pretends to be Internet Explorer. I can only tell that the request isn’t really coming from Internet Explorer, and presume the requesting user agent is up to no good.
The issue is that Web content filters feed Web servers a fake user agent for a different purpose. If the Web server knew the request was coming from filtering software, it could feed the filter clean, innocuous data. The filter, thus fooled, would then allow the user to access the site, even though it may contain pornography, black-hat hacking information, competitors’ job listings, or anything else the company has decided not to allow its employees to access on company time. Thus the content filter presents a fake user agent to the Web server.
Anyone responsible for the programming of any Web content filtering software, or for that matter just about anything with an HTTP client in it, should feel free to contact me, and I will immediately tell you exactly what your software needs to do to pass spambot filtering and properly maintain the fiction of being a real Web browser. And since several well-known link spammers read me and keep up with Bad Behavior, you also should feel free to contact me, and I will immediately tell you to go to hell.
Anyway, I’m back to Bad Behavior in enterprise settings. If you plan to run Bad Behavior in such an environment, the first thing to do is to wait. I test new versions myself, and a few relatively high-traffic sites also test them before release, but we can’t catch everything. Sometimes we miss things, and occasionally a third party will do something that throws a monkey wrench into the works, such as the recent release of Google Desktop. Wait for the minor version to stabilize before deploying it widely. Changes in the third digit of the version number are now reserved for bug and security fixes, so follow them as closely as your IT policies permit.
The second thing to do is to whitelist your entire company’s internal networks. This especially means the RFC1918 addresses which most of you use extensively. They are already in the bad-behavior-whitelist.php file; just uncomment them. I’m assuming, of course, that your internal networks are not a source of spam. If they are, you have more problems than I alone can solve.
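For anyone wondering what whitelisting the RFC1918 ranges amounts to, here is a minimal sketch in Python. It is not Bad Behavior’s actual code (that is PHP, and lives in the whitelist file mentioned above); the names `RFC1918_NETWORKS` and `is_whitelisted` are made up for illustration. It only shows the idea: if the requesting address falls inside one of the three private ranges, skip the spam checks.

```python
import ipaddress

# The three private address ranges defined by RFC 1918.
# (Hypothetical constant name; not taken from Bad Behavior itself.)
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_whitelisted(remote_addr, networks=RFC1918_NETWORKS):
    """Return True if remote_addr falls inside any whitelisted network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parseable IP address, so it cannot match a whitelist entry.
        return False
    return any(addr in net for net in networks)
```

With this sketch, an internal address such as 10.1.2.3 would be let through, while a public address would still go through the usual checks. To whitelist a partner’s public range as well, you would add another network to the list.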
Whitelist any scripts that need to access your site only if Bad Behavior actually blocks them. An example of this would be the W3C Validator. (It passes, however, and does not need to be whitelisted; it’s only an example.)
Also consider whitelisting any public IP addresses used by your company, its partners, its vendors, etc. I say consider, not just do it, because in some circumstances this may not make sense. For instance, if you can’t trust one of your vendors to keep its systems secure, you may not wish to whitelist them.
Finally, if you run into a problem you’re unable to resolve, or if you have any suggestions for improving Bad Behavior, contact me as soon as possible. I’ll do whatever I can to assist you. And if you’re a spammer, bend over; I’ve got something special for you.
Bad Behavior 1.2.1 has been released to address issues people are having with whitelists not working, and with Google Desktop causing users to be blocked from the site. Bad Behavior is the Web’s premier link spam killer, protecting blogs, wikis, forums and CMS systems all over the Internet.
Obviously we’re not all perfect. A ridiculous omission caused Bad Behavior’s new whitelisting feature to simply not work. In order for whitelisting to work, install Bad Behavior 1.2.1. In addition, Google’s new Google Desktop is sending invalid HTTP headers to every site its users visit, causing Bad Behavior 1.2 sites to blacklist them.
Access will be automatically restored to affected users within 48 hours of installing the update, or you can empty the bad_behavior_log table after installing the update to restore access to affected users immediately.
Both of these issues have been fixed, so download Bad Behavior now!
Oh, and the next time I release software for testing, TEST it!
If you’re trying to use IP-based whitelisting in Bad Behavior, and finding that it fails to allow users through from that IP address or range, please contact me immediately, and send a copy of the whitelist entry and the Bad Behavior logs from the database showing the users from that IP address or range being denied after you added the entry. Then delete from your database any records containing that IP address, and contact me again if the trouble recurs.
Do the same if whitelisting by user agent fails, but remember that the user agent must match exactly for it to be whitelisted.
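To show what “match exactly” means in practice, here is a small Python sketch. Again, this is an illustration and not Bad Behavior’s own PHP; the whitelist entry and the function name are hypothetical. The point is that the comparison is on the whole string, so a partial, shortened, or differently capitalized user agent will not match.

```python
# Hypothetical whitelist entry, for illustration only.
WHITELISTED_USER_AGENTS = {
    "ExampleMonitor/1.0 (+http://example.com/monitor)",
}


def user_agent_is_whitelisted(user_agent, whitelist=WHITELISTED_USER_AGENTS):
    """Return True only if the full user agent string is in the whitelist."""
    # Plain set membership: no substring, prefix, or case-insensitive matching.
    return user_agent in whitelist
```

So if your whitelist entry seems to be ignored, compare it character by character against the user agent recorded in the logs before reporting a bug.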
Due to a bug in Google Desktop, Bad Behavior is blocking it when it tries to download RSS feeds for its users. I’ve sent a message to Google (though I don’t really expect much to happen) and I’ll see if I can have a workaround in place shortly.
Affected users will see “Web Clip Error: Unknown error” in the Google Desktop.
FeedBurner users who use the FeedBurner .htaccess redirects are not affected by this issue. (And since I’m one of them, I never noticed.)
I have a ticket [#32426362] from Google for this issue. If you are seeing this, you can contact email@example.com and place the ticket number, with the brackets, in the subject line, and let them know you are adversely affected by this issue. Also run the program located at http://desktop.google.com/DiagnoseGoogleDesktop.exe and include the diagnostic output that it gives in your message.
Okay, so I’ve had a chance to play with WordPress 1.6-ALPHA-2-still-dont-use out of SVN, and I’ve had a chance to play with WordPress.com. I think I have a half-baked idea of what’s going on, and I’m going to share it with you. Assuming anyone’s reading this, of course.
First of all, this new version of WordPress is bound to make blogging very nearly idiot-proof. Even an MSN Spaces user should be able to muddle their way through the streamlined, simplified administrative interface. It might still be too tough for AOL users and people trying to find a Wal-Mart job, though.
I suspect your average WordPress.com user is going to get their new blog, click Write, and start blogging, without spending much time — or any time — going through the numerous options. And that’s fine. You can add categories on the fly without even stopping to click Mangle. And with the new editor, you can even write posts without knowing a single bit of XHTML.
That covers about 95% of blogging for most people.
But the other 5% turns out to be a real sticking point.
At the moment, WordPress.com offers only a limited selection of themes to choose from, and the themes are not customizable. This gave me a real problem at first, as most of the themes have bugs or omit critical functionality. After testing out the available themes for the better part of an hour, I finally settled on this one, which doesn’t at all make me happy, or even look the way I’d like, but does have all the functionality working properly. As far as I can tell. For now. Even the prize-winning Connections theme omits the comments template on pages. In contrast, my WordPress 1.6 site lets me install any theme I want, customize the theme, and do whatever I need to do in order to have my blog look, feel and act exactly as I want it to.
Nor does WordPress.com allow the installation of plugins. In WordPress 1.6, I can install plugins to extend the functionality of WordPress itself, add new features, change the way things work, and do a wide variety of other things. Indeed, my most well-known site has some 20 plugins installed. I think I’ve forgotten why.
Beyond themes and plugins, most of the core functionality of WordPress 1.6 is present in WordPress.com. A few things aren’t here right now. For instance, WordPress.com doesn’t let you set your local time zone offset, or change your permalink structure. The time zone thing is bothersome, but most people aren’t going to complain too loudly; times are (currently) displayed in UTC. And most people probably wouldn’t know that you could change the permalink structure unless you pointed it out to them.
There is one very good reason for WordPress.com to not permit users to install their own themes and plugins. That is security. Both themes and plugins can contain actual PHP code. This means that, in theory, a WordPress.com blogger could upload a theme or a plugin which lets him obtain unauthorized access to others’ blogs. Or worse.
I don’t think the security problem is insurmountable, though. After all, Web hosts let people run unknown/untrusted code all the time. For instance, my Web host uses the UNIX user security structure. By having the web server run my code under my user ID, rather than the server’s, my code can only access things that I legitimately have access to. Other users’ files are off-limits (assuming the other users haven’t explicitly granted the world access to them).
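To make the “granted the world access” part concrete, here is a tiny Python sketch that checks whether a file’s permission bits let other users read it. The function name is my own invention, and this only inspects the mode bits; it is the operating system, not a script like this, that actually enforces them when code runs under a given user ID.

```python
import os
import stat


def world_can_read(path):
    """Return True if the 'other' read bit is set on path."""
    mode = os.stat(path).st_mode
    # S_IROTH is the read permission bit for users who are neither
    # the file's owner nor members of its group.
    return bool(mode & stat.S_IROTH)
```

A file with mode 600 is readable only by its owner, so another user’s code running under that user’s own ID would be refused; a file with mode 644 has been opened up to everyone.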
By incorporating a similar security structure into WordPress.com, it should be possible to allow users to run their own themes and plugins. And that will be the first white car in a world of black ones.
What should I do with this shiny new WordPress blog? Actually I think I have a pretty good idea what to do with it: finally get all that talk about WordPress off my other blog.