Anatomy of a blog hack

So, last weekend I found out that my blogs had been hacked.

Actually, it wasn’t just my blogs, and there was nothing personal about it: the shared server space my sites were hosted on was compromised, and a good number of other sites and files were hacked as well. Based on what I can piece together, here’s what happened:

There were a number of sites on this hosting space that were running out-of-date versions of WordPress, and some that also had various other PHP code installed (NetOffice, Gallery 2, a few others). Any outdated software is potentially at risk from known exploits, but more worryingly, I found an old bit of PHP code on the server that was set up to run arbitrary PHP code for (I presume) some back-end admin processing, and ultimately I think this is what was exploited.

And until I had found and killed this code, the exploit happened at least 3 times even as I was cleaning up the server.

The exploit itself, once I knew what to look for, was fairly simple:

  • In PHP files that were writable by the Apache webserver process, the code was altered so that any line containing an opening PHP tag (which tells the server to start executing the code after it as dynamic PHP until the closing tag is reached) looked something like this:
    From: <?php .....
    To:   <?php     eval(base64_decode('malicious code encoded here')); .........
  • When I copied this code to a sandboxed PHP environment and decoded it, it contained fairly simple instructions:
    • If the visitor to the site was coming from a Referrer—in other words, if they had clicked on a link from another site like Google search results, Facebook, someone else’s blog—they were redirected instead to a completely different site that presumably contained spam, or malware, or whatever.
    • If the visitor was coming to the site directly—they had typed the URL directly into the browser’s Location bar, or clicked on a bookmark—then they were passed on through to the site.
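
The branching described in the two bullets above can be sketched in a few lines of Python. This is only an illustration of the behavior I observed, not the decoded payload itself: the function name, the return values, and the placeholder URL are all made up for the example, and I am assuming the check was simply "is there any referrer at all".

```python
from typing import Optional

# Hypothetical placeholder; the real destination is deliberately not reproduced.
REDIRECT_TARGET = "http://spam.example/"


def handle_visit(referrer: Optional[str]) -> str:
    """Return where a visitor ends up, given the referrer their browser sent."""
    # A visitor who clicked a link on another site (search results,
    # Facebook, someone else's blog) arrives with a non-empty referrer
    # and gets bounced to the other site.
    if referrer and referrer.strip():
        return REDIRECT_TARGET
    # A typed URL or a bookmark sends no referrer, so the visitor is
    # passed through to the real site and the hack stays invisible.
    return "pass-through"
```

Calling `handle_visit(None)` models a direct visit, while `handle_visit("http://www.google.com/search?q=beer")` models a click from search results.
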
Because I normally type in URLs to my blogs directly, or click the “recently visited” link in Chrome’s list, I didn’t see the exploit at first. But as I was writing a blog post on The Brew Site on Friday the 20th, I was searching out a link to a previous blog post (gotta love Google for that) and when I clicked that link to pull up the earlier post, I was redirected to some site in Poland (or at least, with a Polish country code for the top-level domain).

Fortunately, I don’t believe this hack was in place for long, since I often search out links in this manner and would have noticed sooner: the earliest file modification I can find dates to the wee hours of the morning of January 19th.

It took me a bit of time to figure out the exploit (at first I was thinking it was the Google 302 hijacking exploit), but once I did I was cleaning up files on my blogs by Saturday morning. I hadn’t yet had the chance to address the (many) other files and old sites on the server hosting space, so unfortunately my blogs got re-infected at least once more before I was able to kill the old files and update others. Most of my weekend (and part of the following week) was spent updating, fixing permissions, cleaning, and deleting files and sites.

For reference, a handy pattern for detecting this code in grep is:

grep -R -l 'eval(base64_decode(' *

(This should reliably catch this particular injection, because you should never have similar PHP running in your legitimate code…)
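
If grep isn’t handy, the same check can be sketched in Python. This is a rough equivalent under my own assumptions, not a replacement for a proper malware scan: the function name and the optional `extensions` filter are invented for the example, and it only looks for the one literal signature shown in the grep pattern above.

```python
import os
from typing import List, Optional, Tuple

# The literal text the grep pattern above searches for.
SIGNATURE = b"eval(base64_decode("


def find_infected(root: str, extensions: Optional[Tuple[str, ...]] = None) -> List[str]:
    """Return sorted paths under root whose contents include SIGNATURE.

    With extensions=None every file is checked, like grep -R; pass a
    tuple such as (".php",) to narrow the search.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if extensions is not None and not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    if SIGNATURE in handle.read():
                        hits.append(path)
            except OSError:
                # Unreadable file: skip it rather than abort the scan.
                continue
    return sorted(hits)
```

Something like `for path in find_infected("."): print(path)` then lists the files to clean up.
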

Now, I keep my WordPress blog software (and installed plugins) up-to-date pretty religiously, and I try to keep permissions set appropriately. But a good number of files in each blog were infected even so. How? It turns out that a fair number of the core files that were originally installed (manually) had the correct Unix owner and group (“<account>:users”) and permissions of 644 (rw-r--r--), and these were untouched. But I was also making liberal use of WordPress’s built-in auto-updating feature, along with automatic plugin installation, and at some point the files that WordPress was updating got set to “nobody:users”, with “nobody” being the user the Apache webserver process runs as. It was these files, writable by the “nobody” Apache process, that were being exploited by the other code on the server. (Along with the few files I had set to group-writeable as well.)
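
A quick audit for the two problems described above (files owned by the web server user, and files left group- or world-writable) might look like the following Python sketch. It assumes a Unix-like system; the function name and the `web_uid` parameter are my own inventions for the example.

```python
import os
import stat


def find_risky_files(root, web_uid=None):
    """Return sorted (path, reasons) pairs for files the web server could alter.

    web_uid is the numeric user id the web server runs as (hypothetical
    parameter; leave it as None to skip the ownership check).
    """
    risky = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue
            if not stat.S_ISREG(info.st_mode):
                continue  # only regular files are of interest here
            reasons = []
            if web_uid is not None and info.st_uid == web_uid:
                reasons.append("owned by web server user")
            if info.st_mode & stat.S_IWGRP:
                reasons.append("group-writable")
            if info.st_mode & stat.S_IWOTH:
                reasons.append("world-writable")
            if reasons:
                risky.append((path, reasons))
    return sorted(risky)
```

On a server like the one described here, the uid could be looked up with `pwd.getpwnam("nobody").pw_uid` and passed in as `web_uid`; a file at 644 owned by the account itself produces no entry.
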

So, lesson learned. I’ve battened down the hatches, fixed the permissions on all the files in my sites, and have decided to forgo WordPress’s auto-installing and update features for now for good measure. And, I’ve finished up a (long overdue) move of my blogs to a new webhost with none of the legacy code possibilities that were extant on the original server. (Nothing against the original web hosting provider, I just needed a clean break with an affordable price.)

Of course, you all let me know if you still run into any problems, okay?

Tools of the trade

It’s been a while since I’ve posted anything overly technical here, but it strikes me that a “snapshot” of what I do (for work) and how I do it (the tech) might be useful to some.

What I do is web development for Smart Solutions here in Bend. Smart Solutions is a web and software development company with essentially three main divisions: custom software development, SEO (search engine optimization; I know, that’s another post), and web development. All these “divisions” work pretty closely with each other (there’s a lot of line-blurring, actually), but for the most part developing websites for clients is what I do.

The platform we develop for is Pixelsilk, the custom Content Management System (CMS) that Smart Solutions developed from the ground up (and is still developing). The marketing pitch is, it’s SEO-optimized, gives you full control of your HTML, gives you all the tools you need to interact with social media, etc. etc. etc. Move past all that and get to the meat of it, and the primary things I really like about Pixelsilk are that you interact with all of your content and data inside of the system (rather than working with offline files that need to be FTP’d to various places), that there’s a powerful and comprehensive Javascript API (giving me the capability to extend the system in new ways), that it gives you the ability to re-use code and libraries, and that it’s entirely web-based, meaning I can work on a site from any browser.

I’m also the company’s de facto WordPress developer (yes, we host WordPress blogs in addition to Pixelsilk sites), and I handle a few other PHP applications, so I still get a chance to flex my PHP muscles every now and again. (Smart Solutions is otherwise a Microsoft and .NET shop.)

Of course, I use a number of additional tools to develop for the web, and that’s what this post is really about.

What I use is a mishmash of online and offline tools. In the “offline” category I make use of:

  • The GIMP, open-source graphics software. Free to download, and fairly powerful, there’s still a lot I’m learning about it, but I do most of the graphics work I need to accomplish with it. (Photoshop is the standard for the company, but I’m not versed in it.)
  • Microsoft Visual Studio, various flavors. Sometimes moving the HTML/Javascript/CSS into an editing tool is easier to deal with, and I frankly like the Visual Studio editing environment best of the various programs I’ve tried for these types of files.
  • PHP Designer. I actually use the (older) free version because, well, it’s free and does what I need, it’s fairly lightweight, and it has the same kind of keyboard mappings and editing environment as Visual Studio.
  • Notepad. Yes, a stripped-down plain text editor. You’d be amazed at how much I have this open.
  • FileZilla. Yes, sometimes you still need an FTP client, and FileZilla is a good free Windows client.
  • PuTTY. A great free SSH client, because I spend a not-insignificant amount of time on a *nix command line.
  • Apache/PHP/MySQL: Installed on my Windows boxen as test environments. Pretty critical especially when developing WordPress themes.

Online:

  • Google Chrome and Mozilla Firefox as my primary browsers. I actually use Chrome as my primary while at work and Firefox while at home; these are both highly standards-compliant web browsers and I know if I can get something to work properly in them, then that is in fact how it should work. Chrome has some great built-in development and inspection tools; in Firefox I employ a number of extensions.
  • Web Developer (Firefox plugin): A variety of pretty essential additions in toolbar format for all aspects of web development.
  • Firebug (Firefox plugin): Probably the #1 plugin I would recommend; it adds code inspection, network information, Javascript debugging and inspection, and all manner of incredibly useful tools—you can’t be a proper web developer without this installed. (Chrome’s built-in tools come pretty close to this.)
  • Page Speed (Firebug add-on): A fantastic add-on to Firebug that analyzes the overall page performance (using Google’s recommended benchmarks/tests) and gives you hints on what you can improve.
  • Header Spy (Firefox plugin): Shows HTTP headers on the status bar, useful for troubleshooting server information.
  • AFOM (Firefox plugin): Incredibly useful plugin for the Windows version of Firefox which fixes the memory leak that Firefox on Windows is prone to.
  • Internet Explorer: Of course, you can’t develop for the web without checking your work in IE, and IE8 has a decent set of developer tools built-in—including the ability to switch between IE7, IE8, and Quirks modes.
  • W3C Validator: Because you want to make sure your site code validates and works properly, right?
  • jQuery: The best Javascript library out there. If I’m doing anything in Javascript these days, 99% of the time it’s using jQuery.

There are of course other tools I use that fall primarily under the heading of “my own sites” and are not necessarily web development per se: Google Analytics and Google AdSense are two examples. That’s probably another post.

This list is likely incomplete—I may have missed an item or two or three, and if I think of any I’ll update it. But this gives an idea of the various tools I’m employing currently and to a large extent what I’d consider the minimum number any good web developer should be using these days.

Friendster goes PHP

An item I saw yesterday but forgot to blog about: Friendster goes PHP. Pretty cool.

Finally on Friday we launched a platform rearchitecture based on loose-coupling, web standards, and a move from JSP (via Tomcat) to PHP. The website doesn’t look much different, but hopefully we can now stop being a byword for unacceptably poky site performance.

I haven’t had much of a chance yet to use Friendster to see if it truly is faster, so I can’t personally comment on that aspect. And predictably, this is going to bring all sorts of people out of the woodwork arguing over the relative merits of Java/JSP (which was old Friendster) versus PHP… just look at the comments on the link above to see it already happening. And while debate and disagreement can be healthy and productive, how about a quick reality check to everyone:

PHP is good. Java is good. Both have their merits and disadvantages. Loudly complaining that [Java|PHP] is the only true way and the other is crap is boring and uninformed.

MT Comment

What with the current brouhaha over Movable Type’s licensing and payment scheme for the version 3 software (what, you want a link? Feh, go Google it), all I can really say is, damn it’s sure handy to have written my own system. :)

I notice that a lot of people are seriously considering migrating to WordPress. That’s cool, it uses PHP and seems pretty solid.

Conspiracies in Web Tracking

Despite my headline, I’m not really going to go all Mulder on you and start ranting about Big Brother and privacy issues and all that. Instead it’s just some thoughts I’ve been entertaining lately on technology and tracking people and habits on the Web. Some people may choose to see the things I’m writing about as conspiratorial, and that’s fine for them; they may not want to read on, though :)

Searching and Minimum Word Length

Mike Boone, in the comments section of yesterday’s entry on searching (“Updated Search”), correctly points out that searching my site for a word that is less than four characters in length (like “php” or “cow”) does not work: no results are returned. Obviously, since I write about PHP on occasion, this is untenable.

The problem is that MySQL’s fulltext indexing, by default, only indexes words greater than three characters long, and I don’t think I have any way to change this, despite my initial reply to Mike’s comment. This site is running on a shared server setup on pair.com, and I have absolutely zero control over the MySQL server configuration. I might post a question to their tech support, but I’m not overly optimistic about the response. So, what to do?

Short term, here’s my solution (though it’s not implemented yet): examine each word in the search string, throwing out stopwords (like “the,” “and,” “so,” etc.), and for any word shorter than four characters, run a LIKE search against the content for it. No, it’s not ideal, but it’s a patch. Comments?
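
To make the idea concrete, here is a rough sketch of that word-splitting step, written in Python for brevity rather than in the PHP the site actually runs on. Everything specific in it is an assumption made for illustration: the stopword list is a tiny sample, the `entries` table and `content` column are hypothetical names, and combining the clauses with AND is just one possible choice.

```python
import re

# Small illustrative sample; a real stopword list would be much longer.
STOPWORDS = {"the", "and", "so", "a", "an", "of", "to", "in"}
# Words shorter than this are not in the fulltext index by default.
MIN_FULLTEXT_LEN = 4


def split_search_terms(query):
    """Split a search string into (fulltext_words, like_words)."""
    fulltext, like = [], []
    for word in re.findall(r"[A-Za-z0-9]+", query.lower()):
        if word in STOPWORDS:
            continue
        bucket = like if len(word) < MIN_FULLTEXT_LEN else fulltext
        if word not in bucket:
            bucket.append(word)
    return fulltext, like


def build_query(query, table="entries", column="content"):
    """Return (sql, params) for a parameterized query, or (None, [])."""
    fulltext, like = split_search_terms(query)
    clauses, params = [], []
    if fulltext:
        clauses.append(f"MATCH({column}) AGAINST (%s)")
        params.append(" ".join(fulltext))
    for word in like:
        # Words are alphanumeric only, so no LIKE wildcards need escaping.
        clauses.append(f"{column} LIKE %s")
        params.append(f"%{word}%")
    if not clauses:
        return None, []
    return f"SELECT * FROM {table} WHERE " + " AND ".join(clauses), params
```

A search for “php mysql” would then use the fulltext index for “mysql” and fall back to LIKE only for “php”, which keeps the slow LIKE scans to the short words that need them.
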

PHP Development Hint

Here’s a general hint for PHP development: A quick and easy way to check for syntax or compile errors without uploading the PHP script to the Web server and testing online through a browser is via the command line. It’s obvious, and I don’t know why I didn’t think of this sooner, but I’ve been doing more and more of it lately.

I develop primarily under Windows (with PHP installed) and upload to a Unix-variant server, and this is what I’ve been doing to run a PHP script on the command line on my Windows system:

php-cli -l filename.php

You could omit the -l option (it’s a syntax check option only) to parse and run the code, if you like. Either way, it’s an easy way to check your code without uploading it and potentially breaking your site.
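
If you want to run the same check from a script (say, over several files before an upload), a small wrapper along these lines would do it. This is a sketch in Python with made-up function names; it assumes the PHP command-line binary is on your PATH under the name you pass in (`php` by default here, which is my assumption, or `php-cli` as in the command above) and that it exits with a non-zero status when the syntax check fails.

```python
import subprocess


def lint_command(path, php_binary="php"):
    """Build the command line for a syntax-only check of one file."""
    return [php_binary, "-l", path]


def lint_file(path, php_binary="php"):
    """Run the syntax check; return (ok, combined_output)."""
    result = subprocess.run(
        lint_command(path, php_binary),
        capture_output=True,
        text=True,
    )
    # Treat a zero exit status as "no syntax errors detected".
    output = (result.stdout + result.stderr).strip()
    return result.returncode == 0, output
```

For example, `ok, message = lint_file("filename.php", php_binary="php-cli")` mirrors the command shown above.
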

Rasmus is the Man

Rasmus Lerdorf, that is, the creator and godfather of PHP. He’s got an article on the Oracle Technology Network titled “Do You PHP?” that’s definitely worth a read. Here’s a sample:

What it all boils down to is that PHP was never meant to win any beauty contests. It wasn’t designed to introduce any new revolutionary programming paradigms. It was designed to solve a single problem: the Web problem. That problem can get quite ugly, and sometimes you need an ugly tool to solve your ugly problem. Although a pretty tool may, in fact, be able to solve the problem as well, chances are that an ugly PHP solution can be implemented much quicker and with many fewer resources. That generally sums up PHP’s stubborn function-over-form approach throughout the years….

Despite what the future may hold for PHP, one thing will remain constant. We will continue to fight the complexity to which so many people seem to be addicted. The most complex solution is rarely the right one. Our single-minded direct approach to solving the Web problem is what has set PHP apart from the start, and while other solutions around us seem to get bigger and more complex, we are striving to simplify and streamline PHP and its approach to solving the Web problem.

The guy just oozes common sense. Here’s another bit about PHP that he wrote on the PHP-DEV mailing list about two years ago, one of my favorites that just sums up beautifully the philosophy of PHP:

The golden rules of PHP are to keep the WTF(*) factor low and the POTFP(**) factor high.

(*) What The Fuck
(**) Piss Off The Fewest People

No two ways about it: he’s one of my heroes.

Advanced PHP Programming

The book Advanced PHP Programming is out, by George Schlossnagle. Looks like it might be pretty interesting; there’s certainly a scarcity of good PHP books that cover advanced topics—most of them are targeted at the beginner and the basics, and don’t have anything to offer me.

(Quick disclaimer: some of the Wrox books actually look like they might be decent, but I haven’t had my hands on a Wrox PHP book since the first couple they published.)

There was a time when I wanted to write a PHP book. It was going to be an advanced book, called “PHP Secrets” and cover all sorts of topics. I never really pursued it, though, largely because of a general disillusionment with the computer book industry: you spend a year or more writing a book on a subject, and by the time it gets published it’s obsolete.

Thinking about it now, though, maybe a better venue for such a thing would be online, like what Mark Pilgrim did with his Dive Into Python book. That might be kind of cool; a live work-in-progress that I could (theoretically) keep up-to-date. Hmmm.

CMS Ranting

Gadgetopia has a good rant on content management that I’m just getting around to posting about. (CMS’s Should Manage Content, Not Display It)

My solution was to write a function library to make raw database calls to get everything out in a nice, big, nested PHP array. I essentially built an API for the CMS to make pulling content easy, but I do all the HTML processing in PHP, abandoning completely the display side of this CMS. I still use it for administration, workflow, etc. (which it excels at), but when PHP is such a fantastic, mature language, why reinvent the wheel?

I really don’t have anything to add to this, other than that this is largely why I favor developing my own PHP software rather than using pre-built systems—I have absolute control over the way the software works and I don’t have to rely on clunky, awkward front-end architecture and programming that I disagree with. Give me the data, and let me decide what to do with it.
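
As a toy illustration of that “give me the data” split, here is a sketch (in Python rather than PHP, purely to keep it short) where one function turns flat rows into a nested structure and a completely separate function decides how to display it. The row layout (`id`, `parent_id`, `title`) and both function names are hypothetical; they are not taken from any particular CMS.

```python
from html import escape


def nest_rows(rows):
    """Data side: turn flat rows into a nested list of dicts."""
    nodes = {row["id"]: {"title": row["title"], "children": []} for row in rows}
    roots = []
    for row in rows:
        node = nodes[row["id"]]
        parent = nodes.get(row["parent_id"])
        if parent is None:
            roots.append(node)  # no known parent: top-level item
        else:
            parent["children"].append(node)
    return roots


def render_list(nodes):
    """Display side: render the nested data as an HTML list."""
    if not nodes:
        return ""
    items = "".join(
        f"<li>{escape(node['title'])}{render_list(node['children'])}</li>"
        for node in nodes
    )
    return f"<ul>{items}</ul>"
```

Because `render_list` only ever sees plain data, it can be swapped for a different template, a feed, or anything else without touching the code that fetches the content.
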