Anatomy of a blog hack

So, last weekend I found out that my blogs had been hacked.

Actually, it wasn’t just my blogs, and there was nothing personal involved: the shared server space my sites were hosted on was compromised, and a good number of other sites and files were hacked as well. Based on what I can piece together, here’s what happened:

There were a number of sites on this hosting space that were running out-of-date versions of WordPress, and some that also had various other PHP code installed (NetOffice, Gallery 2, a few others). Any software that is outdated is potentially at risk from known exploits, but more worryingly, I found an old bit of PHP code on the server that was set up to run arbitrary PHP code for (I presume) some back-end admin processing, and ultimately I think this was what had been exploited.

And until I had found and killed this code, the exploit happened at least 3 times even as I was cleaning up the server.

The exploit itself, once I knew what to look for, was fairly simple:

  • In PHP files that were writable by the Apache webserver process, the code was altered so that any line containing an opening PHP tag (which tells the server to start executing the code after it as dynamic PHP until the closing tag is reached) looked something like this:
    From: <?php .....
    To: <?php     eval(base64_decode('malicious code encoded here')); .........
  • When I copied this code to a sandboxed PHP environment and decoded it, it contained fairly simple instructions:
    • If the visitor to the site was coming from a Referrer—in other words, if they had clicked on a link from another site like Google search results, Facebook, someone else’s blog—they were redirected instead to a completely different site that presumably contained spam, or malware, or whatever.
    • If the visitor was coming to the site directly—they had typed the URL directly into the browser’s Location bar, or clicked on a bookmark—then they were passed on through to the site.
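
For anyone who wants to look inside a payload like this without a PHP sandbox, here is a minimal sketch in Python (not the method used above) that pulls the base64 string out of an injected line and decodes it to text without ever executing it. The regular expression and the name extract_payloads are illustrative assumptions; a real injection may be obfuscated differently and slip past this pattern.

```python
import base64
import re
from typing import List

# Assumed shape of the injection: eval(base64_decode('...')) with either quote style.
PAYLOAD_RE = re.compile(
    r"""eval\s*\(\s*base64_decode\s*\(\s*(['"])([A-Za-z0-9+/=\s]+)\1\s*\)\s*\)"""
)


def extract_payloads(php_source: str) -> List[str]:
    """Return the decoded text of each injected payload. Nothing is executed."""
    decoded = []
    for match in PAYLOAD_RE.finditer(php_source):
        raw = base64.b64decode(match.group(2))
        # Replace undecodable bytes so odd payloads can still be read.
        decoded.append(raw.decode("utf-8", errors="replace"))
    return decoded
```

The point of decoding to a string rather than evaluating is that the result can be read in a text editor with no risk of running it.
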
Because I normally type in URLs to my blogs directly, or click the “recently visited” link in Chrome’s list, I didn’t see the exploit at first. But as I was writing a blog post on The Brew Site on Friday the 20th, I was searching out a link to a previous blog post (gotta love Google for that) and when I clicked that link to pull up the earlier post, I was redirected to some site in Poland (or at least, with a Polish country code for the top-level domain).

Fortunately, I don’t believe this hack was in place for long, since I often search out links in this manner and would have noticed sooner: as near as I can determine, the files were first modified sometime in the wee hours of the morning of January 19th.

It took me a bit of time to figure out the exploit (at first I was thinking it was the Google 302 hijacking exploit), but once I did I was cleaning up files on my blogs by Saturday morning. I hadn’t yet had the chance to address the (many) other files and old sites on the server hosting space, so unfortunately my blogs got re-infected at least once more before I was able to kill the old files and update others. Most of my weekend (and part of the following week) was spent updating, fixing permissions, cleaning, and deleting files and sites.

For reference, a handy pattern for detecting this code in grep is:

grep -R -l 'eval(base64_decode(' *

(This should always work because you should never have similar PHP running in your legitimate code…)
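
If grep isn’t handy (say, on a Windows test box), the same check can be sketched in a few lines of Python. This is only an illustration of the idea: the function name find_infected is made up, and it reads each file fully into memory, which is fine for PHP source but not for huge files.

```python
import os
from typing import Iterator

# Same literal string the grep pattern looks for.
SIGNATURE = b"eval(base64_decode("


def find_infected(root: str) -> Iterator[str]:
    """Yield paths under root whose contents include the signature."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    if SIGNATURE in handle.read():
                        yield path
            except OSError:
                # Unreadable files are skipped rather than aborting the scan.
                continue
```

Calling list(find_infected(".")) from the top of a site directory gives the same kind of file list as grep -l. Unlike the shell glob, os.walk also descends into dot-directories.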

Now, I keep my WordPress blog software (and installed plugins) up-to-date pretty religiously, and I try to keep permissions set appropriately. But a good number of files in each blog were infected even so. How? It turns out that a fair number of the core files that were originally installed (manually) had the correct Unix owner and group (“<account>:users”) and permissions of 644 (rw-r--r--), and those were untouched. However, I was also making liberal use of WordPress’s built-in auto-updating feature, along with automatic plugin installation, and at some point the files that WordPress was updating got set to “nobody:users”, meaning they were owned by “nobody”, the user the Apache webserver process runs as. Since the other code on the server was being exploited through that same “nobody” Apache process, these were exactly the files it could write to. (Along with the few files I had set to group-writeable as well.)
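
A rough way to audit for the second half of that problem (files that are writable by more than their owner) is sketched below in Python. It only looks at the permission bits; checking for files owned by the webserver user would mean comparing st_uid against that user’s ID, which is left out here. The function name find_loose_permissions is hypothetical.

```python
import os
import stat
from typing import Iterator, Tuple


def find_loose_permissions(root: str) -> Iterator[Tuple[str, str]]:
    """Yield (path, mode string) for regular files writable by group or others."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if not stat.S_ISREG(mode):
                continue  # skip symlinks, sockets, and other non-regular files
            if mode & (stat.S_IWGRP | stat.S_IWOTH):
                yield path, stat.filemode(mode)
```

A file at 644 is skipped; one at 664 is reported with its mode shown in the familiar ls style.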

So, lesson learned. I’ve battened down the hatches, fixed the permissions on all the files in my sites, and have decided to forgo WordPress’s auto-installing and update features for now for good measure. And, I’ve finished up a (long overdue) move of my blogs to a new webhost with none of the legacy code possibilities that were extant on the original server. (Nothing against the original web hosting provider, I just needed a clean break with an affordable price.)

Of course, you all let me know if you still run into any problems, okay?

Twitter cleaning

I figure I need to clean up my @chuggnutt Twitter account (and probably the @hackbend and @brewsite ones as well).

Not that I have an extraordinary number of followers, or people I’m following—522 and 425, respectively—but I realized there’s a fair amount of “noise” on what amounts to my personal Twitter account and there are accounts I’m also following on either @hackbend or @brewsite, and I don’t really need to see redundant tweets.

So I’ll be going through my personal Twitter account and weeding out accounts I’m following, and figure if anyone’s using something like who.unfollowed.me and gets offended that I unfollow them, I can at least point to my criteria:

  • If the account hasn’t had an update in 2 months or more, unfollowed.
  • If I’m also already following that account on @hackbend or @brewsite, I’ll unfollow on @chuggnutt.
  • Unless it’s someone I know personally, or have interacted with on @chuggnutt more often, then I’ll keep the (redundant) follow.
  • Of course there are accounts I just find interesting even if I never interact with them, so I’ll keep following those.
  • If the account seems spammy, or keeps posting repetitive tweets, unfollowed.
  • If the account is something like a brewery that I’m not already following on @brewsite—or a Bend business or similar I’m not already following on @hackbend—I’ll follow on those respective accounts and unfollow on @chuggnutt.

I’m not too worried about the followers of my account; it’s been a while since I’ve had to do a bot/porn sweep and block accounts, and I haven’t really seen any I’d consider blockable come through lately.

…I should probably go through and clean up my Facebook sometime, too.

Pandora

The last several weeks I’ve been checking out Pandora, the “Internet Radio” site that lets you build custom stations of music based on your personal preferences (and provides a live stream of said music). You can give it artists or genres to choose from, and from there—and based on what you tell it you like and dislike in real time, as the music plays—it figures out other music to play for you.

So far it’s remarkably good. It’s like magic.

(Yes, I am well aware that by writing about Pandora now, in 2011, I’ve missed out on something like four or five years of its existence. One might say I missed the boat, and am now late to the show. I’m all right with that.)

Now, I’m not a big music guy—most of the time I listen to whatever’s on the radio in the car while driving to or from work, and I’ll play the occasional CD (I do own a few). I like music, it’s just more of a background to my life, and I don’t invest a lot of time into it. But with Pandora, it tweaks just the right buttons—I’m as interested in the algorithm behind what it will pick for me next as in the music itself. So I’ve been letting it play in the background at work and generally marveling at it.

I’ve only created one station thus far, but since it lets you create different stations I’m fascinated by the potential for creating other, vastly different ones based on mood (for instance).

It’s kind of cliché to say, but this is one of those internet technologies that just works, works well, and makes me feel like I’m living in the future.

Tools of the trade

It’s been a while since I’ve posted anything overly technical here, but it strikes me that a “snapshot” of what I do (for work) and how I do it (the tech) might be useful to some.

What I do is web development for Smart Solutions here in Bend. Smart Solutions is a web and software development company and the company essentially has three main divisions: custom software development, SEO (search engine optimization—I know, that’s another post), and web development. All these “divisions” work pretty closely with each other—there’s a lot of line-blurring, actually—but for the most part developing websites for clients is what I do.

The platform we develop for is Pixelsilk, the custom Content Management System (CMS) that Smart Solutions developed from the ground up (and is still developing). The marketing pitch is that it’s SEO-optimized, gives you full control of your HTML, gives you all the tools you need to interact with social media, etc. etc. etc. Move past all that and get to the meat of it, and the primary things I really like about Pixelsilk are that you interact with all of your content and data inside of the system (rather than working with offline files that need to be FTP’d to various places), that there’s a powerful and comprehensive Javascript API (giving me the capability to extend the system in new ways), that it gives you the ability to re-use code and libraries, and that it’s entirely web-based, meaning I can work on a site from any browser.

I’m also the company’s de facto WordPress developer (yes, we host WordPress blogs in addition to Pixelsilk sites), and I look after a few other PHP applications, so I still get a chance to flex my PHP muscles every now and again. (Smart Solutions is otherwise a Microsoft and .NET shop.)

Of course, I use a number of additional tools to develop for the web, and that’s what this post is really about.

What I use is a mishmash of online and offline tools. In the “offline” category I make use of:

  • The GIMP, open-source graphics software. It’s free to download and fairly powerful; there’s still a lot I’m learning about it, but I do most of the graphics work I need to accomplish with it. (Photoshop is the standard for the company, but I’m not versed in it.)
  • Microsoft Visual Studio, various flavors. Sometimes moving the HTML/Javascript/CSS into an editing tool is easier to deal with, and I frankly like the Visual Studio editing environment best of the various programs I’ve tried for these types of files.
  • PHP Designer. I actually use the (older) free version because, well, it’s free and does what I need, it’s fairly lightweight, and it has the same kind of keyboard mappings and editing environment as Visual Studio.
  • Notepad. Yes, a stripped-down plain text editor. You’d be amazed at how much I have this open.
  • FileZilla. Yes, sometimes you still need an FTP client, and FileZilla is a good free Windows client.
  • PuTTY. A great free SSH client, because I spend a not-insignificant amount of time on a *nix command line.
  • Apache/PHP/MySQL: Installed on my Windows boxen as test environments. Pretty critical especially when developing WordPress themes.

Online:

  • Google Chrome and Mozilla Firefox as my primary browsers. I actually use Chrome as my primary while at work and Firefox while at home; these are both highly standards-compliant web browsers and I know if I can get something to work properly in them, then that is in fact how it should work. Chrome has some great built-in development and inspection tools; in Firefox I employ a number of extensions.
  • Web Developer (Firefox plugin): A variety of pretty essential additions in toolbar format for all aspects of web development.
  • Firebug (Firefox plugin): Probably the #1 plugin I would recommend; it adds code inspection, network information, Javascript debugging and inspection, and all manner of incredibly useful tools—you can’t be a proper web developer without this installed. (Chrome’s built-in tools come pretty close to this.)
  • Page Speed (Firebug add-on): A fantastic add-on to Firebug that analyzes the overall page performance (using Google’s recommended benchmarks/tests) and gives you hints on what you can improve.
  • Header Spy (Firefox plugin): Shows HTTP headers on the status bar, useful for troubleshooting server information.
  • AFOM (Firefox plugin): Incredibly useful plugin for the Windows version of Firefox which fixes the memory leak that Windows Firefox is prone to.
  • Internet Explorer: Of course, you can’t develop for the web without checking your work in IE, and IE8 has a decent set of developer tools built-in—including the ability to switch between IE7, IE8, and Quirks modes.
  • W3C Validator: Because you want to make sure your site code validates and works properly, right?
  • jQuery: The best Javascript library out there. If I’m doing anything in Javascript these days, 99% of the time it’s using jQuery.

There are, of course, other tools I use that fall primarily under the heading of “my own sites” and are not necessarily web development per se: Google Analytics and Google AdSense are two examples. That’s probably another post.

This list is likely incomplete—I may have missed an item or two or three, and if I think of any I’ll update it. But this gives an idea of the various tools I’m employing currently and to a large extent what I’d consider the minimum number any good web developer should be using these days.

Blog bot roundup

The variety is amazing: here’s a list of various agents, spiders and bots that I’ve culled from my chuggnutt.com logfiles over the last 30 days that have to do with RSS and/or blogs (specifically blogs, not just general purpose spiders like Google’s). These are only the ones I know for sure are blog or RSS related; others in my logs might be also, but aren’t obvious about it.

Geek types, note that these strings (with wildcards mostly) can be used as-is when identifying HTTP_USER_AGENT.

  • Bloglines: The web-based feed reader/aggregator
  • kinjabot: The (currently) beta bot for the Kinja weblog directory/guide
  • Feedreader: Windows-based feed reader/aggregator
  • PubSub.com RSS reader: Another searchable, web-based aggregator
  • FeedDemon: Windows-based feed reader/aggregator
  • fastbuzz.com: Fastbuzz News is another web-based aggregator that scans news and blogs
  • ORblogs.com-bot and ORblogs-bot: The crawlers for ORBlogs which compile metadata and RSS for the aggregating site
  • SharpReader: Windows-based feed reader/aggregator
  • Technoratibot: Technorati’s crawler
  • UniversalFeedParser: Mark Pilgrim’s liberal feed parser which is used in a variety of RSS software
  • Feedster Crawler: Feedster’s RSS spider
  • BlogBot: I think this is Blogdex’s crawler, but I’m not totally sure
  • BlogPulse: Yet another blog/RSS crawler and indexer
  • Slower, Friendlier Spiders (BlogShares V1.35): The spider for BlogShares, the fantasy share market for blogs
  • NITLE Blog Spider: The National Institute for Technology and Liberal Education’s spider for their blog census
  • LocalfeedsPageCrawler
  • NusEyeFeedCrawler
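
To show what matching on HTTP_USER_AGENT with wildcards might look like, here is a small Python sketch. The patterns are just a few of the bot names from the list above wrapped in wildcards, which is an assumption on my part: the full user-agent strings those bots actually send are not reproduced here, and the function name match_feed_agent is made up for the example.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Illustrative patterns only: bot names from the list, wrapped in wildcards.
FEED_AGENT_PATTERNS = [
    "*Bloglines*",
    "*kinjabot*",
    "*FeedDemon*",
    "*SharpReader*",
    "*Technoratibot*",
    "*UniversalFeedParser*",
]


def match_feed_agent(user_agent: str) -> Optional[str]:
    """Return the first pattern the user-agent string matches, or None."""
    for pattern in FEED_AGENT_PATTERNS:
        # fnmatchcase is case-sensitive on every platform, unlike fnmatch.
        if fnmatchcase(user_agent, pattern):
            return pattern
    return None
```

Run against each user-agent field pulled from a logfile, this gives a quick tally of which feed readers and blog crawlers are visiting.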

Al Fasoldt is at it again

Al Fasoldt is at it again, this time taking on Wikipedia. Remember him? Last year I blasted him for spreading FUD about web technology (“FUD Alert”), and then apologized this year for being so harsh (“Apology”). Well, now more people have caught on: tonight I read in this article on Boing Boing and this article on Joi Ito that Fasoldt has slammed Wikipedia and then taken the low road when someone called him on it. This article from Techdirt has the skinny:

Rather than take me up on the experiment, or suggest an alternative, he complained simply that the whole idea of Wikipedia was “outrageous,” “repugnant” and finally (in another email) “dangerous,” and therefore he refused to take part in my experiment. He told me that asking him to take part of an experiment that would show how Wikipedia corrected errors “wouldn’t change the danger” of Wikipedia — and mentioned how important it was that teachers everywhere knew what a dangerous tool this was. After this email exchange, he came to Techdirt himself, and commented that, based on what he read here, he was disappointed in our educational system — and proceeded to misquote a poem.

…by refusing to back up his claims, by mis-stating or ignoring nearly everything I said to him and by resorting to misdirection in his arguments, personally, I find Mr. Fasoldt to be untrustworthy — but I suggest you make your own judgment call on that one.

Now, I’ll be fair, I read Fasoldt’s original article that kicked this off, and I didn’t find it problematic. A little FUD-ish, but hey, that’s what he does. It could’ve stayed civil and turned into a good future article for him. But all this followup?

Well, I’m just sayin’.

Blue Oregon?

I keep seeing references to a new Oregon-related group blog called “BlueOregon,” purporting to reside at the domain name www.blueoregon.com. However, every time I try this domain, I get a “Future home of a domain” page—i.e., the domain name has been registered, but it’s parked on a generic landing page. (Even ORBlogs is showing content from it.) Is this a joke? Really bad DNS/proxy/caching/something configuration on BendBroadband’s part? What’s the deal?

Farking Irritating

Going through the chuggnutt.com logfiles for the 6th, I noticed that there were suddenly a bunch of hits to the Oobi image I’d posted here a while back from TotalFark. Basically, someone’s linked directly to the image on this server from a high-traffic site.

Now on the one hand, that’s kind of cool—but on the other hand, I’m a little irritated because TotalFark is a paid subscription site that I can’t access without registering first, which means I can’t just go and see what they’re doing with the Oobi image they’re pulling from me. Does that seem fair? Their site is saving money by sucking an image down over my bandwidth, and on top of that I’d have to pay them additional money to find out why.

And before someone points out to me that it’s only like 5 bucks to register and I’m therefore a cheap bastard, well, consider this: FARK’s Terms of Service at the bottom of every page reads:

Text comments, audioedit submissions, and photoshopped images posted on Fark by registered users may not be reposted or broadcast without the express written permission or license from Fark.com and must attribute Fark.com as the source.

So if they won’t let people use their images without their permission, then why should I? It’s the principle of the thing.

Grumble… It might be time to brush up on some Apache rewrite rules…
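
For what it’s worth, the check that an anti-hotlinking rule usually boils down to can be sketched in Python. This is not Apache configuration, just the decision logic, and it rests on assumptions: the allowed host list is a guess at what this site would use, and requests with no Referer are let through because plenty of legitimate visitors don’t send one.

```python
from urllib.parse import urlsplit

# Assumed set of hosts allowed to embed images from this site.
ALLOWED_HOSTS = {"chuggnutt.com", "www.chuggnutt.com"}


def is_hotlink(referer: str) -> bool:
    """Return True when an image request appears to come from another site."""
    if not referer:
        return False  # direct requests and privacy tools send no Referer
    host = urlsplit(referer).hostname
    # An unparseable Referer is treated the same as an off-site one.
    return host is None or host.lower() not in ALLOWED_HOSTS
```

A server would refuse the image request, or swap in a different image, whenever this returns True.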