Chuggnutt!

Category: Online

Spam Pounder

So the spam problem finally got to be a little overwhelming on our BendCable email account, and we opted in to BendCable’s anti-spam software/service, Spam Pounder. But here’s the catch: you don’t actually get this anti-spam service on your regular bendcable.com email, no. Instead they change your email to a bendbroadband.com address, because that’s where they have the actual anti-spam software running. (In order to preserve your bendcable.com address, which you may have had for years, as we have, and don’t want to lose, they set up a forward that shunts everything from your bendcable.com address to the bendbroadband.com one.)

I mean, what the hell is that? Sure, changing your email address is a solution for spam, but that’s not the point. I don’t have a lot of confidence in an ISP that can’t even set up spam filtering software on their main mail server, fer chrissakes.

And what the hell is with that name (“Spam Pounder”) and logo?? The images I’m associating with it are not good ones…

Now, having said all that, I will concede that so far it’s doing the job: almost all of the spam is now being caught, and I’d give it a 98-99% effectiveness rating. The technology seems to work.

But why can’t BendCable integrate this into their main email server like everyone else?

February 26, 2004
Timely Wired Issue

After all the hubbub over Google the last few days, I thought it was pretty interesting when my issue of Wired came today, with “Googlemania!” on the cover. Timely.

February 21, 2004
Is Google Broken?

Elsewhere on this site I’ve stated that I love Google. That still mostly holds true, but there have been some things about Google lately that are giving me pause.

The first concerns Google’s apparent abandonment of RSS in favor of (exclusively) the still-incubating Atom syndication format/API. I won’t bother rehashing the situation here; if you want the gory details, check out this wonderfully recursive-ironic Google search for “google atom”. To me this seems like a highly questionable, even irresponsible, move for Google to make, and frankly a rather surprising one. Hopefully they’ll come to their senses over there.

The other thing deals with their AdWords program. I think it’s broken. Here’s the deal: We’ve been toying with AdWords to run ads on a new project we’re working on, to see how the system worked and if it would be worth it to ramp it up. (Side note: very cool. You can get a nice in-depth look at Google’s internal keyword rankings without ever putting any money down.) Well, it worked for a while, we were very impressed, but then suddenly, over the weekend sometime (I think), it stopped working.

Completely. Our ad no longer shows up on the exact same searches it was showing up under before. In fact (and here’s the biggest clue that something is seriously broken), as you page through the results, the exact same ads that appeared on the first page of results appear on every subsequent page of results.

WTF?

This did not happen before and should not be happening now. Something is broken. Period. For at least a week. Could it have something to do with Google doubling their index to over 6 billion items (4 billion web pages)? Maybe.

Ideas?

February 19, 2004
Amazon Reviews

One of the big online stories over the past couple of days is Amazon.com’s weeklong glitch that “suddenly revealed the identities of thousands of people who had anonymously posted book reviews” (New York Times article here). Turns out a lot of what was revealed was that authors were anonymously writing glowing reviews of their own books, and getting family and friends to do so too, and conversely, anonymously panning rivals’ books. This “glitch” exposed a bigger issue:

…many people say Amazon’s pages have turned into what one writer called “a rhetorical war,” where friends and family members are regularly corralled to write glowing reviews and each negative one is scrutinized for the digital fingerprints of known enemies.

Amazon called this “an unfortunate error.” Yeah, right.

Consider: these “anonymous” reviewers are not anonymous at all. Amazon clearly tracks who they really are and can, at any given time, follow exactly who is saying what about any book. Confronted with the questionable antics of these reviewers and the growing “rhetorical war,” I know what I would do to try to put a stop to it. (Here’s a hint: it’s basically the same thing that happened to Amazon.)

February 15, 2004
PHP XML Benchmark

An interesting PHP benchmark of XML parsing showed up on PHP Everywhere. In High Speed XML Parsing is Not Intuitive, John Lim tested five methods of extracting the title element from an XML RSS feed. The results were surprising: the regular expression match was by far the fastest, and while I would have thought the SAX parsing (based on libexpat, I believe) would score significantly faster than the DOM or XPath parsing, it came in last.

Of course, the regular expression matching in this case was a bit simplistic—typically if you’re going to parse XML files, you’re looking for more than one element. But it’s a good technique to keep in mind.
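To make the trade-off concrete, here is a minimal sketch of the two ends of that spectrum. It is in Python rather than PHP, the sample feed and function names are made up for illustration, and it is not John Lim's benchmark code; treat it as one possible way to compare a pattern match against a full parse.

```python
import re
import timeit
import xml.etree.ElementTree as ET

# Hypothetical sample feed, invented for this sketch (not from the benchmark).
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.com/</link>
    <item>
      <title>First post</title>
    </item>
  </channel>
</rss>"""

# Non-greedy match so we stop at the first closing tag.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.DOTALL)


def title_by_regex(xml_text):
    """Return the text of the first <title> found by plain pattern matching."""
    match = TITLE_RE.search(xml_text)
    return match.group(1).strip() if match else None


def title_by_parser(xml_text):
    """Return the channel title by parsing the whole document into a tree."""
    root = ET.fromstring(xml_text)
    return root.findtext("channel/title")


def all_titles_by_regex(xml_text):
    """Return every <title> in document order; the regex cannot tell
    the channel title from an item title."""
    return [t.strip() for t in TITLE_RE.findall(xml_text)]


if __name__ == "__main__":
    # Rough timing only; absolute numbers depend on the machine and the feed.
    for fn in (title_by_regex, title_by_parser):
        seconds = timeit.timeit(lambda: fn(SAMPLE_RSS), number=2000)
        print(fn.__name__, round(seconds, 4))
```

The regex version never builds a document tree, which is presumably where its speed comes from, but it also has no idea about structure: once you want more than the first title, the parser starts to earn its keep.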

February 12, 2004
Oregon SWAP

From UtterlyBoring I picked up this link to Oregon SWAP, which looks like an interesting experiment.

SWAP is designed to promote reuse of materials in Central Oregon. It is a free and convenient way for individuals and businesses to exchange reusable or surplus products and prevent them from ending up in the dump.

Looks interesting, although the small Comic Sans font is making my eyes bleed. I also notice they seem to be running PHP for their database search. Booyah!

February 10, 2004
Data Mining the Web

An interesting article today on MSNBC titled “Online search engines lift cover of privacy”, and the “InfoPorn” section of February’s Wired (can’t find a link, sorry) highlighting identity theft, motivated me to write about a topic I’ve been thinking about for a while now: data mining the Web.

The article talks about the absurd amount of information that is freely available on the Web, and how much of it is accessible through Google—and then calls using Google to find this data “Google hacking.” I think a more accurate term would be Google mining—there’s really no mad hacker v00d00 ski11z involved, and let’s face it, being able to run a realtime query against a massive database containing billions of pieces of information is really the essence of data mining.

What got me thinking about mining the Web? Most recently, social networking software, and the data such software collects from its users. As I’ve written before, what a useful social networking system will do (among other things) is allow you to crawl the relationships among people and drill down by varying degrees into their data/life/online platform. But you know, you can already essentially do this with nothing more than a Web browser; it all goes back to the fact that there is an absurd amount of information freely and publicly available on the Web, much of it cheerfully self-published by people who should know better.

Example? Resumes. You’ve all seen them; half the personal sites out there have an online resume page, and you can find at least 45,300 more by searching Google for “resume.doc”. On average, they contain a shocking amount of personal information: what schools you went to, and when; who employed you, and when; your address and phone number; your skills; sometimes your Social Security number. Tip of the iceberg.

You can find out a lot about someone simply by reading their blog. My own is no exception, I’m sure, but sometimes even I’m amazed at how much personal detail people will reveal online.

And did you know you can search for wishlists at Amazon.com and often a user’s wishlist will also contain their birthday and the city and state in which they live? If that doesn’t work, try finding someone’s birthday on Anybirthday.com—they boast having over 130 million entries gleaned from public records.

Here’s where it gets tricky. The MSNBC article takes an alarmist tone, and in part it’s right to do so: companies and people that leave sensitive documents published on a crawler-accessible Web page are in danger of having their privacy violated. However, a lot of the information that’s out there is already public information, or information that’s freely volunteered by people and becomes public. Google is merely a tool that aggregates this information into one source. And me? Hell, I love Google, I frankly think it’s amazing. And I’m an information junkie, I salivate over the data mining possibilities—and I’ve got ideas rolling around my head on what could be done with this data, ways it can be manipulated, and linked, and so on.

We’ve barely scratched the surface when it comes to mining the Web—I think the untapped possibilities we’re sitting on are enormous, potentially dwarfing anything we’ve previously encountered. Google is a first step.

What’s next?

February 9, 2004
Quick linking

Here’s something interesting: my blog entry on Bend WinterFest is now the number 4 result on Google when searching for “Bend winterfest”, less than two weeks after I posted it. Damn, that’s fast.

February 7, 2004
Social software again

All the hoopla over Orkut last week got me thinking more about this “social software” phenomenon from sites like Orkut and Friendster. You may remember I’ve ranted about Friendster before. My conclusions at the time were that I could see some value to it, but didn’t know what I could actually do with it.

Several months later, same results. What do I do with this type of software? I don’t need a date. I get bored with searching for people I don’t know when all I can do is search. These sites are poor at facilitating communication compared to other technologies. I already have an address book (several, actually) of people that I do know and keep in touch with. So?

So, all of these social networking sites seem to me to be half-baked: they’re a framework built upon an interesting idea, but they’re not done yet. Honestly, I’m not even sure I can tell what the end goal is—having an interesting idea doesn’t guarantee success.

The interesting thing about Orkut is that it’s an invitation-only service, meaning that every user is linked to every other user in one big network, unlike Friendster or the others where there are “pockets” of networks, existing independently. Having everyone linked in some way is inherently more valuable to me; stand-alone networks diminish the value of the system.

But what system? Still a problem. I suppose it would be interesting to be able to crawl or browse the network of people (the big one, like Orkut’s) and drill down into user data to varying degrees, based on how close that user is to you in the network. But there would have to be more than just user data; I’d want to drill down into their online presence/identity/platform: the blogs, the photo galleries, the web pages and XML files of metadata, their trail of public interactions across the web (like on forums, or weblog comments)… As an example, a user browsing/crawling me would be able to drill down into chuggnutt.com, which is becoming more and more the platform that defines my online existence. From there they could read my weblog and the archives, follow the links to any projects I’m working on (that I choose to share), see what sites and blogs I read, play with any apps I develop, etc.
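For what it’s worth, the “drill down by proximity” part is easy enough to sketch. The following is a rough illustration in Python, assuming a made-up friendship graph and a made-up visibility policy; none of these names or rules come from Orkut or Friendster. It walks the network breadth-first to get each person’s degree of separation from you, then maps that degree to how much of their stuff you would get to see.

```python
from collections import deque

# Hypothetical friendship graph; every name here is invented.
FRIENDS = {
    "me": ["alice", "bob"],
    "alice": ["me", "carol"],
    "bob": ["me"],
    "carol": ["alice", "dave"],
    "dave": ["carol"],
    "erin": ["frank"],  # a separate "pocket" with no link to "me"
    "frank": ["erin"],
}


def degrees_from(graph, start):
    """Breadth-first search: map each reachable person to their
    degree of separation from `start`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, []):
            if friend not in dist:
                dist[friend] = dist[person] + 1
                queue.append(friend)
    return dist


def visible_detail(degree):
    """Hypothetical policy: the closer someone is, the more you can see.
    `None` means the person is not reachable at all."""
    if degree is None:
        return "nothing"
    if degree <= 1:
        return "full profile"
    if degree == 2:
        return "blog and public links"
    return "name only"


if __name__ == "__main__":
    dist = degrees_from(FRIENDS, "me")
    for person in sorted(FRIENDS):
        print(person, "->", visible_detail(dist.get(person)))
```

Note that anyone in a disconnected pocket never shows up in the result at all, which is the practical difference between one big linked network and a bunch of stand-alone ones.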

(I realize as I write this I’m also envisioning some of the online experience David Brin wrote into his near-future novel, Earth. But I haven’t read it in a long time, so I may be way off.)

But I can accomplish a lot of that now anyway, so why another service for it? As far as I’m concerned, the real social software has been around for quite a while now: BBSes, email, IRC, Usenet, instant messaging, weblogs. There’s more, but you get the idea.

February 5, 2004
Social networking backlash

The topic du jour this week in the weblogs I read seems to be backlash against social networking services, particularly Orkut, the new one from Google. Interesting, but it’s not like you couldn’t see it coming. I’ll have more to say on this soon.

January 30, 2004