Category: Computers

  • vCard

    I’ve been playing with the vCard format for a project at work and I gotta say, there’s a technology that’s begging to be re-implemented in XML. I mean, here’s the behind-the-scenes formatting of a vCard file:

    BEGIN:VCARD
    VERSION:2.1
    FN:Mr. John Q. Public, Esq.
    N:Public;John;Quinlan;Mr.;Esq.
    BDAY:1995-04-15
    ADR;DOM;HOME:P.O. Box 101;Suite 101;123 Main Street;Any Town;CA;91921-1234;
    TEL;PREF;WORK;MSG;FAX:+1-800-555-1234
    END:VCARD

    …with a bunch of arcane rules for delimiters and encoding. Uh, hello? EDI? 1989 called, and it wants its format back.

    Wouldn’t something like this XML mockup of the same thing just make more sense?

    <vCard>
      <name>
        <family>Public</family>
        <given>John</given>
        <additional>Quinlan</additional>
        <prefix>Mr.</prefix>
        <suffix>Esq.</suffix>
        <formatted>Mr. John Q. Public, Esq.</formatted>
      </name>
      <dob>1995-04-15</dob>
      <address>
        <type>Domestic, Home</type>
        <po>P.O. Box 101</po>
        <extended>Suite 101</extended>
        <street>123 Main Street</street>
        <locality>Any Town</locality>
        <region>CA</region>
        <postalCode>91921-1234</postalCode>
      </address>
      <telephone>
        <preferred />
        <type>Work, Message, Fax</type>
        <number>+1-800-555-1234</number>
      </telephone>
    </vCard>
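
    To make those delimiter rules concrete, here’s a rough Python sketch. It is my own illustration, not a real vCard library. It splits each line of the card above into a property name, its type parameters, and its value. It then splits structured values on semicolons by position to build the XML mockup.

    The sketch assumes the simple vCard 2.1 style shown here. It ignores the arcane parts a real parser must handle: folded lines, backslash-escaped semicolons and commas, and QUOTED-PRINTABLE or other encodings. The card includes the VERSION line the spec requires, and the element names are the mockup’s, not any standard schema.

```python
import xml.etree.ElementTree as ET

# The sample card, with the VERSION line the vCard spec requires.
VCARD = """BEGIN:VCARD
VERSION:2.1
FN:Mr. John Q. Public, Esq.
N:Public;John;Quinlan;Mr.;Esq.
BDAY:1995-04-15
ADR;DOM;HOME:P.O. Box 101;Suite 101;123 Main Street;Any Town;CA;91921-1234;
TEL;PREF;WORK;MSG;FAX:+1-800-555-1234
END:VCARD"""

# Structured values are positional: the Nth ';'-separated field has a fixed meaning.
N_FIELDS = ["family", "given", "additional", "prefix", "suffix"]
ADR_FIELDS = ["po", "extended", "street", "locality", "region",
              "postalCode", "country"]
TYPE_NAMES = {"DOM": "Domestic", "HOME": "Home", "WORK": "Work",
              "MSG": "Message", "FAX": "Fax"}

def parse_line(line):
    """Split 'NAME;PARAM;PARAM:value' into (name, params, value)."""
    head, _, value = line.partition(":")
    name, *params = head.split(";")
    return name.upper(), [p.upper() for p in params], value

def type_list(params):
    """Turn bare type parameters into the mockup's readable <type> text."""
    return ", ".join(TYPE_NAMES.get(p, p.title()) for p in params if p != "PREF")

def vcard_to_xml(text):
    root = ET.Element("vCard")
    name = None
    for line in text.splitlines():
        if not line.strip():
            continue
        prop, params, value = parse_line(line.strip())
        if prop in ("N", "FN") and name is None:
            name = ET.SubElement(root, "name")
        if prop == "N":
            for tag, part in zip(N_FIELDS, value.split(";")):
                if part:
                    ET.SubElement(name, tag).text = part
        elif prop == "FN":
            ET.SubElement(name, "formatted").text = value
        elif prop == "BDAY":
            ET.SubElement(root, "dob").text = value
        elif prop == "ADR":
            adr = ET.SubElement(root, "address")
            ET.SubElement(adr, "type").text = type_list(params)
            for tag, part in zip(ADR_FIELDS, value.split(";")):
                if part:  # skip empty fields such as the missing country
                    ET.SubElement(adr, tag).text = part
        elif prop == "TEL":
            tel = ET.SubElement(root, "telephone")
            if "PREF" in params:
                ET.SubElement(tel, "preferred")
            ET.SubElement(tel, "type").text = type_list(params)
            ET.SubElement(tel, "number").text = value
        # BEGIN, END and VERSION carry no data for the mockup and are skipped.
    return root

print(ET.tostring(vcard_to_xml(VCARD), encoding="unicode"))
```

    Child elements come out in the card’s line order. So <formatted> lands before <family>, rather than last as in the mockup.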
  • Useless lists: Computer stuff

    A co-worker who’s moving was telling me today about finding a dusty box in his attic that turned out to hold an original Atari 2600, and for some reason that made me want to blog about it. Instead, this turned into a list of all the computer and video game systems I’ve accumulated over the years, all in the spirit of blogging useless lists (like the one I did the other day of the books left on my bookshelf).

    It’s pretty geeky. And probably a little sad. Reading over the list, I can see that I’m often behind the times when it comes to hardware. I’m retro-geeky. Read on if you dare.

  • overLIB

    Pointer to a totally excellent JavaScript library for creating popups: overLIB. I’ve been using it for the last few days to put together a dynamic drop-down menu for a Web project at work, and I’ve used it before to create popup context menus and tooltips. It’s one of the best JavaScript tools I’ve come across: clever, simple to use, and it just works, period.

  • Imperfect end to an imperfect week

    I couldn’t even get myself to post yesterday; I was just done. This last week was the shit week for computer troubles. After I spent the first half of the week struggling with my wife’s computer and Thursday reformatting and reinstalling Windows on a coworker’s computer, Friday was the kicker.

    The hard drive in the boss’s computer at work died. Yeah, the Boss. I get to work Friday morning, find a note on my desk: “Computer says ‘Disk boot failure, insert system disk’ since last night.” Ohhhhhh, how I hoped the problem was simply that there was a disk in the floppy drive.

    There wasn’t.

    Nope. Machine won’t boot; hard disk clicks when it has power. That’s never a good sign. Can’t usefully boot to the floppy; the bootable floppy disk I have is for Windows 98 (yes, almost all of the computers in the office are still running Windows 98), and this is a newer eMachine running Windows XP, so the Win98 boot disk can’t recognize the NTFS partition. Contemplate for a moment running the restore CD, but that will wipe out all the data on the drive, and that can’t happen.

    Of course, like all good, responsible IT persons, I make sure any critical work and files in the office are on the network, right? Right. And the network data is backed up to tape every night, right? Right. So, there really should be no problem, right? Just restore Windows XP (though it’s a bad drive, remember, and really should be replaced), and all the data is safe, right? Well, almost.

    Friggin’ Microsoft Outlook stores all of its data—emails, contacts, events—in a single .PST file on the local machine, not on the network. Uh-oh. And for the Boss, email is the lifeblood of communication in the company; he’ll send out 40-plus emails in any given day. Double uh-oh.

    But no, wait, hold on: like all good, responsible IT persons, I have batch files running on individual workstations that back up the Outlook data files to the network daily, so that they’ll be backed up to the tape each night. This was instituted months ago, after the CFO of the company suffered a major email loss and we identified Outlook as a Major Point of Weakness in the company’s data integrity.

    Whew! Run to the network, open up the appropriate user folder where the Outlook data file should be, check the timestamp on the file.

    Time freezes.

    Somewhere nearby, a cat meows in slow motion. A trillion water molecules in the Deschutes River ricochet off one another in a brilliant cacophony of sound not unlike that of billiard balls on the break. Deep in my brain, a synapse fires and a single drop of sweat languidly rolls down my spine.

    January 30, 2004.

    Not April 1, 2004. January 30. I have never in my life wished more for something to be an April Fool’s Day prank.

    So what happened to my carefully crafted plan of a batch file running at a scheduled time each night?

    The Boss shuts down his computer each night before it can run.

    And that, of course, is the punchline. The rest of my day at work (literally all but about an hour of it) was spent trying in vain to access the hard drive, just to pull the email from it. No love. A computer place in town that does data recovery was able to see the drive, sort of, but was unable to pull anything from it. The only option left is to shell out up to two grand and have a professional data recovery outfit like Ontrack retrieve the email. I don’t know if we’ll go that route, though.

    By the end of the day, I felt I was about to stroke out. Visions of myself convulsing on the floor seemed oddly appealing. The saving grace of it all is that it was Friday, and the kids were being watched so my wife and I were able to go out to dinner and a movie. We saw “Secret Window,” which was pretty good.

    I’m hoping next week will be better.
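
    The real failure in this story was a nightly job that silently stopped running. What exposed it was a glance at a file’s timestamp. Here’s a minimal Python sketch of automating that glance, assuming the backed-up .PST copies sit at known paths on the network share. The function name and the 26-hour threshold (one nightly run plus some slack) are hypothetical.

```python
import os
import time

def stale_backups(paths, max_age_hours=26, now=None):
    """Return the backup copies that are missing or whose last-modified
    time is more than max_age_hours old."""
    now = time.time() if now is None else now
    stale = []
    for path in paths:
        try:
            age_hours = (now - os.path.getmtime(path)) / 3600
        except FileNotFoundError:
            stale.append(path)  # no copy at all is the worst case
            continue
        if age_hours > max_age_hours:
            stale.append(path)
    return stale
```

    Run each morning from a machine that stays on overnight (the server, say), a check like this would flag a missed backup the next day, instead of only after the drive died.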

  • Some nights I just hate computers…

    God damn, the computers are pissing me off tonight. All evening our broadband cable connection has just been running slower than molasses, so it takes forever to accomplish anything online. And then there’s my wife’s computer, which I’m trying to fix up: it’s been running really slow lately and locking up a lot. So I rolled back the Windows ME that was installed on it (have I mentioned before how I hate Windows ME??) to Windows 98, which by and large works well enough, but now I can’t get the blasted TCP/IP to work properly.

    It tells me it’s assigned to some 169.* address, and the DHCP server is “255.255.255.255” (yeah, sure), instead of being sensible and using the perfectly acceptable DHCP server and IP address assignment that has worked with every other computer we’ve had in this house. And the worst part is, I’m sure I’ve encountered this same problem at work, and solved it, but I can’t remember what the solution was. I’ve already tried uninstalling and re-installing TCP/IP, so I don’t know. Maybe it’s just time for the straight low-level format route. Son of a bitch.
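
    For what it’s worth, the 169.* address is the giveaway. Windows 98 and later give a machine an address in 169.254.0.0/16 (Automatic Private IP Addressing) when no DHCP server answers it. And 255.255.255.255 is the limited broadcast address, not a real server. A small Python sketch using the standard ipaddress module shows the check; the function name and messages are made up for illustration.

```python
import ipaddress

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")

def diagnose_lease(ip_address, dhcp_server):
    """Interpret the IP address and DHCP server a machine reports."""
    ip = ipaddress.IPv4Address(ip_address)
    server = ipaddress.IPv4Address(dhcp_server)
    if ip.is_link_local:  # 169.254.0.0/16: self-assigned fallback
        return "self-assigned 169.254.x.x address: no DHCP server answered"
    if server == LIMITED_BROADCAST:
        return "no DHCP server on record"
    return "normal DHCP lease from " + str(server)

print(diagnose_lease("169.254.12.34", "255.255.255.255"))
```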

  • Search Patch

    While I wait to find out whether my hosting provider will change MySQL’s minimum fulltext word length, here’s what I’ve done in the meantime to handle viable search terms shorter than four characters.

    First, I split the search string into its component words (an array). Then I remove any stopwords (I’ve got a big list), and for each remaining word shorter than four characters, I add an extra condition to the SQL query I’m running.
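
    The splitting step might look something like this in Python. It’s a sketch, not the site’s actual code. The stopword set is a tiny hypothetical stand-in for the real list, and 4 is MySQL’s default minimum fulltext word length. Each short word keeps its 1-based position in the original search string for relevance weighting.

```python
import re

# Hypothetical stopword list; the real one is much longer.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "so", "the", "to"}
MIN_FULLTEXT_LEN = 4  # MySQL's default ft_min_word_len

def split_search(query):
    """Split a search string into (long_words, short_words).

    long_words go to MATCH ... AGAINST; short_words are (position, word)
    pairs, where position is the word's 1-based place in the original
    search string (stopwords included).
    """
    long_words, short_words = [], []
    words = re.findall(r"[a-z0-9]+", query.lower())
    for position, word in enumerate(words, start=1):
        if word in STOPWORDS:
            continue
        if len(word) < MIN_FULLTEXT_LEN:
            short_words.append((position, word))
        else:
            long_words.append(word)
    return long_words, short_words

print(split_search("porter php"))  # (['porter'], [(2, 'php')])
```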

    Here’s the basic form of the query I’m running, for a search on “porter”:

    SELECT *,
    MATCH(body) AGAINST('porter') AS relevance
    FROM content
    WHERE MATCH(body) AGAINST('porter')
    -- AND [additional conditions]
    ORDER BY relevance DESC
    LIMIT 10

    This uses fulltext indexing to search for “porter” with weighted relevance, and returns the appropriate content and its relevance score. Pretty straightforward, and it works really well.

    Here’s what the modified query looks like when short words are present, for the search “porter php”:

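    SELECT *,
    MATCH(body) AGAINST('porter') +
      -- INSTR() returns 0 when 'php' is absent, and 1/0 is NULL in MySQL,
      -- which would make the whole relevance NULL; IF() scores those rows 0.
      IF(INSTR(body, 'php') > 0,
         1 / INSTR(body, 'php') + 1 / 2,  -- 2 = position of 'php' in the search string
         0)
    AS relevance
    FROM content
    WHERE ( MATCH(body) AGAINST('porter')
      -- ^ and $ also accept 'php' at the very start or end of body
      OR body REGEXP '(^|[^a-zA-Z])php([^a-zA-Z]|$)'
      )
    -- AND [additional conditions]
    ORDER BY relevance DESC
    LIMIT 10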

    Two new things are happening. First, in the WHERE clause, I’m using the fulltext index to find “porter” and a regular expression search to find “php.” Why REGEXP and not LIKE? Because if I write LIKE '%cow%', for instance, I’ll get not only “cow” but also “coworker” and other wrong matches. A regular expression lets me insist that the word isn’t part of a longer run of letters, which filters those cases out.
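
    To see the difference outside the database, here’s a small Python emulation of the two kinds of WHERE condition. Python’s re module stands in for MySQL’s regex engine; these patterns are simple enough to behave the same way in both. It also shows why a pattern that demands a non-letter on both sides is fragile. A word at the very start or end of the text has no neighbor to match [^a-zA-Z], so it gets missed unless ^ and $ are allowed as boundaries.

```python
import re

# Requires a non-letter on both sides of the word.
STRICT = r"[^a-zA-Z]{w}[^a-zA-Z]"
# Also accepts the start or end of the text as a boundary.
ANCHORED = r"(^|[^a-zA-Z]){w}([^a-zA-Z]|$)"

def like_match(body, word):
    """Rough stand-in for body LIKE '%word%' (case-insensitive collation)."""
    return word.lower() in body.lower()

def regexp_match(body, word, pattern=ANCHORED):
    """Rough stand-in for body REGEXP pattern, with the word escaped."""
    regex = pattern.format(w=re.escape(word))
    return re.search(regex, body, re.IGNORECASE) is not None

for body in ["My coworker likes porter.", "Holy cow!", "cow tipping"]:
    print(f"{body!r}: LIKE={like_match(body, 'cow')} "
          f"strict={regexp_match(body, 'cow', STRICT)} "
          f"anchored={regexp_match(body, 'cow')}")
```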

    That takes care of finding the words, but I also wanted to factor them into relevance somehow. The solution I hit upon in the SQL above is simple, and it does the trick well enough for my tastes. The earlier the word appears in the content, the higher its relevance: the score is the inverse of how many characters “deep” into the content the word first appears (1 / INSTR(...)). I fudge the number a bit more by also adding the inverse of the keyword’s position in the search string, so the earlier a keyword appears in the search, the higher the score it gets.
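
    Here’s a rough Python sketch of assembling that query dynamically for any number of short words. It’s my reconstruction, not the actual site code. The function name is hypothetical, and it assumes a DB-API driver that uses %s placeholders, such as PyMySQL. It also assumes the search string has already been split into words of four or more characters and (position, word) pairs for the shorter ones. Any additional conditions are left out.

```python
def build_search_sql(long_words, short_words, limit=10):
    """Build the combined fulltext + REGEXP query.

    long_words:  words of 4+ characters, searched with MATCH ... AGAINST.
    short_words: (position_in_search_string, word) pairs for shorter words.
    Returns (sql, params) for a DB-API driver using %s placeholders.
    """
    if not long_words and not short_words:
        raise ValueError("nothing to search for")
    score, where, score_params, where_params = [], [], [], []
    if long_words:
        fulltext = " ".join(long_words)
        score.append("MATCH(body) AGAINST(%s)")
        where.append("MATCH(body) AGAINST(%s)")
        score_params.append(fulltext)
        where_params.append(fulltext)
    for pos, word in short_words:
        # 1/offset-in-body + 1/position-in-search-string, or 0 when the word
        # is absent (INSTR returns 0, and 1/0 would make the whole sum NULL).
        score.append("IF(INSTR(body, %s) > 0, 1 / INSTR(body, %s) + 1 / %s, 0)")
        score_params += [word, word, pos]
        where.append("body REGEXP %s")
        # Assumes the word is purely alphanumeric, so it needs no regex escaping.
        where_params.append(f"(^|[^a-zA-Z]){word}([^a-zA-Z]|$)")
    sql = (f"SELECT *, {' + '.join(score)} AS relevance FROM content "
           f"WHERE ({' OR '.join(where)}) ORDER BY relevance DESC LIMIT %s")
    return sql, score_params + where_params + [limit]

sql, params = build_search_sql(["porter"], [(2, "php")])
print(sql)
print(params)
```

    With a live connection, cursor.execute(sql, params) runs it. Passing the words as parameters rather than pasting them into the SQL string keeps quotes in a search from breaking the query.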

    It’s not perfect, and I definitely wouldn’t recommend this method on a large dataset (REGEXP can’t use an index, so every search with a short word scans every row), but for my short-term needs it works just fine. The only thing really missing from the relevance calculation is how many times the keyword appears in the content, but I can live without that for now.

  • Searching and Minimum Word Length

    Mike Boone, in the comments section of yesterday’s entry on searching (“Updated Search”), correctly points out that searching my site for a word shorter than four characters (like “php” or “cow”) doesn’t work: no results are returned. Obviously, since I write about PHP on occasion, this is untenable.

    The problem is that MySQL’s fulltext indexing, by default, only indexes words at least four characters long, and despite my initial reply to Mike’s comment, I don’t think I have any way to change this. The minimum is a server-wide setting (ft_min_word_len), and changing it means restarting the MySQL server and rebuilding the FULLTEXT indexes. This site runs on a shared server at pair.com, where I have absolutely zero control over the MySQL server configuration. I might post a question to their tech support, but I’m not overly optimistic about the response. So, what to do?

    Short term, here’s my solution (though it’s not implemented yet): examine each word in the search string, throw out stopwords (like “the,” “and,” “so,” etc.), and run a LIKE search against the content for any word shorter than four characters. No, it’s not ideal, but it’s a patch. Comments?

  • Updated Search

    I’ve been overhauling the search functionality on my site. I’m still using MySQL’s built-in FULLTEXT indexing to perform searches, but I’ve made the results page look a lot more like (okay, almost exactly like) Google’s. The main differences are that I’m not paginating search results yet (every search is limited to 10 results), and that I’m showing a relevance percentage, with the top result arbitrarily defined as 100% relevant.

    To determine relevance, I’m relying on MySQL: when a fulltext MATCH(field) AGAINST('search string') expression appears in the SELECT part of a query, it returns the relevance score MySQL computes for each row. (See MySQL Full-text Search in the online manual for detailed info on this.)
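
    The percentage is simple arithmetic on those scores: each result’s score divided by the top result’s, times 100. Here’s a minimal Python sketch, assuming the scores come back already ordered best-first (as with ORDER BY relevance DESC). The function name is hypothetical.

```python
def relevance_percentages(scores):
    """Convert raw MATCH() scores, already sorted best-first, into
    percentages where the first (top) result counts as 100%."""
    if not scores or scores[0] <= 0:
        return [0 for _ in scores]  # nothing meaningful to scale against
    top = scores[0]
    return [round(100 * score / top) for score in scores]

print(relevance_percentages([8.25, 4.125, 2.0625]))  # [100, 50, 25]
```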

    Further plans for searching that I haven’t implemented yet: using MySQL’s IN BOOLEAN MODE modifier to allow advanced searches, like phrase searches (with quotes), required words (using the plus sign), and subexpressions (using parentheses). It’s pretty cool stuff. Oh, and I want to be smarter about presenting excerpts: Google shows you content excerpts that contain your search terms, and I want to do the same. Currently I’m just showing the first 250 or so characters of the text, with the HTML stripped out.
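
    Here’s a rough Python sketch of the planned excerpt behavior, under a few assumptions. Tags are stripped with a crude regular expression, which is fine for simple markup but is not a real HTML parser. Terms are matched as plain case-insensitive substrings, and the window starts about a third of its width before the first hit. With no hit, it falls back to the current first-250-characters behavior. The function name and window placement are my own choices.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # crude: fine for simple markup, not a parser

def excerpt(body_html, terms, width=250):
    """Return about `width` characters of plain text around the first search
    term found, or the first `width` characters if no term appears."""
    text = html.unescape(TAG_RE.sub(" ", body_html))
    text = " ".join(text.split())  # collapse whitespace left by removed tags
    lowered = text.lower()
    hits = [i for i in (lowered.find(t.lower()) for t in terms if t) if i >= 0]
    if not hits:
        return text[:width]
    start = max(0, min(hits) - width // 3)  # show some context before the hit
    end = start + width
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix
```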

    And since I’m developing my whole Personal Publishing System in an open process, I’ll write up a detailed technical article soon on how to effectively use MySQL fulltext searching and show Google-like results. All real-world; the code will be cribbed right out of my search.php file.

  • PHP Development Hint

    Here’s a general hint for PHP development: a quick and easy way to check a PHP script for syntax or compile errors, without uploading it to the Web server and testing it through a browser, is the command line. It’s obvious, and I don’t know why I didn’t think of it sooner, but I’ve been doing more and more of it lately.

    I develop primarily under Windows (with PHP installed) and upload to a Unix-variant server, and this is what I’ve been doing to check a PHP script from the command line on my Windows system:

    php-cli -l filename.php

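    Omit the -l option (it only checks syntax, without executing anything) to parse and run the code instead, if you like. Either way, it’s an easy way to check your code without uploading it and potentially breaking your site. (The executable’s name varies by PHP version and install; on many systems the CLI binary is simply php.)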
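
    To check a whole directory of scripts at once, a small Python wrapper can run the same command on each .php file. This is a sketch. The php-cli default mirrors the command above, and the function names are hypothetical. It treats a file as clean only if PHP exits with status 0 and prints its usual “No syntax errors detected” message.

```python
import subprocess
from pathlib import Path

def lint_commands(root, php="php-cli"):
    """One `php -l <file>` command per .php file under root, in sorted order."""
    return [[php, "-l", str(path)] for path in sorted(Path(root).rglob("*.php"))]

def lint_all(root, php="php-cli"):
    """Syntax-check every .php file under root; return the ones that fail."""
    failures = []
    for cmd in lint_commands(root, php):
        result = subprocess.run(cmd, capture_output=True, text=True)
        clean = result.returncode == 0 and "No syntax errors detected" in result.stdout
        if not clean:
            failures.append(cmd[-1])
            print(result.stdout + result.stderr)  # PHP's own error message
    return failures
```

    Calling lint_all("site"), with "site" standing in for your local copy, returns the files that failed. A non-empty list is the cue not to upload yet.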

  • Computer Languages History Timeline

    From the Computer Languages History site comes an impressive computer languages timeline chart. It’s as much a language family tree as it is a timeline. Very nice, though a little hard to read.