Screw XHTML! Use RSS and XSLT instead

2005-01-16

Okay, that headline was just to get your attention. I'm sure this is old news to many, but I recently decided to learn a little about XSLT. XSLT is a standard language for converting XML (a standard format for storing data) into pretty much any other format.

An XML file is meant to be read by a computer program that extracts data from it. It's only semi human-readable. Here's some sample XML from my website's RSS feed:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
  xmlns:admin="http://webns.net/mvcb/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <channel>
    <title>Greggman.com</title>
    <link>http://greggman.com/</link>
    <description>Games, Gadgets, Gregg and stuff about Japan</description>
    <dc:language>en-us</dc:language>
    <dc:date>2005-01-15T17:00:03+09:00</dc:date>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    <sy:updateBase>2000-01-01T12:00+00:00</sy:updateBase>
    <item>
      <title>Indie Games have Arrived</title>
      <link>http://greggman.com/edit/editheadlines/2005-01-04.htm</link>
      <description>Game Tunnel picked their indie games of the year and
            I gotta say, I'm pretty impressed.</description>
      <guid isPermaLink="false">
            http://greggman.com/edit/editheadlines/2005-01-04.htm
      </guid>
      <dc:subject>games</dc:subject>
      <dc:date>2005-01-04T19:00:00+09:00</dc:date>
    </item>
  </channel>
</rss>

If you view that in your browser, you'll see pretty much the same raw XML.

You can see tags like title, date, subject, etc. Because those tags separate out the data, other programs, like RSS readers, can parse and extract it. Without specific tags like that, as far as another program is concerned, it would all be gibberish.
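Here's a rough idea of what that parsing looks like. This is a minimal sketch in Python using only the standard library's xml.etree.ElementTree (Python is used here purely for illustration; real RSS readers vary), run against a trimmed copy of the feed above. The read_items name is hypothetical. Note that the dc: tags are matched by their namespace URI, not by the prefix text.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the RSS 2.0 feed shown above.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Greggman.com</title>
    <item>
      <title>Indie Games have Arrived</title>
      <link>http://greggman.com/edit/editheadlines/2005-01-04.htm</link>
      <dc:subject>games</dc:subject>
      <dc:date>2005-01-04T19:00:00+09:00</dc:date>
    </item>
  </channel>
</rss>"""

# The dc: prefix in the feed is bound to this namespace URI. ElementTree
# matches on the URI, so we map our own prefix to it when searching.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

def read_items(feed_text):
    """Hypothetical helper: pull the fields of each <item> out of an RSS 2.0 feed."""
    root = ET.fromstring(feed_text)  # root is the <rss> element
    items = []
    for item in root.iterfind("channel/item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "subject": item.findtext("dc:subject", namespaces=NS),
            "date": item.findtext("dc:date", namespaces=NS),
        })
    return items

for entry in read_items(SAMPLE):
    print(entry["date"], entry["subject"], entry["title"])
```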

Unfortunately, to a human it practically is gibberish. But add just one line, like this:

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="example.xsl"?>

<rss version="2.0"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
  xmlns:admin="http://webns.net/mvcb/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <channel>
    <title>Greggman.com</title>
    <link>http://greggman.com/</link>
    <description>Games, Gadgets, Gregg and stuff about Japan</description>
    <dc:language>en-us</dc:language>
    <dc:date>2005-01-15T17:00:03+09:00</dc:date>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    <sy:updateBase>2000-01-01T12:00+00:00</sy:updateBase>
    <item>
      <title>Indie Games have Arrived</title>
      <link>http://greggman.com/edit/editheadlines/2005-01-04.htm</link>
      <description>Game Tunnel picked their indie games of the year and
            I gotta say, I'm pretty impressed.</description>
      <guid isPermaLink="false">
            http://greggman.com/edit/editheadlines/2005-01-04.htm
      </guid>
      <dc:subject>games</dc:subject>
      <dc:date>2005-01-04T19:00:00+09:00</dc:date>
    </item>
  </channel>
</rss>

And now look at it. You can follow the link, but it should look something like this.

You can try it with your own RSS feed. Save these two files to your computer: example.xsl and example.css (right-click and pick "Save As"). Copy any RSS 2.0 feed to the same folder, for example one of these feeds: arstechnica, joelonsoftware, wired, cnn. Edit the feed and add this line just below the first <?xml> line:

<?xml-stylesheet type="text/xsl" href="example.xsl"?>

If there is no <?xml> line at the top, put that line first. Now open the feed file in your browser. You can probably just double-click it.
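If you'd rather script that edit than do it by hand, here is a minimal Python sketch of the same rule: put the stylesheet line right after the <?xml ...?> declaration if there is one, otherwise at the very top. The add_stylesheet_pi name is hypothetical, and it assumes the feed has already been read into a string.

```python
import re

# An XML declaration such as <?xml version="1.0" encoding="UTF-8"?> may only
# appear at the very start of a file (optionally after a UTF-8 byte order mark).
# The \s after "xml" keeps this from matching an <?xml-stylesheet ...?> line.
XML_DECL = re.compile(r"\ufeff?<\?xml\s.*?\?>", re.DOTALL)

def add_stylesheet_pi(feed_text, href):
    """Hypothetical helper: add an xml-stylesheet line to a feed held in a string."""
    pi = f'<?xml-stylesheet type="text/xsl" href="{href}"?>'
    decl = XML_DECL.match(feed_text)
    if decl:
        # There is a <?xml ...?> line: put the stylesheet line just below it.
        return feed_text[:decl.end()] + "\n" + pi + feed_text[decl.end():]
    # No <?xml ...?> line: the stylesheet line goes first.
    return pi + "\n" + feed_text

feed = '<?xml version="1.0" encoding="UTF-8"?>\n<rss version="2.0"><channel/></rss>'
print(add_stylesheet_pi(feed, "example.xsl"))
```

The edited text can then be written back to the feed file and opened in the browser as described above.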

With this approach, you wouldn't need a separate RSS feed. Your front page would BE your RSS feed.

Note that the XSLT has to be written for a specific version of RSS, so you'd need a different XSLT file for RSS 0.91, RSS 1.0 (RDF), Atom, etc.
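One way to cope with that is to look at the feed's root element before choosing a stylesheet. Below is a minimal Python sketch, assuming the feed is already in a string. The stylesheet file names in STYLESHEETS are hypothetical, and the checks cover only the common cases: a plain <rss> root for 0.91 and 2.0, an <rdf:RDF> root with an RSS 1.0 channel, and an Atom <feed>.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1_NS = "http://purl.org/rss/1.0/"
# Atom 1.0 and the older Atom 0.3 namespaces.
ATOM_NAMESPACES = ("http://www.w3.org/2005/Atom", "http://purl.org/atom/ns#")

# Hypothetical file names: one stylesheet per feed flavor.
STYLESHEETS = {
    "rss-0.91": "rss091.xsl",
    "rss-2.0": "rss2.xsl",
    "rss-1.0": "rss1.xsl",
    "atom": "atom.xsl",
}

def feed_flavor(feed_text):
    """Guess which kind of feed this is from its root element."""
    root = ET.fromstring(feed_text)
    if root.tag == "rss":
        # RSS 0.91 and 2.0 both use a plain <rss> root with a version attribute.
        return "rss-" + root.get("version", "unknown")
    if root.tag == f"{{{RDF_NS}}}RDF" and root.find(f"{{{RSS1_NS}}}channel") is not None:
        # RSS 1.0 is RDF: an <rdf:RDF> root whose channel is in the RSS 1.0 namespace.
        return "rss-1.0"
    if any(root.tag == f"{{{ns}}}feed" for ns in ATOM_NAMESPACES):
        return "atom"
    return "unknown"

def pick_stylesheet(feed_text):
    """Return the stylesheet name for this feed, or None if it's not recognized."""
    return STYLESHEETS.get(feed_flavor(feed_text))

print(pick_stylesheet('<rss version="2.0"><channel/></rss>'))  # prints rss2.xsl
```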

There's all this talk of the semantic web coming sometime in the future, but basically we can do it today!

For anybody running a standard blog, it would take very little work to change your pages to spit out XML (for example RDF) and add the line at the top that formats your page exactly the way it looks now. The advantage is that other programs could then read your page, since it would actually be XML, with everything separated out to tell those programs which part is the content, what the subject and title are, who wrote it, etc.
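Here is a minimal sketch of what "spitting out XML" could look like, assuming a blog whose posts are available as simple dictionaries. It emits RSS 2.0 rather than RDF, to match the sample feed and the example.xsl stylesheet line shown earlier; the render_page name and the post fields are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# Serialize the Dublin Core namespace with the familiar dc: prefix.
ET.register_namespace("dc", DC_NS)

def render_page(title, link, description, posts, stylesheet="example.xsl"):
    """Hypothetical helper: render a blog front page as an RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for post in posts:
        item = ET.SubElement(channel, "item")
        for field in ("title", "link", "description"):
            ET.SubElement(item, field).text = post[field]
        ET.SubElement(item, f"{{{DC_NS}}}subject").text = post["subject"]
        ET.SubElement(item, f"{{{DC_NS}}}date").text = post["date"]
    # tostring(..., encoding="unicode") returns a str with no XML declaration,
    # so the declaration and the stylesheet line are written by hand.
    body = ET.tostring(rss, encoding="unicode")
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<?xml-stylesheet type="text/xsl" href="{stylesheet}"?>\n'
            + body)

page = render_page(
    "Greggman.com", "http://greggman.com/",
    "Games, Gadgets, Gregg and stuff about Japan",
    [{
        "title": "Indie Games have Arrived",
        "link": "http://greggman.com/edit/editheadlines/2005-01-04.htm",
        "description": "Game Tunnel picked their indie games of the year and "
                       "I gotta say, I'm pretty impressed.",
        "subject": "games",
        "date": "2005-01-04T19:00:00+09:00",
    }],
)
print(page)
```

When saving the result, write it as UTF-8 (for example with open(path, "w", encoding="utf-8")) so the file matches its own declaration.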

The only legit problem is that some older browsers don't handle this, but I think for most blogs that's not an issue, since most people are running browsers that do.

There are a couple more issues that I'm sure are just due to a lack of knowledge on my part. One is that if you look at that page in Firefox, it won't have a green border. As far as I can tell, that's a bug in Firefox. The other is that I couldn't get XSLT to work with my RSS 1.0 feed. I know it's possible; I just need to dig a little harder, and I got tired of trying to figure it out. Maybe tomorrow 😊
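One common stumbling block with RSS 1.0, offered here as a guess rather than a diagnosis of the problem above, is namespaces. Unlike RSS 2.0, an RSS 1.0 feed puts its elements in the http://purl.org/rss/1.0/ namespace (usually as the default namespace), and its <item> elements sit beside <channel> under <rdf:RDF> rather than inside it. A path written for RSS 2.0, like channel/item, then finds nothing. The sketch below shows the difference with Python's ElementTree on a small hand-written RSS 1.0 sample (not an actual feed from this site, and it omits the <items> sequence a real RSS 1.0 channel carries).

```python
import xml.etree.ElementTree as ET

# Small hand-written RSS 1.0 sample (the required <items> rdf:Seq is omitted).
RSS1_SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://greggman.com/">
    <title>Greggman.com</title>
    <link>http://greggman.com/</link>
    <description>Games, Gadgets, Gregg and stuff about Japan</description>
  </channel>
  <item rdf:about="http://greggman.com/edit/editheadlines/2005-01-04.htm">
    <title>Indie Games have Arrived</title>
    <link>http://greggman.com/edit/editheadlines/2005-01-04.htm</link>
  </item>
</rdf:RDF>"""

# Our own prefix for the RSS 1.0 namespace; any prefix works, only the URI matters.
NS = {"rss": "http://purl.org/rss/1.0/"}

root = ET.fromstring(RSS1_SAMPLE)

# RSS 2.0-style path: unqualified names, items inside <channel>. Matches nothing here.
rss2_style = root.findall("channel/item")

# RSS 1.0-aware path: qualified names, items are direct children of <rdf:RDF>.
rss1_titles = [item.findtext("rss:title", namespaces=NS)
               for item in root.findall("rss:item", NS)]

print(len(rss2_style), rss1_titles)  # prints: 0 ['Indie Games have Arrived']
```

The same rule applies in XSLT 1.0: an unprefixed name in a match pattern never matches an element in a default namespace, so the stylesheet would need to declare its own prefix for http://purl.org/rss/1.0/ on the xsl:stylesheet element and match rss:item rather than item.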
