XSLTransforming Society

I previously wrote about how XPath can ease the pain of working with large XML files. The example there revolved around a storage-and-retrieval scenario, which XPath handled nicely. But what happens when you need to grab multiple pieces from different sections of a large XML file? XSLT is your answer.

If you’ve ever worked with Twitter’s API, then you are aware of this kind of dilemma. Twitter has a massive amount of data available to it, and the default API call to retrieve tweets from your friends list spares none of that information. Here is the XML for a single tweet:


<user>
  <id>27682876</id>
  <name>Scott Huff</name>
  <screen_name>ScottBHuff</screen_name>
  <location>Beverly Hills</location>
  <description>Television, and radio host.</description>
  <profile_image_url>http://a3.twimg.com/profile_images/135381869/967539494_l_normal.jpg</profile_image_url>
  <url>http://www.huffandstapes.com</url>
  <protected>false</protected>
  <followers_count>1795</followers_count>
  <profile_background_color>9ae4e8</profile_background_color>
  <profile_text_color>000000</profile_text_color>
  <profile_link_color>0000ff</profile_link_color>
  <profile_sidebar_fill_color>e0ff92</profile_sidebar_fill_color>
  <profile_sidebar_border_color>87bc44</profile_sidebar_border_color>
  <friends_count>57</friends_count>
  <created_at>Mon Mar 30 17:13:49 +0000 2009</created_at>
  <favourites_count>2</favourites_count>
  <utc_offset>-28800</utc_offset>
  <time_zone>Pacific Time (US &amp; Canada)</time_zone>
  <profile_background_image_url>http://s.twimg.com/a/1274739546/images/themes/theme1/bg.png</profile_background_image_url>
  <profile_background_tile>false</profile_background_tile>
  <notifications>false</notifications>
  <geo_enabled>false</geo_enabled>
  <verified>false</verified>
  <following>true</following>
  <statuses_count>1180</statuses_count>
  <lang>en</lang>
  <contributors_enabled>false</contributors_enabled>
  <status>
    <created_at>Wed Jun 02 17:04:16 +0000 2010</created_at>
    <id>15264717208</id>
    <text>RT @Jacki_Bray After waiting for 45 minutes... http://yfrog.com/0vzlnej luckily I have a book./// I hope for your sake it's The Stand</text>
    <source>web</source>
    <truncated>false</truncated>
    <in_reply_to_status_id></in_reply_to_status_id>
    <in_reply_to_user_id></in_reply_to_user_id>
    <favorited>false</favorited>
    <in_reply_to_screen_name></in_reply_to_screen_name>
    <geo/>
    <coordinates/>
    <place/>
    <contributors/>
  </status>
</user>

As you can see, we get a lot of information for a message of at most 140 characters. There are certainly circumstances where part or all of this information would be necessary, but most applications won't need all of it. Wouldn’t it be nice to take this data in and spit out only the information we want? Let’s see if we can get our Twitter API response XML to show only the screen name, the tweet itself, and the date and time of the tweet.

First, a little housekeeping. To set up our XSLT, we’ll create a twitter.xsl file with the following contents:

<?xml version="1.0" encoding="ISO-8859-1"?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/user">

</xsl:template>

</xsl:stylesheet>

We are using a little XPath here: the match attribute of the xsl:template element holds the XPath pattern “/user”, which selects the user element at the root of the XML document. Next, we need our source XML file to point to the newly created XSL file via the following processing instruction (placed directly below the XML declaration on the top line):


<?xml-stylesheet type="text/xsl" href="twitter.xsl"?>
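As a rough sketch, the top of the source file might then look like the following. The encoding in the XML declaration is an assumption (the declaration of the original API response is not shown here), and most of the profile elements are left out to keep the sample short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The stylesheet reference sits directly below the XML declaration -->
<?xml-stylesheet type="text/xsl" href="twitter.xsl"?>
<user>
  <id>27682876</id>
  <screen_name>ScottBHuff</screen_name>
  <!-- remaining profile elements from the listing above omitted for brevity -->
  <status>
    <created_at>Wed Jun 02 17:04:16 +0000 2010</created_at>
    <id>15264717208</id>
    <text>RT @Jacki_Bray After waiting for 45 minutes... http://yfrog.com/0vzlnej luckily I have a book./// I hope for your sake it's The Stand</text>
  </status>
</user>
```

The href value is resolved relative to the XML file, so this sketch assumes twitter.xsl is saved in the same directory.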

Now that the two are connected, we can add a few lines inside our <xsl:template>:


<screen_name><xsl:value-of select="screen_name"/></screen_name>

<time><xsl:value-of select="status/created_at"/></time>

<tweet><xsl:value-of select="status/text"/></tweet>

Again, using XPath queries, we get the screen_name element, which sits directly under the user element matched by the template, and the created_at and text elements, which sit under the status element. We wrap each value in our own XML element, and when we view our source XML file with the stylesheet applied, only the information we requested is output:

<?xml version="1.0"?>

<screen_name>ScottBHuff</screen_name>
<time>Wed Jun 02 17:04:16 +0000 2010</time>
<tweet>RT @Jacki_Bray After waiting for 45 minutes... http://yfrog.com/0vzlnej luckily I have a book./// I hope for your sake it's The Stand</tweet>

This output contains only the pertinent information we need, and will obviously be much easier to work with.
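For reference, here is one way the pieces above could be assembled into a single twitter.xsl. This is a sketch rather than the only arrangement: the xsl:output element and the tweet_summary wrapper are additions not used earlier in this article. The wrapper name is made up for illustration, and it is there so that the result has a single root element, which other XML tools generally expect when they consume the output.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask the processor for indented XML output (an addition for readability) -->
  <xsl:output method="xml" indent="yes"/>

  <!-- Match the user element at the root of the source document -->
  <xsl:template match="/user">
    <!-- tweet_summary is a hypothetical wrapper name giving the result one root element -->
    <tweet_summary>
      <!-- screen_name is a direct child of the matched user element -->
      <screen_name><xsl:value-of select="screen_name"/></screen_name>
      <!-- created_at and text are children of the nested status element -->
      <time><xsl:value-of select="status/created_at"/></time>
      <tweet><xsl:value-of select="status/text"/></tweet>
    </tweet_summary>
  </xsl:template>

</xsl:stylesheet>
```

If you would rather get the three bare elements shown earlier, drop the tweet_summary wrapper and keep the three lines inside it directly within the template.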

Michael Marr
About Michael Marr
Michael Marr is a staff writer for WebProNews
