Using XSLT to extract data from an OPUS XML download

Sorry folks, this is mostly just for me. Someone recently reminded me of XSLT and XPATH, as we are looking into using a CMS that makes extensive use of XSLT. This is not something that I’ve thought about in a long time. So, for fun, I downloaded some OPUS grade data in XML format and extracted a subset of the data using XSL. Here’s how I did it:

  1. Google for XSLT tutorials. Find one: http://www.xmlmaster.org/en/article/d01/c07/ and read through it. Combine that with this discussion about searching in XSLT.
  2. Get some XML data to play with. We use PeopleSoft here at Emory and the query viewer gives you an option to download the results in XML. savetoxml
  3. Now that you have this 106,000 line XML file, pick an arbitrary search term that you can use as an example to test data extraction. Decide what your HTML should look like.
  4. For this exercise, I’ve decided that I want to extract the class name, name of student, and grade earned for LAW 659J, which is “Doing Deals – Mergers and Acquisitions”.
  5. The end result will look something like the following screenshot:

grades

So, with that set of requirements out of the way, the XSL code to implement this is surprisingly straightforward. It turns out that the “top” element in the XML returned by PeopleSoft is “query”. Within the query, there are “row” elements and each “row” element contains various data elements.

The XSL to output the above table then looks like this:

<?xml version="1.0" ?>
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:template match="/">
 <html>
 <body>
 <h1>Grades</h1>
 <table border="1" width="400">
 <tr><th>Class</th><th>Name</th><th>Grade</th></tr>
 <xsl:apply-templates select="query/row"/>
 </table>
 </body>
 </html>
 </xsl:template>
 <xsl:template match="row">
 <xsl:choose>
 <xsl:when test = "C.CATALOG_NBR = ' 659J'">
 <tr>
 <td><xsl:value-of select="C.DESCR" /></td>
 <td><xsl:value-of select="A.NAME" /></td>
 <td><xsl:value-of select="B.CRSE_GRADE_OFF" /></td>
 </tr>
 </xsl:when>
 </xsl:choose>
 </xsl:template>
</xsl:stylesheet>

I’ll quickly run through what the XSLT does. It works on the “query” element initially – that’s the first xsl:template definition. Then for each “row” element, it applies the second template, the one that says “select=’query/row'”. That section of the template checks every row element to see whether C.CATALOG_NBR is equal to 659J, which is the course number. If that’s the case, then it builds an HTML row comprised of three table data cells, one each for the C.DESCR element, the A.NAME element, and the B.CRSE_GRADE_OFF element. Each constructed row is then inserted in turn into the the table.

How do you actually “run” this? You need to run the XML and XSL through an XSLT processor. Fortunately, the Mac has one built in. To run it terminal window on my Mac, I issue the following command:

xsltproc ./grades.xsl ./grades.xml >grades.html

This generates “grades.html”. A redacted version of grades.html is presented above. Anyway, as I said earlier, this is really for me, to serve as a reminder of how to do this sort of thing in XSL. As usual, thank you to the people on the internet who take the time to document these solutions for the rest of us!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s