Parsing XML from the command line

This evening I finally got around to repairing some of the bash scripts I use to automate tasks on my personal servers. One of them parses an RSS feed and then downloads content, much like bashpodder, although unfortunately the data isn’t encapsulated as nicely as you would expect in a regular podcast feed.

In the past I’ve solved this with sed. Quick and easy as that is, the result is horrible to maintain when you need to update it for whatever reason.

Rather than rewrite the entire script in another language, I hit Google. My first result was a fantastic tool called XMLStarlet, which I’d not heard of before. The blurb describes it as “a command line toolkit to query/edit/check/transform XML documents”, and quite frankly it does exactly that. Nothing more, nothing less. What it fails to make a big deal of is that it’s simple and cross-platform.
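
To give a feel for the “query/edit/check” part of that blurb, here is a rough sketch rather than anything from my actual scripts. The sample feed and the function names (is_well_formed, set_channel_title) are made up for illustration, and it assumes the binary is installed as xmlstarlet (some systems install it as xml).

```shell
#!/usr/bin/env bash
# Sketch only: the sample feed and function names below are hypothetical.

# check: succeed if the file is well-formed XML.
# -w checks well-formedness only; -q prints nothing, so only the exit status is used.
is_well_formed() {
  xmlstarlet val -q -w "$1"
}

# edit: print a copy of the feed with the channel title replaced.
# The file on disk is left untouched; the result goes to stdout.
set_channel_title() {
  xmlstarlet ed -u '/rss/channel/title' -v "$2" "$1"
}

feed=$(mktemp)
cat > "$feed" <<'EOF'
<rss version="2.0"><channel><title>Old title</title><item><title>Episode 1</title></item></channel></rss>
EOF

if is_well_formed "$feed"; then
  echo "feed is well-formed"
fi

# query: list the unique element paths, handy for working out which XPath you need.
xmlstarlet el -u "$feed"

set_channel_title "$feed" 'New title'

rm -f "$feed"
```

The el listing is the part I’d reach for first, since it shows the document structure without having to read the raw XML.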

A quick example, which echoes out the value of each title tag from the RSS feed served by example.com, is as follows:

wget -q 'http://example.com/rss2.xml' -O - 2>/dev/null | xmlstarlet sel -t -m '/rss/channel/item' -n -v 'title'
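
The same idea stretches to the download case. What follows is a minimal sketch, not my real script: it assumes each item carries a standard RSS 2.0 enclosure element with a url attribute, which the feed I was actually fighting with did not do so neatly. The list_items name and the inline sample feed are invented for the example, and the download itself is left commented out.

```shell
#!/usr/bin/env bash
# Sketch only: list_items and the sample feed are hypothetical.

# Print one line per item: the enclosure URL, a space, then the title.
# -m loops over every item, -v prints the value of an XPath expression,
# and -n ends the line. The URL goes first because it contains no spaces.
list_items() {
  xmlstarlet sel -t -m '/rss/channel/item' \
    -v "concat(enclosure/@url, ' ', title)" -n "$1"
}

feed=$(mktemp)
cat > "$feed" <<'EOF'
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.com/ep1.mp3" length="0" type="audio/mpeg"/>
    </item>
    <item>
      <title>Episode 2</title>
      <enclosure url="http://example.com/ep2.mp3" length="0" type="audio/mpeg"/>
    </item>
  </channel>
</rss>
EOF

# read puts the first word in url and the rest of the line in title.
list_items "$feed" | while read -r url title; do
  echo "would fetch $url ($title)"
  # wget -q "$url"    # uncomment to actually download
done

rm -f "$feed"
```

Compared with the sed version, changing what gets extracted means editing one XPath expression rather than untangling a regular expression.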
mgrouch and arcanum - my hat goes off to you in thanks!