xpathtool - powerful xpath queries on the commandline

What is xpathtool?

Short version: xpath query tool for xml and html.

Long version: swanky frontend to xsltproc which takes an xpath query and content and spits out the results.

Dependencies: xsltproc (comes with libxslt), xmllint (comes with libxml2).

Download

xpathtool-20071102.tar.gz

Usage

--ihtml
Set input format as html.
--otext
Output should be text. Implemented as <xsl:value-of select="." />
--oxml
Output should be xml.
--ohtml
Output should be html.
--indent (default) or --noindent
Set whether or not xml or html output should be depth-based indented.
--stripspace=XXX
Define elements whose content should be space-stripped. Implemented with <xsl:strip-space>.
--pretty (default) or --nopretty
Pretty print xml and html output by filtering through 'xmllint --format'
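
To make the --otext description concrete, here is a minimal sketch of the kind of stylesheet a wrapper around xsltproc could generate for a query. It is not taken from xpathtool.sh: the function name make_text_stylesheet, the for-each loop, and the one-result-per-line layout are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helper (not from xpathtool.sh): print an XSLT 1.0 stylesheet
# that writes the string value of every node matching an XPath query,
# one result per line.
make_text_stylesheet() {
  # Escape characters that are special inside a double-quoted XML attribute.
  query=$(printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/"/\&quot;/g')
  cat <<EOF
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="$query">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
EOF
}

# Example use (assumes xsltproc is installed and feed.xml exists):
#   make_text_stylesheet '//link' > query.xsl
#   xsltproc query.xsl feed.xml
# For html input, xsltproc itself has an --html option.
```

The query is escaped before it is placed in the select attribute so that queries containing &, < or double quotes still produce a well-formed stylesheet.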

Example: Technorati WTF RSS

% GET feeds.technorati.com/wtf | ./xpathtool.sh '//link' | tail -3
http://technorati.com/wtf/we-can-take-our-country-back/2007/05/16/ron-paul-is-standing-up-tot-the-establishment-1
http://technorati.com/wtf/giuliani-is-deluded/2007/05/16/delusional-and-out-of-touch-with-reality-1
http://technorati.com/wtf/macbook/2007/05/16/apples-rule-1

Example: Slashdot article links

Slashdot is worthless. The article writeups are worthless. The comments are worthless. The users are worthless.

Sometimes, the linked content is not. Let's pull out all the links in all the articles on the frontpage:
# slashdot articles are inside the following html element
% xbase="//div[@class='article']//div[@class='intro']/i"
% GET www.slashdot.org | ./xpathtool.sh --ihtml "$xbase//a/@href|$xbase//a/text()"  | paste -d" " - - 
http://www.foreignpolicy.com/story/cms.php?story_id=3807 the world's biggest digital dump
http://googleblog.blogspot.com/2007/05/google-apps-partner-edition.html turn over their entire email operation to Google
http://apcmag.com/6138/the_dark_side_of_google_apps_for_isps the dark side of Google's offer
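
The paste -d" " - - at the end of that pipeline is what puts each URL and its link text on one line: it reads two lines at a time from stdin and joins them with a space. A small self-contained sketch of that step, using made-up input in place of real xpathtool output (pair_lines is a hypothetical name used only here):

```shell
#!/bin/sh
# Hypothetical wrapper around the paste step from the example above.
# Each "-" tells paste to read one line from stdin, so two dashes
# consume input in pairs and join each pair with a single space.
pair_lines() {
  paste -d" " - -
}

# Stand-in for xpathtool output: an href followed by that link's text.
printf '%s\n' \
  'http://example.com/a' 'first link' \
  'http://example.com/b' 'second link' | pair_lines
# prints:
# http://example.com/a first link
# http://example.com/b second link
```

The pairing relies on every matched link contributing exactly one href line and one text line. A link with no text, or with text split across several nodes, would likely shift the pairs out of alignment.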

3 responses to 'xpathtool - powerful xpath queries on the commandline'

James Fryer wrote at Fri May 2 06:02:15 2008...
Is there any way to use this with namespaces? For example I look at an Atom document which contains:

  xmlns='http://www.w3.org/2005/Atom'

and I can't see any of the title elements, because they are not in the XSL default namespace.

This is probably me being dim because I'm learning XPATH/XSL, thanks in advance if you have an answer.

Andrew wrote at Wed Jun 4 09:32:29 2008...
Thanks for posting this tool.  I've built a script with it that automates some HTML report testing.

