Jordan Sissel
geek

Tue, 20 Nov 2007

xpathtool - powerful xpath queries on the commandline

What is xpathtool?

Short version: xpath query tool for xml and html.

Long version: swanky frontend to xsltproc which takes an xpath query and content and spits out the results.

Dependencies: xsltproc (comes with libxslt), xmllint (comes with libxml2).

Download

xpathtool-20071102.tar.gz

Usage

--ihtml
Set input format as html.
--otext
Output should be text. Implemented as <xsl:value-of select="." />
--oxml
Output should be xml.
--ohtml
Output should be html.
--indent (default) or --noindent
Set whether or not xml or html output should be depth-based indented.
--stripspace=XXX
Define elements whose content should be space-stripped. Implemented with <xsl:strip-space>.
--pretty (default) or --nopretty
Pretty print xml and html output by filtering through 'xmllint --format'.
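
To make the "frontend to xsltproc" idea concrete, here is a minimal sketch of how a text-output mode like --otext could work: wrap the query in a generated stylesheet and hand it to xsltproc. This is not xpathtool's actual source. The function names make_text_stylesheet and xpath_text are made up for illustration, and the query is pasted into the stylesheet unescaped, so it must not contain double quotes, '<', or '&'.

```shell
# Hypothetical helper (not from xpathtool): print an XSLT stylesheet that
# writes the string value of each node matched by the XPath query in $1.
make_text_stylesheet() {
  xpath=$1
  cat <<EOF
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="$xpath">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
EOF
}

# Hypothetical wrapper: run the query in $1 against XML read from stdin.
xpath_text() {
  sheet=$(mktemp) || return 1
  make_text_stylesheet "$1" > "$sheet"
  xsltproc "$sheet" -        # "-" makes xsltproc read the document from stdin
  rc=$?
  rm -f "$sheet"
  return "$rc"
}
```

With xsltproc installed, something like printf '<r><link>a</link><link>b</link></r>' | xpath_text '//link' should print each matched element's text on its own line. For html input, xsltproc's own --html flag would presumably fill the role of --ihtml, though I have not checked that against the script.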

Example: Technorati WTF RSS

% GET feeds.technorati.com/wtf | ./xpathtool.sh '//link' | tail -3
http://technorati.com/wtf/we-can-take-our-country-back/2007/05/16/ron-paul-is-standing-up-tot-the-establishment-1
http://technorati.com/wtf/giuliani-is-deluded/2007/05/16/delusional-and-out-of-touch-with-reality-1
http://technorati.com/wtf/macbook/2007/05/16/apples-rule-1

Example: Slashdot article links

Slashdot is worthless. The article writeups are worthless. The comments are worthless. The users are worthless.

Sometimes, the linked content is not. Let's pull out all the links in all the articles on the frontpage:
# slashdot articles are inside the following html element
% xbase="//div[@class='article']//div[@class='intro']/i"
% GET www.slashdot.org | ./xpathtool.sh --ihtml "$xbase//a/@href|$xbase//a/text()"  | paste -d" " - - 
http://www.foreignpolicy.com/story/cms.php?story_id=3807 the world's biggest digital dump
http://googleblog.blogspot.com/2007/05/google-apps-partner-edition.html turn over their entire email operation to Google
http://apcmag.com/6138/the_dark_side_of_google_apps_for_isps the dark side of Google's offer
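
The trailing paste -d" " - - is what puts each URL next to its link text. The union query apparently emits matches in document order, so (assuming each link has exactly one text node) the output alternates between an href line and a text line, and paste with two "-" operands reads stdin two lines at a time. A tiny stand-alone demonstration, using a made-up helper name pair_lines and made-up sample data:

```shell
# Hypothetical helper: join every two consecutive lines of stdin with a space.
pair_lines() {
  paste -d" " - -
}

# Made-up sample data standing in for the alternating href/text lines.
sample_links() {
  printf '%s\n' \
    'http://example.com/a' 'first link' \
    'http://example.com/b' 'second link'
}
```

Running sample_links | pair_lines prints "http://example.com/a first link" and "http://example.com/b second link" on two lines. If a link contains nested markup and therefore more than one text node, the pairing drifts out of step, so treat the trick as good enough for quick scraping rather than robust.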

Permalink: /projects/xpathtool/main
posted at: 00:11


3 responses to 'xpathtool - powerful xpath queries on the commandline'

James Fryer posted at Fri May 2 09:02:15 2008...
Is there any way to use this with namespaces? For example I look at an Atom document which contains:

  xmlns='http://www.w3.org/2005/Atom'

and I can't see any of the title elements, because they are not in the XSL default namespace.

This is probably me being dim because I'm learning XPATH/XSL, thanks in advance if you have an answer.

Andrew posted at Wed Jun 4 12:32:29 2008...
Thanks for posting this tool.  I've built a script with it that automates some HTML report testing.
