Selecting a deeply nested link using an XPath query
<body class="en-us"> <div id="wrapper">
<div id="content">
<div class="content-top">
<div class="content-bot">
<div id="profile-wrapper" class=
"profile-wrapper profile-wrapper-horde">
<div class="profile-sidebar-anchor">
<div class="profile-sidebar-outer">
<div class="profile-sidebar-inner">
<div class="profile-sidebar-contents">
<div class="profile-sidebar-crest">
<a href="/wow/en/character/some-server/sometoon/" rel="np" class="profile-sidebar-character-model" style="">
</a>
<div class="profile-sidebar-info">
<div class="name">
<a href="/wow/en/character/some-server/sometoon/"
rel="np">Glitchshot</a>
</div>
<div class="under-name color-c8">
<span class="level"><strong>85</strong></span>
<a href="/wow/en/game/race/somerace" class="race">somerace</a>
<a href="/wow/en/game/class/someclass" class="class">someclass</a>
</div>
<div class="guild">
<a href="/wow/en/guild/some-server/someguild/?character=sometoon">
Some Guild</a>
</div>
<div class="realm">
<span id="profile-info-realm" class="tip"
data-battlegroup="Stormstrike">Black
Dragonflight</span>
</div>
</div>
</div>
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
<li><a href=
"/wow/en/character/some-server/sometoon/" class=
"back-to" rel="np"><span class="arrow"><span class=
"icon">Character Summary</span></span></a></li>
<li class="root-menu"><a href=
"/wow/en/character/some-server/sometoon/achievement"
class="back-to" rel="np"><span class=
"arrow"><span class=
"icon">Achievements</span></span></a></li>
<li class=" active"><a href=
"/wow/en/character/some-server/sometoon/achievement#summary"
class="" rel="np"><span class="arrow"><span class=
"icon">Achievements</span></span></a></li>
<li class=""><a href=
"/wow/en/character/some-server/sometoon/achievement#92"
class="" rel="np"><span class="arrow"><span class=
"icon">General</span></span></a></li>
I know that I have posted a lot of useless code here, but I wanted you guys to have an idea of what the DOM would look like.
From this:
<a href="/wow/en/character/some-server/sometoon/achievement#92" class="" rel="np"><span class="arrow"><span class="icon">General</span></span></a>
I would like to extract this:
/wow/en/character/some-server/sometoon/achievement#92
which comes from the last anchor in the posted markup.
I have read as much as I can find on how to use XPath queries to extract the needed information, but I am clearly missing something. Below is the query that I thought should work but does not.
<?php
$query = '*/ul[@class=profile-sidebar-menu]/ul/li[3]/ul/li[1]/a/@href';
echo $query . "<br>";
$achievementSubCategory = $xpath->query($query);
$achiSubArray = array("URL" => $achievementSubCategory->item(0)->nodeValue);
var_dump($achiSubArray);
// Produces array(1) { ["URL"]=> NULL } which should look something more like:
// array(1) { ["URL"]=> /wow/en/character/some-server/sometoon/achievement#92 }
?>
Thank you in advance for your assistance and advice.
*/ul[@class=profile-sidebar-menu]/ul/li[3]/ul/li[1]/a/@href
There are a few problems with this XPath expression:
It is looking for a ul element that is a grandchild of the current node and whose class attribute equals the string value of one of the ul's own child elements named profile-sidebar-menu: because profile-sidebar-menu is not quoted, XPath reads it as a child-element name rather than as a string literal. The ul has no children named profile-sidebar-menu, so the whole expression doesn't select any node.
Another problem is the indexing. li[3] selects the third li element child of the context node. However, the wanted a element is a child of the fourth li child of the context node, so this must be expressed as li[4]. XPath positions are 1-based, not 0-based.
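To see both problems concretely, here is a minimal PHP sketch using DOMDocument and DOMXPath (presumably the classes behind the question's $xpath object, which is an assumption since its construction is not shown). It loads only the menu ul from the question, with the span wrappers and most attributes removed, and uses //ul instead of */ul so the context node does not matter. It compares the unquoted and quoted class tests, then li[3] against li[4].

```php
<?php
// Trimmed copy of the question's sidebar menu: only the <ul> and its four
// <li> items are kept (spans and most attributes removed for brevity).
$html = <<<'HTML'
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
  <li><a href="/wow/en/character/some-server/sometoon/">Character Summary</a></li>
  <li class="root-menu"><a href="/wow/en/character/some-server/sometoon/achievement">Achievements</a></li>
  <li class=" active"><a href="/wow/en/character/some-server/sometoon/achievement#summary">Achievements</a></li>
  <li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92">General</a></li>
</ul>
HTML;

$doc = new DOMDocument();
$doc->loadXML($html); // the fragment is well-formed, so loadXML is enough
$xpath = new DOMXPath($doc);

// Unquoted: profile-sidebar-menu is read as a child element name, so @class is
// compared with the text of <profile-sidebar-menu> children of the <ul>.
// There are none, so nothing matches.
$unquoted = $xpath->query('//ul[@class=profile-sidebar-menu]');

// Quoted: a string literal, which is what was intended.
$quoted = $xpath->query('//ul[@class="profile-sidebar-menu"]');

// Positions are 1-based: li[3] is the "#summary" item, li[4] is "General".
$third  = $xpath->query('//ul[@class="profile-sidebar-menu"]/li[3]/a/@href')->item(0)->nodeValue;
$fourth = $xpath->query('//ul[@class="profile-sidebar-menu"]/li[4]/a/@href')->item(0)->nodeValue;

printf("unquoted: %d, quoted: %d\n", $unquoted->length, $quoted->length);
echo "li[3]: $third\nli[4]: $fourth\n";
```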
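If these two problems are corrected, and the extra ul steps are dropped (the li elements are direct children of the menu ul, and no ul is nested inside it), the corrected expression should look like the following:
*/ul[@class="profile-sidebar-menu"]/li[4]/a/@href
Because this expression is relative, it only matches when the context node is the grandparent of the ul, that is, the div with class profile-sidebar-inner.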
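A minimal PHP sketch of evaluating this kind of relative expression against an explicit context node, using the optional second argument of DOMXPath::query(). Because the li items are direct children of the menu ul in the markup, the sketch uses */ul[@class="profile-sidebar-menu"]/li[4]/a/@href with no second ul step. The markup is a trimmed copy of the question's sidebar (an assumption made to keep the example short).

```php
<?php
// Trimmed copy of the question's sidebar: only the elements between
// div.profile-sidebar-inner and the menu links are kept.
$html = <<<'HTML'
<div class="profile-sidebar-inner">
  <div class="profile-sidebar-contents">
    <div class="profile-sidebar-crest"/>
    <ul class="profile-sidebar-menu" id="profile-sidebar-menu">
      <li><a href="/wow/en/character/some-server/sometoon/">Character Summary</a></li>
      <li class="root-menu"><a href="/wow/en/character/some-server/sometoon/achievement">Achievements</a></li>
      <li class=" active"><a href="/wow/en/character/some-server/sometoon/achievement#summary">Achievements</a></li>
      <li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92">General</a></li>
    </ul>
  </div>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadXML($html);
$xpath = new DOMXPath($doc);

$relative = '*/ul[@class="profile-sidebar-menu"]/li[4]/a/@href';

// "*/ul" means "a ul that is a grandchild of the context node", so the
// context has to be the ul's grandparent, div.profile-sidebar-inner.
$inner = $xpath->query('//div[@class="profile-sidebar-inner"]')->item(0);
$href  = $xpath->query($relative, $inner)->item(0)->nodeValue;

// From the ul's parent the same expression finds nothing, because the ul
// is then a child, not a grandchild, of the context node.
$contents = $xpath->query('//div[@class="profile-sidebar-contents"]')->item(0);
$misses   = $xpath->query($relative, $contents)->length;

echo $href, "\n";
echo "matches from profile-sidebar-contents: ", $misses, "\n";
```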
The absolute XPath expression that selects the wanted href attribute, starting from the top element body of the provided XML document, is:
/*/*/*/*/*/*/*/*/*/*/ul/li[4]/a/@href
Below is the XML document (the provided one, made well-formed by appending the missing end tags):
<body class="en-us">
<div id="wrapper">
<div id="content">
<div class="content-top">
<div class="content-bot">
<div id="profile-wrapper" class=
"profile-wrapper profile-wrapper-horde">
<div class="profile-sidebar-anchor">
<div class="profile-sidebar-outer">
<div class="profile-sidebar-inner">
<div class="profile-sidebar-contents">
<div class="profile-sidebar-crest">
<a href="/wow/en/character/some-server/sometoon/" rel="np" class="profile-sidebar-character-model" style=""></a>
<div class="profile-sidebar-info">
<div class="name">
<a href="/wow/en/character/some-server/sometoon/"
rel="np">Glitchshot</a>
</div>
<div class="under-name color-c8">
<span class="level">
<strong>85</strong>
</span>
<a href="/wow/en/game/race/somerace" class="race">somerace</a>
<a href="/wow/en/game/class/someclass" class="class">someclass</a>
</div>
<div class="guild">
<a href="/wow/en/guild/some-server/someguild/?character=sometoon">
Some Guild</a>
</div>
<div class="realm">
<span id="profile-info-realm" class="tip"
data-battlegroup="Stormstrike">Black
Dragonflight</span>
</div>
</div>
</div>
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
<li>
<a href=
"/wow/en/character/some-server/sometoon/" class=
"back-to" rel="np">
<span class="arrow">
<span class=
"icon">Character Summary</span></span>
</a>
</li>
<li class="root-menu">
<a href=
"/wow/en/character/some-server/sometoon/achievement"
class="back-to" rel="np">
<span class=
"arrow">
<span class=
"icon">Achievements</span></span>
</a>
</li>
<li class=" active">
<a href=
"/wow/en/character/some-server/sometoon/achievement#summary"
class="" rel="np">
<span class="arrow">
<span class=
"icon">Achievements</span></span>
</a>
</li>
<li class="">
<a href=
"/wow/en/character/some-server/sometoon/achievement#92"
class="" rel="np">
<span class="arrow">
<span class=
"icon">General</span></span>
</a>
</li>
</ul>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
One can check that the above absolute XPath expression selects exactly the wanted href attribute by evaluating it with a tool like the XPath Visualizer.
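The same check can also be scripted in PHP. This sketch keeps the question's element nesting (body plus nine wrapper divs) but strips the text, spans, and most attributes to stay short; that trimmed markup is an assumption, not the real page. It loads the markup with DOMDocument::loadXML(), evaluates the absolute expression with DOMXPath::query(), and prints how many nodes matched.

```php
<?php
// Same nesting as the well-formed document: body > 9 divs > ul > li > a.
$xml = <<<'XML'
<body class="en-us">
<div id="wrapper"><div id="content"><div class="content-top"><div class="content-bot">
<div id="profile-wrapper"><div class="profile-sidebar-anchor"><div class="profile-sidebar-outer">
<div class="profile-sidebar-inner"><div class="profile-sidebar-contents">
  <div class="profile-sidebar-crest"/>
  <ul class="profile-sidebar-menu" id="profile-sidebar-menu">
    <li><a href="/wow/en/character/some-server/sometoon/">Character Summary</a></li>
    <li class="root-menu"><a href="/wow/en/character/some-server/sometoon/achievement">Achievements</a></li>
    <li class=" active"><a href="/wow/en/character/some-server/sometoon/achievement#summary">Achievements</a></li>
    <li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92">General</a></li>
  </ul>
</div></div>
</div></div></div>
</div></div></div></div>
</body>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Ten "*" steps: body, then the nine wrapper divs down to profile-sidebar-contents.
$nodes = $xpath->query('/*/*/*/*/*/*/*/*/*/*/ul/li[4]/a/@href');
echo $nodes->length, " node(s): ", $nodes->item(0)->nodeValue, "\n";
```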
Here is a snapshot of the selection, performed with the XPath Visualizer:
If your DOM structure is consistent (that is, the link you want is always the last item in the menu), then something like the following should work:
//ul[@class='profile-sidebar-menu']/li[last()]/a/@href
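Plugged into the question's code, a sketch might look like the following. It assumes the page is available as an HTML string; the string here is a trimmed, deliberately unclosed excerpt of the question's markup, so it is parsed with DOMDocument::loadHTML(), which tolerates missing end tags, rather than loadXML(). The variable names follow the question; the null check is an addition so that an empty result yields NULL instead of a warning.

```php
<?php
// Trimmed excerpt of the question's markup; the div and ul are left unclosed
// on purpose, as in the original page dump.
$html = <<<'HTML'
<body class="en-us"><div id="wrapper">
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
<li><a href="/wow/en/character/some-server/sometoon/" class="back-to" rel="np">Character Summary</a></li>
<li class="root-menu"><a href="/wow/en/character/some-server/sometoon/achievement" class="back-to" rel="np">Achievements</a></li>
<li class=" active"><a href="/wow/en/character/some-server/sometoon/achievement#summary" class="" rel="np">Achievements</a></li>
<li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92" class="" rel="np">General</a></li>
HTML;

// loadHTML closes the open elements itself; collect its parser warnings
// instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$query = "//ul[@class='profile-sidebar-menu']/li[last()]/a/@href";
$achievementSubCategory = $xpath->query($query);

// item(0) returns null on an empty result; avoid reading ->nodeValue from null.
$node = $achievementSubCategory->item(0);
$achiSubArray = array("URL" => $node !== null ? $node->nodeValue : null);
var_dump($achiSubArray);
```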
Your XPath statement makes no sense. You have multiple ul steps in the path, but the sample is not structured that way (the li elements are direct children of a single ul). Also, indexing in XPath starts at 1, not 0.
Based on the HTML you show above (and assuming that the final tags are correctly closed), ewh's expression should work fine.
Maybe you omitted some important part of the document. Try being more specific:
//ul[@class='profile-sidebar-menu' and @id='profile-sidebar-menu']/li/a[@href='/wow/en/character/some-server/sometoon/achievement#92']/@href
I'm pretty sure it works; I tested it online with the XPath Query Expression Tool.
If you still do not get results, try posting all the HTML you are working on.