Selecting a deeply nested link using an XPath query
<body class="en-us"> <div id="wrapper">
<div id="content">
<div class="content-top">
<div class="content-bot">
<div id="profile-wrapper" class=
"profile-wrapper profile-wrapper-horde">
<div class="profile-sidebar-anchor">
<div class="profile-sidebar-outer">
<div class="profile-sidebar-inner">
<div class="profile-sidebar-contents">
<div class="profile-sidebar-crest">
<a href="/wow/en/character/some-server/sometoon/" rel="np" class="profile-sidebar-character-model" style="">
</a>
<div class="profile-sidebar-info">
<div class="name">
<a href="/wow/en/character/some-server/sometoon/"
rel="np">Glitchshot</a>
</div>
<div class="under-name color-c8">
<span class="level"><strong>85</strong></span>
<a href="/wow/en/game/race/somerace" class="race">somerace</a>
<a href="/wow/en/game/class/someclass" class="class">someclass</a>
</div>
<div class="guild">
<a href="/wow/en/guild/some-server/someguild/?character=sometoon">
Some Guild</a>
</div>
<div class="realm">
<span id="profile-info-realm" class="tip"
data-battlegroup="Stormstrike">Black
Dragonflight</span>
</div>
</div>
</div>
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
<li><a href=
"/wow/en/character/some-server/sometoon/" class=
"back-to" rel="np"><span class="arrow"><span class=
"icon">Character Summary</span></span></a></li>
<li class="root-menu"><a href=
"/wow/en/character/some-server/sometoon/achievement"
class="back-to" rel="np"><span class=
"arrow"><span class=
"icon">Achievements</span></span></a></li>
<li class=" active"><a href=
"/wow/en/character/some-server/sometoon/achievement#summary"
class="" rel="np"><span class="arrow"><span class=
"icon">Achievements</span></span></a></li>
<li class=""><a href=
"/wow/en/character/some-server/sometoon/achievement#92"
class="" rel="np"><span class="arrow"><span class=
"icon">General</span></span></a></li>
I know that I have posted a lot of seemingly useless code here, but I wanted you to have an idea of what the DOM looks like.
From this:
<a href="/wow/en/character/some-server/sometoon/achievement#92" class="" rel="np"><span class="arrow"><span class="icon">General</span></span></a>
I would like to extract this:
/wow/en/character/some-server/sometoon/achievement#92
which comes from the last anchor in the posted markup.
I have read as much as I can find on how to use xpath query to extract the needed information but I am clearly missing something. Below is the query that I thought should work but does not.
<?php
$query = '*/ul[@class=profile-sidebar-menu]/ul/li[3]/ul/li[1]/a/@href';
echo $query . "<br>";
$achievementSubCategory = $xpath->query($query);
$achiSubArray = array("URL" => $achievementSubCategory->item(0)->nodeValue);
var_dump($achiSubArray);
// Produces array(1) { ["URL"]=> NULL } which should look something more like:
// array(1) { ["URL"]=> /wow/en/character/some-server/sometoon/achievement#92 }
?>
Thank you in advance for your assistance and advice
*/ul[@class=profile-sidebar-menu]/ul/li[3]/ul/li[1]/a/@href
There are a few problems with this XPath expression:
It is looking for a ul element that is a grandchild of the current node and that has an attribute named class whose string value is equal to the string value of one of the child elements of ul named profile-sidebar-menu. However, the ul has no children named profile-sidebar-menu, so the whole expression doesn't select any node.
Another problem is the indexing. li[3] selects the third li element child of the context node. However, the wanted a element is a child of the fourth li child of the context node. This must be expressed as li[4]. XPath positions are 1-based, not 0-based.
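If these two problems are corrected, the extra ul step is removed (the menu ul has no ul child), and the path starts with // so that the depth of the ul does not matter, I believe the expression should look like the following:
//ul[@class="profile-sidebar-menu"]/li[4]/a/@href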
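The effect of both problems can be seen in a short PHP sketch. It assumes a trimmed-down, hypothetical copy of the menu markup (the real page is much larger) and uses PHP's DOMDocument and DOMXPath classes; the variable names are illustrative only.

```php
<?php
// Hypothetical, trimmed-down copy of the menu markup from the question.
$html = <<<HTML
<div><ul class="profile-sidebar-menu" id="profile-sidebar-menu">
<li><a href="/wow/en/character/some-server/sometoon/">Character Summary</a></li>
<li class="root-menu"><a href="/wow/en/character/some-server/sometoon/achievement">Achievements</a></li>
<li class=" active"><a href="/wow/en/character/some-server/sometoon/achievement#summary">Achievements</a></li>
<li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92">General</a></li>
</ul></div>
HTML;

libxml_use_internal_errors(true); // keep parser warnings about imperfect HTML quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Unquoted: profile-sidebar-menu is read as a child element name, so nothing matches.
$unquoted = $xpath->query('//ul[@class=profile-sidebar-menu]/li[4]/a/@href');

// Quoted: the value is compared as a string literal.
$third  = $xpath->query('//ul[@class="profile-sidebar-menu"]/li[3]/a/@href');
$fourth = $xpath->query('//ul[@class="profile-sidebar-menu"]/li[4]/a/@href');

echo $unquoted->length, "\n";           // number of nodes matched by the unquoted form
echo $third->item(0)->nodeValue, "\n";  // href of the third li (1-based)
echo $fourth->item(0)->nodeValue, "\n"; // href of the fourth li, the wanted one
```

With this sample markup, the unquoted predicate should match no nodes, li[3] should give the #summary link, and li[4] should give the #92 link.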
The absolute XPath expression that selects the wanted href attribute starting from the top element body of the provided XML document, is:
/*/*/*/*/*/*/*/*/*/*/ul/li[4]/a/@href
Below is the XML document (the provided one, made well-formed by appending a number of missing end tags):
<body class="en-us">
<div id="wrapper">
<div id="content">
<div class="content-top">
<div class="content-bot">
<div id="profile-wrapper" class=
"profile-wrapper profile-wrapper-horde">
<div class="profile-sidebar-anchor">
<div class="profile-sidebar-outer">
<div class="profile-sidebar-inner">
<div class="profile-sidebar-contents">
<div class="profile-sidebar-crest">
<a href="/wow/en/character/some-server/sometoon/" rel="np" class="profile-sidebar-character-model" style=""></a>
<div class="profile-sidebar-info">
<div class="name">
<a href="/wow/en/character/some-server/sometoon/"
rel="np">Glitchshot</a>
</div>
<div class="under-name color-c8">
<span class="level">
<strong>85</strong>
</span>
<a href="/wow/en/game/race/somerace" class="race">somerace</a>
<a href="/wow/en/game/class/someclass" class="class">someclass</a>
</div>
<div class="guild">
<a href="/wow/en/guild/some-server/someguild/?character=sometoon">
Some Guild</a>
</div>
<div class="realm">
<span id="profile-info-realm" class="tip"
data-battlegroup="Stormstrike">Black
Dragonflight</span>
</div>
</div>
</div>
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
<li>
<a href=
"/wow/en/character/some-server/sometoon/" class=
"back-to" rel="np">
<span class="arrow">
<span class=
"icon">Character Summary</span></span>
</a>
</li>
<li class="root-menu">
<a href=
"/wow/en/character/some-server/sometoon/achievement"
class="back-to" rel="np">
<span class=
"arrow">
<span class=
"icon">Achievements</span></span>
</a>
</li>
<li class=" active">
<a href=
"/wow/en/character/some-server/sometoon/achievement#summary"
class="" rel="np">
<span class="arrow">
<span class=
"icon">Achievements</span></span>
</a>
</li>
<li class="">
<a href=
"/wow/en/character/some-server/sometoon/achievement#92"
class="" rel="np">
<span class="arrow">
<span class=
"icon">General</span></span>
</a>
</li>
</ul>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
One can check that the above absolute XPath expression selects exactly the wanted href attribute by evaluating it with a tool like the XPath Visualizer.
[Screenshot: the selection as performed with the XPath Visualizer]
If your DOM structure is consistent, then something like the following should work:
//ul[@class='profile-sidebar-menu']/li[last()]/a/@href
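A minimal PHP sketch of this approach, assuming a hypothetical two-item copy of the menu and PHP's DOMDocument and DOMXPath classes. It also guards against an empty result: item(0) returns null when nothing matches, which is a likely source of the NULL shown in the question.

```php
<?php
// Hypothetical, trimmed-down copy of the menu markup from the question.
$html = <<<HTML
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
<li><a href="/wow/en/character/some-server/sometoon/">Character Summary</a></li>
<li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92">General</a></li>
</ul>
HTML;

libxml_use_internal_errors(true); // keep parser warnings about imperfect HTML quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// li[last()] picks the final li child of the menu, whatever its position.
$nodes = $xpath->query("//ul[@class='profile-sidebar-menu']/li[last()]/a/@href");

// Check the length before reading nodeValue, since item(0) is null on no match.
$achiSubArray = array(
    "URL" => ($nodes->length > 0) ? $nodes->item(0)->nodeValue : null,
);
var_dump($achiSubArray);
```

Note that @class='profile-sidebar-menu' is an exact string comparison, so it only matches while the ul carries that single class value.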
Your XPath statement does not match the markup. It has multiple ul steps in the path, but the sample is not structured that way. Also, indexing in XPath starts at 1, not 0.
Based on the HTML you show above (and assuming that the final tags are correctly closed), ewh's expression should work fine.
Maybe you omitted some important part of the document. Try being more specific:
//ul[@class='profile-sidebar-menu' and @id='profile-sidebar-menu']/li/a[@href='/wow/en/character/some-server/sometoon/achievement#92']/@href
I'm pretty sure it works, tested online via XPath Query Expression Tool.
If you still do not get results, try to show all the html you are working on.