Selecting a deeply nested link using xpath query

<body class="en-us">   <div id="wrapper">
    <div id="content">
      <div class="content-top">
        <div class="content-bot">
          <div id="profile-wrapper" class=
          "profile-wrapper profile-wrapper-horde">
            <div class="profile-sidebar-anchor">
              <div class="profile-sidebar-outer">
                <div class="profile-sidebar-inner">
                  <div class="profile-sidebar-contents">
                    <div class="profile-sidebar-crest">
                      <a href="/wow/en/character/some-server/sometoon/" rel="np" class="profile-sidebar-character-model" style="">
                      </a>

                      <div class="profile-sidebar-info">
                        <div class="name">
                          <a href="/wow/en/character/some-server/sometoon/"
                          rel="np">Glitchshot</a>
                        </div>

                        <div class="under-name color-c8">
                          <span class="level"><strong>85</strong></span>
                          <a href="/wow/en/game/race/somerace" class="race">somerace</a> 
                          <a href="/wow/en/game/class/someclass" class="class">someclass</a>
                        </div>

                        <div class="guild">
                          <a href="/wow/en/guild/some-server/someguild/?character=sometoon">
                          Some Guild</a>
                        </div>

                        <div class="realm">
                          <span id="profile-info-realm" class="tip"
                          data-battlegroup="Stormstrike">Black
                          Dragonflight</span>
                        </div>
                      </div>
                    </div>

                    <ul class="profile-sidebar-menu" id="profile-sidebar-menu">
                      <li><a href=
                      "/wow/en/character/some-server/sometoon/" class=
                      "back-to" rel="np"><span class="arrow"><span class=
                      "icon">Character Summary</span></span></a></li>

                      <li class="root-menu"><a href=
                      "/wow/en/character/some-server/sometoon/achievement"
                         class="back-to" rel="np"><span class=
                         "arrow"><span class=
                         "icon">Achievements</span></span></a></li>

                      <li class=" active"><a href=
                      "/wow/en/character/some-server/sometoon/achievement#summary"
                         class="" rel="np"><span class="arrow"><span class=
                         "icon">Achievements</span></span></a></li>

                      <li class=""><a href=
                      "/wow/en/character/some-server/sometoon/achievement#92"
                         class="" rel="np"><span class="arrow"><span class=
                         "icon">General</span></span></a></li>

I know that I have posted a lot of otherwise useless code here, but I wanted you to have an idea of what the DOM looks like.

From this:

<a href="/wow/en/character/some-server/sometoon/achievement#92" class="" rel="np"><span class="arrow"><span class="icon">General</span></span></a>

I would like to extract this:

/wow/en/character/some-server/sometoon/achievement#92

which comes from the last anchor in the posted markup.

I have read as much as I can find on how to use xpath query to extract the needed information but I am clearly missing something. Below is the query that I thought should work but does not.

<?php
    $query = '*/ul[@class=profile-sidebar-menu]/ul/li[3]/ul/li[1]/a/@href';
    echo $query . "<br>";
    $achievementSubCategory = $xpath->query($query);

    $achiSubArray = array("URL" => $achievementSubCategory->item(0)->nodeValue);
    var_dump($achiSubArray);
    // Produces array(1) { ["URL"]=> NULL } which should look something more like:
    // array(1) { ["URL"]=> /wow/en/character/some-server/sometoon/achievement#92 }
?>

Thank you in advance for your assistance and advice


*/ul[@class=profile-sidebar-menu]/ul/li[3]/ul/li[1]/a/@href

There are a few problems with this XPath expression:

  1. It is looking for a ul element that is a grandchild of the current node and that has an attribute named class whose string value is equal to the string value of one of the child elements of that ul named profile-sidebar-menu. Because profile-sidebar-menu is not quoted, it is read as an element name rather than as a string. The ul has no children named profile-sidebar-menu, so the whole expression doesn't select any node.

  2. Another problem is the indexing. li[3] selects the third li element - child of the context node. However the wanted a element is a child of the fourth li child of the context node. This must be expressed as: li[4]. XPath positions are 1-based, not 0-based.

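If these two problems are corrected, and the extra ul steps are dropped (in the posted markup the li elements are direct children of the menu ul, which contains no nested ul), I believe the expression should look like the following:

*/ul[@class="profile-sidebar-menu"]/li[4]/a/@href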
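The two problems listed above can be reproduced in PHP with a small sketch. The markup below is a reduced, well-formed stand-in for the sidebar menu with shortened, hypothetical href values, and the expressions start with // rather than */ so that they do not depend on the context node:

```php
<?php
// Reduced stand-in for the sidebar menu; the short hrefs are made up for illustration.
$xml = <<<'XML'
<div>
  <ul class="profile-sidebar-menu" id="profile-sidebar-menu">
    <li><a href="/summary">Character Summary</a></li>
    <li class="root-menu"><a href="/achievement">Achievements</a></li>
    <li class=" active"><a href="/achievement#summary">Achievements</a></li>
    <li class=""><a href="/achievement#92">General</a></li>
  </ul>
</div>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Unquoted: profile-sidebar-menu is parsed as a child element name, so nothing matches.
$unquoted = $xpath->query('//ul[@class=profile-sidebar-menu]');
// Quoted: the class attribute is compared with a string literal.
$quoted = $xpath->query('//ul[@class="profile-sidebar-menu"]');

// Positions are 1-based: li[3] is the third item, li[4] is the fourth.
$third  = $xpath->query('//ul[@class="profile-sidebar-menu"]/li[3]/a/@href');
$fourth = $xpath->query('//ul[@class="profile-sidebar-menu"]/li[4]/a/@href');

echo $unquoted->length, "\n";            // 0
echo $quoted->length, "\n";              // 1
echo $third->item(0)->nodeValue, "\n";   // /achievement#summary
echo $fourth->item(0)->nodeValue, "\n";  // /achievement#92
```

The unquoted form is still a syntactically valid expression, which is why it fails silently with an empty result instead of raising an error.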

The absolute XPath expression that selects the wanted href attribute starting from the top element body of the provided XML document, is:

/*/*/*/*/*/*/*/*/*/*/ul/li[4]/a/@href

Below is the XML document (the provided one, made well-formed by appending a number of missing end tags):

<body class="en-us">
    <div id="wrapper">
        <div id="content">
            <div class="content-top">
                <div class="content-bot">
                    <div id="profile-wrapper" class=
              "profile-wrapper profile-wrapper-horde">
                        <div class="profile-sidebar-anchor">
                            <div class="profile-sidebar-outer">
                                <div class="profile-sidebar-inner">
                                    <div class="profile-sidebar-contents">
                                        <div class="profile-sidebar-crest">
                                            <a href="/wow/en/character/some-server/sometoon/" rel="np" class="profile-sidebar-character-model" style=""></a>
                                            <div class="profile-sidebar-info">
                                                <div class="name">
                                                    <a href="/wow/en/character/some-server/sometoon/"
                              rel="np">Glitchshot</a>
                                                </div>
                                                <div class="under-name color-c8">
                                                    <span class="level">
                                                        <strong>85</strong>
                                                    </span>
                                                    <a href="/wow/en/game/race/somerace" class="race">somerace</a>
                                                    <a href="/wow/en/game/class/someclass" class="class">someclass</a>
                                                </div>
                                                <div class="guild">
                                                    <a href="/wow/en/guild/some-server/someguild/?character=sometoon">
                              Some Guild</a>
                                                </div>
                                                <div class="realm">
                                                    <span id="profile-info-realm" class="tip"
                              data-battlegroup="Stormstrike">Black
                              Dragonflight</span>
                                                </div>
                                            </div>
                                        </div>
                                        <ul class="profile-sidebar-menu" id="profile-sidebar-menu">
                                            <li>
                                                <a href=
                          "/wow/en/character/some-server/sometoon/" class=
                          "back-to" rel="np">
                                                    <span class="arrow">
                                                        <span class=
                          "icon">Character Summary</span></span>
                                                </a>
                                            </li>
                                            <li class="root-menu">
                                                <a href=
                          "/wow/en/character/some-server/sometoon/achievement"
                             class="back-to" rel="np">
                                                    <span class=
                             "arrow">
                                                        <span class=
                             "icon">Achievements</span></span>
                                                </a>
                                            </li>
                                            <li class=" active">
                                                <a href=
                          "/wow/en/character/some-server/sometoon/achievement#summary"
                             class="" rel="np">
                                                    <span class="arrow">
                                                        <span class=
                             "icon">Achievements</span></span>
                                                </a>
                                            </li>
                                            <li class="">
                                                <a href=
                          "/wow/en/character/some-server/sometoon/achievement#92"
                             class="" rel="np">
                                                    <span class="arrow">
                                                        <span class=
                             "icon">General</span></span>
                                                </a>
                                            </li>
                                        </ul>
                                    </div>
                                </div>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
</body>

One can check that the above absolute XPath expression selects exactly the wanted href attribute by evaluating it with a tool such as the XPath Visualizer.

[Screenshot: the selection as shown in the XPath Visualizer]


If your DOM structure is consistent, then something like the following should work:

//ul[@class='profile-sidebar-menu']/li[last()]/a/@href

Your XPath expression does not match the markup. It steps through several nested ul elements, but the sample is not structured that way. Also, indexing in XPath starts at 1, not 0.
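As a rough sketch of how the li[last()] expression could be wired into the PHP from the question: the fragment below is a shortened copy of the posted markup with its closing tags deliberately left off, and it assumes the page is parsed with DOMDocument::loadHTML, which repairs the unclosed elements. The length check avoids calling nodeValue on a missing node when nothing matches.

```php
<?php
// Shortened copy of the posted markup; closing tags are omitted on purpose,
// as in the question, to show that loadHTML() tolerates them.
$html = <<<'HTML'
<body class="en-us"><div id="wrapper">
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
  <li><a href="/wow/en/character/some-server/sometoon/" class="back-to" rel="np">Character Summary</a></li>
  <li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92" class="" rel="np"><span class="arrow"><span class="icon">General</span></span></a></li>
HTML;

// Collect parser warnings about the sloppy markup instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nodes = $xpath->query("//ul[@class='profile-sidebar-menu']/li[last()]/a/@href");

// Guard against an empty result before reading the value.
$href = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
var_dump(array("URL" => $href));
```

Note that li[last()] follows whichever item happens to be last, so it only returns the General link while that item stays at the end of the menu.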


Based on the HTML you show above (and assuming that the final tags are correctly closed), ewh's expression should work fine.

Maybe you omitted some important part of the document. Try being more specific:

//ul[@class='profile-sidebar-menu' and @id='profile-sidebar-menu']/li/a[@href='/wow/en/character/some-server/sometoon/achievement#92']/@href

I'm pretty sure it works; I tested it online with the XPath Query Expression Tool.
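Matching on the href value requires knowing the URL in advance. If the goal is to find it, one possible alternative (an assumption on my part, not something the question asks for) is to select the link by its visible label. The sketch below uses a reduced, well-formed copy of the menu and normalize-space() to ignore whitespace around the text:

```php
<?php
// Reduced, well-formed copy of two menu items from the question.
$xml = <<<'XML'
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
  <li class=" active"><a href="/wow/en/character/some-server/sometoon/achievement#summary">
    <span class="arrow"><span class="icon">Achievements</span></span>
  </a></li>
  <li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92">
    <span class="arrow"><span class="icon">General</span></span>
  </a></li>
</ul>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Select the anchor whose text, with surrounding whitespace trimmed, is "General".
$query = "//ul[@id='profile-sidebar-menu']/li/a[normalize-space(.)='General']/@href";
$nodes = $xpath->query($query);

$href = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
echo $href, "\n";
```

This trades dependence on the item's position for dependence on its label, so it only helps if the label text is stable.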

If you still do not get results, try to show all the html you are working on.
