How to match second <a> tag in this string

2022-12-11 05:02

I have an HTML fragment which contains two anchor tags in various parts of the HTML.

<span id="ctl00_PlaceHolderTitleBreadcrumb_ContentMap">
    <span><a class="ms-sitemapdirectional" href="/">My Site</a></span>
    <span> &gt; </span>
    <span><a class="ms-sitemapdirectional" href="/Lists/Announcements/AllItems.aspx">Announcements</a></span>
    <span> &gt; </span>
    <span class="ms-sitemapdirectional">Settings</span>
</span>

I'm looking to write a regular expression that will return the second anchor tag, which has 'Announcements' as its text. In trying to write an expression, I keep getting both anchor tags returned, but I'm only interested in the second tag.

Is it possible to match the second tag only?

EDIT:

I will always know that I'm looking for an anchor tag which has 'Announcements' in its text, if that helps.

Parse the fragment into a DOM. Use XPath to issue:

(//a)[2]

Done.
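A minimal Python sketch of the same idea, assuming the fragment is well-formed XML (the one in the question is; real-world HTML usually needs a more lenient parser). The standard-library ElementTree module only supports a subset of XPath and does not accept (//a)[2], so this sketch collects every <a> in document order and indexes the list instead. The function name second_anchor is hypothetical.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, copied verbatim.
FRAGMENT = """<span id="ctl00_PlaceHolderTitleBreadcrumb_ContentMap">
    <span><a class="ms-sitemapdirectional" href="/">My Site</a></span>
    <span> &gt; </span>
    <span><a class="ms-sitemapdirectional" href="/Lists/Announcements/AllItems.aspx">Announcements</a></span>
    <span> &gt; </span>
    <span class="ms-sitemapdirectional">Settings</span>
</span>"""


def second_anchor(markup: str) -> ET.Element:
    """Return the second <a> element in document order."""
    root = ET.fromstring(markup)       # raises ParseError on malformed input
    anchors = root.findall(".//a")     # every <a> descendant, in document order
    if len(anchors) < 2:
        raise ValueError("fewer than two <a> elements in the fragment")
    return anchors[1]                  # zero-based, so index 1 is the second


anchor = second_anchor(FRAGMENT)
print(anchor.text)         # Announcements
print(anchor.get("href"))  # /Lists/Announcements/AllItems.aspx
```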

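   /<a.+?>[^<>]*Announcements[^<>]*<\/a>/

Note that this relies on . not matching line breaks, so it only skips the first anchor while the two anchors sit on separate lines.

PS. Regular expressions are the wrong tool for parsing HTML.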

/(<a.*?<\/a>).*?(<a.*?<\/a>)/s

$1 captures the first tag and $2 captures the second. The s modifier lets . match the line breaks between the two tags.
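As a rough illustration, here is the two-group pattern in Python, where groups are read with group(1) and group(2) rather than $1 and $2. The re.DOTALL flag is added so that . can cross the line breaks in the fragment; the helper name first_two_anchors is hypothetical.

```python
import re

# The fragment from the question, copied verbatim.
FRAGMENT = """<span id="ctl00_PlaceHolderTitleBreadcrumb_ContentMap">
    <span><a class="ms-sitemapdirectional" href="/">My Site</a></span>
    <span> &gt; </span>
    <span><a class="ms-sitemapdirectional" href="/Lists/Announcements/AllItems.aspx">Announcements</a></span>
    <span> &gt; </span>
    <span class="ms-sitemapdirectional">Settings</span>
</span>"""

# Two lazy groups: each one stops at the first </a> it can reach.
TWO_ANCHORS = re.compile(r"(<a.*?</a>).*?(<a.*?</a>)", re.DOTALL)


def first_two_anchors(markup):
    """Return (first, second) anchor markup, or None if there are not two."""
    match = TWO_ANCHORS.search(markup)
    if match is None:
        return None
    return match.group(1), match.group(2)


first, second = first_two_anchors(FRAGMENT)
print(second)
```

The pattern is position-based, so it returns whatever anchor happens to come second, regardless of its text.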

You don't have to use a complicated regular expression for this if you don't want to. Since you want anchors, and anchors usually have a closing </a> tag, you can use your favourite language to split each line on </a>. For example, in pseudocode:

for each line in htmlfile
do
   var=split line on </a>
   for each item in var
   do
        if item has "Announcement" then
           print "found"
        end if
   done
done
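The pseudocode above could be sketched in Python roughly as follows. Instead of printing "found", this version rebuilds and returns the matching anchor; the function name find_anchors_by_split and the needle parameter are hypothetical. The substring test is deliberately loose, as in the pseudocode: it also succeeds when the word appears only in the href.

```python
# The fragment from the question, copied verbatim.
FRAGMENT = """<span id="ctl00_PlaceHolderTitleBreadcrumb_ContentMap">
    <span><a class="ms-sitemapdirectional" href="/">My Site</a></span>
    <span> &gt; </span>
    <span><a class="ms-sitemapdirectional" href="/Lists/Announcements/AllItems.aspx">Announcements</a></span>
    <span> &gt; </span>
    <span class="ms-sitemapdirectional">Settings</span>
</span>"""


def find_anchors_by_split(markup, needle="Announcements"):
    """Split each line on </a> and keep the anchors that mention needle."""
    hits = []
    for line in markup.splitlines():
        for item in line.split("</a>"):
            start = item.rfind("<a")   # where the last opening <a begins
            if start != -1 and needle in item[start:]:
                # split() consumed the closing tag, so put it back.
                hits.append(item[start:] + "</a>")
    return hits


for anchor in find_anchors_by_split(FRAGMENT):
    print(anchor)
```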

<?php
$string = '<span id="ctl00_PlaceHolderTitleBreadcrumb_ContentMap"><span><a class="ms-sitemapdirectional" href="/">My Site</a></span><span> &gt; </span><span><a class="ms-sitemapdirectional" href="/Lists/Announcements/AllItems.aspx">Announcements</a></span><span> &gt; </span><span class="ms-sitemapdirectional">Settings</span></span>';

// Return the markup of a node's children as an HTML string.
function innerHTML($node) {
    $doc = new DOMDocument();
    foreach ($node->childNodes as $child) {
        $doc->appendChild($doc->importNode($child, true));
    }

    return $doc->saveHTML();
}

$dom = new DOMDocument();
$dom->loadHTML($string);
$anchors = $dom->getElementsByTagName('a');

// item(1) is the second anchor, so make sure at least two were found.
if ($anchors->length > 1) {
    $secondAnchor = $anchors->item(1);
    // The anchor is the only child of its parent <span>, so the parent's
    // inner HTML is the anchor's own markup.
    echo innerHTML($secondAnchor->parentNode);
}

If you know the exact text of the element, and you know it's the last element of its kind in the fragment, you have more than enough information to match it with a regex. I suspect you're using a regex like this:

/<a\s+.*>Announcements<\/a>/s

...and the .* is matching everything between the <a of the first anchor tag and the >Announcements</a> of the second one. Switching to a non-greedy quantifier:

/<a\s+.*?>Announcements<\/a>/s

...doesn't help; a reluctant quantifier stops matching as soon as possible, but the problem here is that it starts matching too soon. You need to replace the .* with something more specific, something that can only match whatever comes between the opening <a and closing > of a single tag:

/<a\s+[^<>]+>Announcements<\/a>/

Now, when it reaches the end of the first <a> tag and doesn't see Announcements</a> it will abort that match attempt, move along and start fresh at the second <a> tag.
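To see the difference between the three patterns, here is a small Python sketch that runs each of them against the fragment from the question. The constant names GREEDY, LAZY and SPECIFIC are hypothetical, and re.DOTALL stands in for the /s modifier.

```python
import re

# The fragment from the question, copied verbatim.
FRAGMENT = """<span id="ctl00_PlaceHolderTitleBreadcrumb_ContentMap">
    <span><a class="ms-sitemapdirectional" href="/">My Site</a></span>
    <span> &gt; </span>
    <span><a class="ms-sitemapdirectional" href="/Lists/Announcements/AllItems.aspx">Announcements</a></span>
    <span> &gt; </span>
    <span class="ms-sitemapdirectional">Settings</span>
</span>"""

GREEDY = re.compile(r"<a\s+.*>Announcements</a>", re.DOTALL)
LAZY = re.compile(r"<a\s+.*?>Announcements</a>", re.DOTALL)
SPECIFIC = re.compile(r"<a\s+[^<>]+>Announcements</a>")

for name, pattern in [("greedy", GREEDY), ("lazy", LAZY), ("specific", SPECIFIC)]:
    matched = pattern.search(FRAGMENT).group(0)
    # Count how many opening <a tags ended up inside the match.
    print(name, matched.count("<a "))
# greedy 2
# lazy 2
# specific 1
```

Both the greedy and the lazy pattern begin matching at the first <a and run through to the second anchor, so they return identical text; only the negated character class is confined to a single tag.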
