How to remove consecutive links from a webpage?

Posted 2023-03-29 15:58

I want to remove consecutive links from a webpage.

Here is a sample:

<div style="font-family: Arial;">
    <br>
    &nbsp;
    <a href="http://google.com">AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA</a>
    &nbsp;
    <a href="http://google.com">BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB</a>
    Google is a search
    <a href="http://www.google.com">engine</a>

In the HTML above, I want to remove the first two <a> tags but not the third one. (My script should only remove consecutive tags.)

Don't use a regex for this. Regular expressions are extremely powerful, but they are a poor fit for finding this kind of "consecutive" tag.

I suggest you use the DOM extension instead, which lets you walk the HTML as a tree. Here is an example (not tested):

$doc = new DOMDocument();
// avoid blank nodes when parsing
$doc->preserveWhiteSpace = false;
// collect warnings about imperfect markup instead of printing them
libxml_use_internal_errors(true);
// reads HTML from a string, loadHTMLFile() also exists
$doc->loadHTML($html);
// find all "a" tags; the returned list is live, so copy it to an array
// before removing anything, otherwise the indexes shift after each removal
$links = iterator_to_array($doc->getElementsByTagName('a'));
// remove the first link
$links[0]->parentNode->removeChild($links[0]);
// test the node following the second link
$next = $links[1]->nextSibling;
if ($next !== null && $next->nodeType != XML_TEXT_NODE) {
    // delete this node ...
}
// print the modified HTML
// See DOMDocument's attributes if you want to format the output
echo $doc->saveHTML();
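
To generalise the idea, here is a minimal sketch of one possible reading of "consecutive": a link counts as consecutive when its nearest sibling on either side, ignoring text nodes that hold only whitespace or non-breaking spaces, is also a link. The function names `adjacentMeaningfulNode`, `isLink` and `removeConsecutiveLinks` are hypothetical and not part of the DOM extension, and the blank-text rule is an assumption that you may need to adapt to your markup.

```php
<?php
// Hypothetical helper: walk in the given direction ('previousSibling' or
// 'nextSibling') and return the first node that is not a blank text node.
function adjacentMeaningfulNode(DOMNode $node, string $direction): ?DOMNode
{
    $sibling = $node->$direction;
    while ($sibling !== null
        && $sibling->nodeType === XML_TEXT_NODE
        // blank means: only whitespace and/or U+00A0 (what &nbsp; decodes to)
        && preg_match('/^[\s\x{00A0}]*$/u', $sibling->nodeValue) === 1) {
        $sibling = $sibling->$direction;
    }
    return $sibling;
}

// Hypothetical helper: true when the node is an <a> element.
function isLink(?DOMNode $node): bool
{
    return $node !== null
        && $node->nodeType === XML_ELEMENT_NODE
        && strtolower($node->nodeName) === 'a';
}

// Hypothetical helper: remove every link that sits directly next to another link.
function removeConsecutiveLinks(string $html): string
{
    $doc = new DOMDocument();
    // collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // First pass: decide what to remove while the tree is still untouched,
    // so that each link can still see its neighbours.
    $toRemove = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        if (isLink(adjacentMeaningfulNode($link, 'previousSibling'))
            || isLink(adjacentMeaningfulNode($link, 'nextSibling'))) {
            $toRemove[] = $link;
        }
    }

    // Second pass: remove the collected links.
    foreach ($toRemove as $link) {
        $link->parentNode->removeChild($link);
    }

    return $doc->saveHTML();
}
```

Calling `removeConsecutiveLinks($html)` on the sample from the question should drop the two links separated only by `&nbsp;` and keep the "engine" link, because the text "Google is a search" sits between it and the preceding link. Note that `saveHTML()` returns a full document, so the fragment comes back wrapped in `<html>` and `<body>` tags.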
