how to remove a tag and its contents using regular expression?

2022-12-22 10:58 问答作者：

$str = 'some text tag contents more text ';

My questions are: How to retrieve content tag <em>contents </em> which is between <MY_TAG> .. </MY_TAG&g开发者_Go百科t;?

And

How to remove <MY_TAG> and its contents from $str?

I am using PHP.

Thank you.

For removal I ended up just using this:

$str = preg_replace('~<MY_TAG(.*?)</MY_TAG>~Usi', "", $str);

Using ~ instead of / for the delimiter solved errors being thrown because of the backslash in the end tag, which seemed to be an issue even with escaping. Eliminating > from the opening tag allows for attributes or other characters and still gets the tag and all of its contents.

This only works where nesting is not a concern.

The Usi modifiers mean U = Ungreedy, s = include linebreak characters, i = case insensitive.

If MY_TAG can not be nested, try this to get the matches:

preg_match_all('/<MY_TAG>(.*?)<\/MY_TAG>/s', $str, $matches)

And to remove them, use preg_replace instead.

You do not want to use regular expressions for this. A much better solution would be to load your contents into a DOMDocument and work on it using the DOM tree and standard DOM methods:

$document = new DOMDocument();
$document->loadXML('<root/>');
$document->documentElement->appendChild(
    $document->createFragment($myTextWithTags));

$MY_TAGs = $document->getElementsByTagName('MY_TAG');
foreach($MY_TAGs as $MY_TAG)
{
    $xmlContent = $document->saveXML($MY_TAG);
    /* work on $xmlContent here */

    /* as a further example: */
    $ems = $MY_TAG->getElementsByTagName('em');
    foreach($ems as $em)
    {
        $emphazisedText = $em->nodeValue;
        /* do your operations here */
    }
}

Although the only fully correct way to do this is not to use regular expressions, you can get what you want if you accept it won't handle all special cases:

preg_match("/<em[^>]*?>.*?</em>/i", $str, $match);
// Use this only if you aren't worried about nested tags.
// It will handle tags with attributes

And

preg_replace(""/<MY_TAG[^>]*?>.*?</MY_TAG>/i", "", $str);

I tested this function, it works for nested tags too, use true/false to exclude/include your tags. Found here: https://www.php.net/manual/en/function.strip-tags.php

<?php
function strip_tags_content($text, $tags = '', $invert = FALSE) {

  preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
  $tags = array_unique($tags[1]);
   
  if(is_array($tags) AND count($tags) > 0) {
    if($invert == FALSE) {
      return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
    }
    else {
      return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
    }
  }
  elseif($invert == FALSE) {
    return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
  }
  return $text;
}




// Sample text:
$text = '<b>sample</b> text with <div>tags</div>';

// Result for:
echo strip_tags_content($text);
// text with

// Result for:
echo strip_tags_content($text, '<b>');
// <b>sample</b> text with

// Result for:
echo strip_tags_content($text, '<b>', TRUE);
// text with <div>tags</div>

继续阅读：php regex

how to remove a tag and its contents using regular expression?

更多精彩内容

精彩评论

最新问答

男性抗体阴性？

结婚多年一直不能怀孕？

雷克萨斯es300臻享版是指什么音响？

小度在家怎么监控家里？

智能电视哪个牌子好?？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？