How to remove a tag and its contents using a regular expression?
$str = 'some text <MY_TAG>tag <em>contents </em></MY_TAG> more text ';
My questions are:
How do I retrieve the content tag <em>contents </em>, which is between <MY_TAG> .. </MY_TAG>?
And how do I remove <MY_TAG> and its contents from $str?
I am using PHP.
Thank you.
For removal I ended up just using this:
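$str = preg_replace('~<MY_TAG\b.*?</MY_TAG>~si', '', $str);
Using ~ instead of / as the delimiter solved the errors I was getting from the forward slash in the closing tag (with / as the delimiter it has to be escaped as \/). Leaving > off the opening tag (the \b only requires the tag name to end there) allows for attributes or other characters and still removes the tag and all of its contents.
This only works where nesting is not a concern.
The si modifiers mean s = . also matches line break characters, i = case insensitive. Do not add the U (ungreedy) modifier on top of .*?: U inverts greediness, so .*? becomes greedy and, with two MY_TAG elements in $str, everything from the first opening tag to the last closing tag is removed.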
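A quick way to see how the U modifier interacts with a lazy .*? is to run the removal pattern with and without U on a string that contains two MY_TAG elements. The sample string below is made up for illustration; the second call uses the Usi variant ~<MY_TAG(.*?)</MY_TAG>~Usi.

```php
<?php
// Hypothetical input with two separate MY_TAG elements.
$str = 'a <MY_TAG>one</MY_TAG> b <MY_TAG>two</MY_TAG> c';

// Lazy .*? without U: each element is removed on its own.
$lazy = preg_replace('~<MY_TAG(.*?)</MY_TAG>~si', '', $str);

// With U, greediness is inverted, so .*? behaves greedily and the match
// runs from the first opening tag to the last closing tag.
$inverted = preg_replace('~<MY_TAG(.*?)</MY_TAG>~Usi', '', $str);

echo $lazy, "\n";     // a  b  c
echo $inverted, "\n"; // a  c
```

The text " b " between the two elements survives only in the first result.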
If MY_TAG cannot be nested, try this to get the matches ($matches[1] holds the content of each element):
preg_match_all('/<MY_TAG>(.*?)<\/MY_TAG>/s', $str, $matches);
And to remove them, use preg_replace with the same pattern instead.
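Putting both halves together on the sample string from the question might look like the sketch below. It assumes MY_TAG has no attributes and is never nested, as the answer states.

```php
<?php
$str = 'some text <MY_TAG>tag <em>contents </em></MY_TAG> more text ';

// Retrieve: capture group 1 of each match holds the inner content of a MY_TAG element.
preg_match_all('/<MY_TAG>(.*?)<\/MY_TAG>/s', $str, $matches);
$contents = $matches[1];

// Remove: the same pattern without the group, passed to preg_replace.
$stripped = preg_replace('/<MY_TAG>.*?<\/MY_TAG>/s', '', $str);

print_r($contents);   // Array ( [0] => tag <em>contents </em> )
echo $stripped, "\n"; // "some text  more text "
```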
You do not want to use regular expressions for this. A much better solution would be to load your contents into a DOMDocument and work on it using the DOM tree and standard DOM methods:
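$document = new DOMDocument();
$document->loadXML('<root/>');
// Build a document fragment from the text; appendXML() expects
// well-formed XML and returns false if it cannot parse the input.
$fragment = $document->createDocumentFragment();
$fragment->appendXML($myTextWithTags);
$document->documentElement->appendChild($fragment);
$MY_TAGs = $document->getElementsByTagName('MY_TAG');
foreach($MY_TAGs as $MY_TAG)
{
    // Serialized element, including the <MY_TAG> tags themselves
    $xmlContent = $document->saveXML($MY_TAG);
    /* work on $xmlContent here */
    /* as a further example: */
    $ems = $MY_TAG->getElementsByTagName('em');
    foreach($ems as $em)
    {
        $emphasizedText = $em->nodeValue;
        /* do your operations here */
    }
}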
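The DOM approach also covers the removal half of the question. A minimal sketch, assuming the input is a well-formed XML fragment; removeMyTags() is a hypothetical helper name. Because getElementsByTagName() returns a live list, the nodes are copied with iterator_to_array() before any are removed.

```php
<?php
// Hypothetical helper: remove every <MY_TAG> element (and its contents) via DOM.
// Assumes $fragmentXml is well-formed XML; appendXML() fails otherwise.
function removeMyTags(string $fragmentXml): string
{
    $document = new DOMDocument();
    $document->loadXML('<root/>');
    $fragment = $document->createDocumentFragment();
    $fragment->appendXML($fragmentXml);
    $document->documentElement->appendChild($fragment);

    // Copy the live node list before modifying the tree.
    $nodes = iterator_to_array($document->getElementsByTagName('MY_TAG'));
    foreach ($nodes as $node) {
        // Nested MY_TAGs are still attached to their (already detached) parent,
        // so removeChild() works for them too.
        $node->parentNode->removeChild($node);
    }

    // Serialize the remaining children of <root> back into a string.
    $result = '';
    foreach ($document->documentElement->childNodes as $child) {
        $result .= $document->saveXML($child);
    }
    return $result;
}

echo removeMyTags('some text <MY_TAG>tag <em>contents </em></MY_TAG> more text '), "\n";
```

Unlike the regular expressions above, this also removes a MY_TAG nested inside another MY_TAG, since the parser tracks the tree structure.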
Although the only fully correct way to do this is not to use regular expressions, you can get what you want if you accept it won't handle all special cases:
preg_match('~<em\b[^>]*>.*?</em>~is', $str, $match);
// Use this only if you aren't worried about nested tags.
// It will handle tags with attributes.
And
$str = preg_replace('~<MY_TAG\b[^>]*>.*?</MY_TAG>~is', '', $str);
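To check the claim that these patterns handle attributes, here is a small run on a made-up string whose opening tags carry attributes. The ~ delimiter is used so the / in the closing tags needs no escaping.

```php
<?php
// Hypothetical input: both opening tags carry attributes.
$str = 'x <MY_TAG id="a">one <em class="b">two</em></MY_TAG> y';

// [^>]* skips over any attributes up to the end of the opening tag.
preg_match('~<em\b[^>]*>.*?</em>~is', $str, $match);
$stripped = preg_replace('~<MY_TAG\b[^>]*>.*?</MY_TAG>~is', '', $str);

echo $match[0], "\n"; // <em class="b">two</em>
echo $stripped, "\n"; // "x  y"
```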
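I tested this function; it handles tags nested inside other tags (though not a tag nested inside another tag of the same name). With $invert = FALSE (the default), the tags you list are kept and every other tag is removed together with its contents; with TRUE, only the listed tags are removed. Found here: https://www.php.net/manual/en/function.strip-tags.php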
<?php
function strip_tags_content($text, $tags = '', $invert = FALSE) {
preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
$tags = array_unique($tags[1]);
if(is_array($tags) AND count($tags) > 0) {
if($invert == FALSE) {
return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
}
else {
return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
}
}
elseif($invert == FALSE) {
return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
}
return $text;
}
// Sample text:
$text = '<b>sample</b> text with <div>tags</div>';
// Result for:
echo strip_tags_content($text);
// text with
// Result for:
echo strip_tags_content($text, '<b>');
// <b>sample</b> text with
// Result for:
echo strip_tags_content($text, '<b>', TRUE);
// text with <div>tags</div>
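A caveat on nesting: the function's patterns match lazily up to the first closing tag with the same name (via the \1 backreference), so a tag nested inside another tag of the same name is only partly removed. A quick check, using the same pattern as the function's no-argument branch on made-up inputs:

```php
<?php
// Same pattern as the no-argument branch of strip_tags_content().
$pattern = '@<(\w+)\b.*?>.*?</\1>@si';

// Different tag names nested: removed cleanly.
$different = preg_replace($pattern, '', '<b><i>x</i></b>y');

// Same tag name nested: .*? stops at the inner </div>, leaving debris.
$same = preg_replace($pattern, '', '<div>a<div>b</div>c</div>d');

echo $different, "\n"; // y
echo $same, "\n";      // c</div>d
```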