Strip tag with class in PHP

I need to strip the span tags with class tip: that is, <span class="tip">, the corresponding </span>, and everything inside it.

I suspect a regular expression is needed, but I'm terrible at them.


Laugh...

<?php
$string = 'April 15, 2003';
$pattern = '/(\w+) (\d+), (\d+)/i';
$replacement = '${1}1,$3';
echo preg_replace($pattern, $replacement, $string);
?>

Gives no error... But

<?php
$str = preg_replace('<span class="tip">.+</span>', "", '<span class="rss-title"></span><span class="rss-link">linkylink</span><span class="rss-id"></span><span class="rss-content"></span><span class=\"rss-newpost\"></span>');
echo $str;
?>

Gives me the error:

Warning: preg_replace() [function.preg-replace]: Unknown modifier '.' in <A FILE> on line 4

previously, the error was at the ); in the 2nd line, but now.... >.>


This is the "proper" method (adapted from this answer).

Input:

<?php
$str = '<div>lol wut <span class="tip">remove!</span><span>don\'t remove!</span></div>';
?>

Code:

<?php
function recurse($doc, $parent) {
   if (!$parent->hasChildNodes())
      return;

   for ($i = 0; $i < $parent->childNodes->length; ) {
      $elm = $parent->childNodes->item($i);
      // Only element nodes carry attributes; getAttribute() returns ""
      // when the attribute is missing, so a span without a class is safe
      if ($elm instanceof DOMElement && $elm->nodeName == "span"
            && $elm->getAttribute("class") == "tip") {
         // Removal shifts the following siblings down, so $i is not advanced
         $parent->removeChild($elm);
         continue;
      }

      recurse($doc, $elm);
      $i++;
   }
}

// Load in the DOM (remembering that XML requires one root node)
$doc = new DOMDocument();
$doc->loadXML("<document>" . $str . "</document>");

// Iterate the DOM
recurse($doc, $doc->documentElement);

// Output the result
foreach ($doc->childNodes->item(0)->childNodes as $node) {
   echo $doc->saveXML($node);
}
?>

Output:

<div>lol wut <span>don't remove!</span></div>
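
For comparison, the same removal can probably be written more compactly with DOMXPath instead of a hand-rolled recursion. This is only a sketch: the function name strip_tip_spans is made up for illustration, it matches spans whose class attribute is exactly "tip", and like the code above it assumes the fragment is well-formed XML once wrapped in a root element.

```php
<?php
// Hypothetical helper (name not from this thread): removes every
// <span class="tip"> element, contents included, from an XML-clean fragment.
function strip_tip_spans(string $html): string {
    $doc = new DOMDocument();
    // XML requires a single root node, so wrap the fragment
    $doc->loadXML('<document>' . $html . '</document>');

    $xpath = new DOMXPath($doc);
    // Copy the matches into an array before removing anything
    $spans = iterator_to_array($xpath->query('//span[@class="tip"]'));
    foreach ($spans as $span) {
        $span->parentNode->removeChild($span);
    }

    // Serialize the children of the wrapper, not the wrapper itself
    $out = '';
    foreach ($doc->documentElement->childNodes as $node) {
        $out .= $doc->saveXML($node);
    }
    return $out;
}
```

Calling strip_tip_spans($str) on the input above should give the same output as the recursive version.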


A simple regular expression like:

<span class="tip">.+</span>

won't work reliably. The issue is nesting: if another span is opened and closed inside the tip span, a lazy match (.+?) stops at the inner span's closing tag rather than the tip span's, while the greedy .+ shown here runs on to the last </span> in the input and removes too much. DOM-based tools like the one linked in the comments will provide a more reliable answer.

As per my comment below, you need to add pattern delimiters when working with regular expressions in PHP. Without them, PHP treats the leading < and the first > as a bracket-style delimiter pair and reads everything after that > as modifiers, hence "Unknown modifier '.'".

<?php
$str = preg_replace('/<span class="tip">.+<\/span>/', "", '<span class="rss-title"></span><span class="rss-link">linkylink</span><span class="rss-id"></span><span class="rss-content"></span><span class=\"rss-newpost\"></span>');
echo $str;
?>

may be moderately more successful. Please take a look at the documentation page for the function in question.
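
To see both points of this answer at once, here is a small sketch. The sample string and the variable names ($nested, $lazy, $greedy) are invented for illustration, and # is used as the delimiter so that the / in </span> needs no escaping.

```php
<?php
// A tip span that contains another span, followed by an unrelated span
$nested = 'a <span class="tip">x <span>y</span> z</span> b <span>keep</span> c';

// Lazy quantifier: the match ends at the FIRST </span>, which belongs
// to the inner span, so " z</span>" is left behind.
$lazy = preg_replace('#<span class="tip">.+?</span>#', '', $nested);

// Greedy quantifier: the match runs to the LAST </span> in the input,
// so the unrelated "keep" span is removed as well.
$greedy = preg_replace('#<span class="tip">.+</span>#', '', $nested);

echo $lazy, "\n";   // a  z</span> b <span>keep</span> c
echo $greedy, "\n"; // a  c
```

Neither result is the intended one, which is why a regex is only safe here when tip spans are known never to contain other spans.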


Now without regexp, and without heavy XML parsing:

$html = ' ... <span class="tip"> hello <span id="x"> man </span> </span> ... ';
$tag = '<span class="tip">';
$tag_close = '</span>';
$tag_familly = '<span';

$tag_len = strlen($tag);

$p1 = -1;
$p2 = 0;
while ( ($p2!==false)  && (($p1=strpos($html, $tag, $p1+1))!==false) ) {
  // the tag is found, now we will search for its corresponding closing tag
  $level = 1;
  $p2 = $p1;
  $continue = true; 
  while ($continue) {
     $p2 = strpos($html, $tag_close, $p2+1);
     if ($p2===false) {
       // error in the html contents, the analysis cannot continue
       echo "ERROR in html contents";
       $continue = false;
       $p2 = false; // will stop the loop
     } else {
       $level = $level -1;
       $x = substr($html, $p1+$tag_len, $p2-$p1-$tag_len);
       $n = substr_count($x, $tag_familly);
       if ($level+$n<=0) $continue = false;
     }
  }
  if ($p2!==false) {
    // delete the pair of tags (their contents are kept), the farthest one first so $p1 stays valid
    $html = substr_replace($html, '', $p2, strlen($tag_close));
    $html = substr_replace($html, '', $p1, $tag_len);
  }
}
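
Note that the loop above deletes only the two tags and keeps what was between them. If the contents should go too, as the question asks, the same depth-counting idea could be sketched roughly as follows. The function name remove_tip_spans is hypothetical, and the scan is deliberately naive: it treats any "<span" as an opening tag and gives up on unbalanced markup.

```php
<?php
// Hypothetical helper: removes each <span class="tip"> together with its
// contents, tracking nesting depth with plain string searches.
function remove_tip_spans(string $html): string {
    $open = '<span class="tip">';
    $offset = 0;
    while (($start = strpos($html, $open, $offset)) !== false) {
        $pos = $start + strlen($open);
        $depth = 1; // we are inside the tip span
        while ($depth > 0) {
            $nextOpen = strpos($html, '<span', $pos);
            $nextClose = strpos($html, '</span>', $pos);
            if ($nextClose === false) {
                return $html; // unbalanced markup: leave the rest untouched
            }
            if ($nextOpen !== false && $nextOpen < $nextClose) {
                $depth++; // a nested span opens before the next close
                $pos = $nextOpen + strlen('<span');
            } else {
                $depth--; // this close ends the innermost open span
                $pos = $nextClose + strlen('</span>');
            }
        }
        // Cut everything from the opening tag through its matching close
        $html = substr($html, 0, $start) . substr($html, $pos);
        $offset = $start;
    }
    return $html;
}
```

With the $html sample from this answer, the whole tip span including the nested span should disappear, leaving only the surrounding " ... " parts.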