PHP regular expression issue

Yes, I know that using regular expressions on HTML is not preferred, but I am still confused as to why this doesn't work:

I'm trying to remove the "head" from a document.

Here's the doc:

<html>
 <head>
   <!--
     a comment within the head
     -->
 </head>
 <body>
stuff in the body
 </body>
</html>

My code:

$matches = array();
$result = preg_match('/(?:<head[^>]*>)(.*?)(<\/head>)/is', $contents, $matches);
var_dump($matches);

This does not actually work. Here's the output I see:

array(3) { [0]=> string(60) " " [1]=> string(47) " " [2]=> string(7) "" }

However, if I adjust the HTML doc to not have the comment

What am I missing?

Thanks!


Your regular expression looks fine, but preg_match only extracts the <head>; you want to remove it. Try using preg_replace instead:

$without_head = preg_replace ('/(?:<head[^>]*>)(.*?)(<\/head>)/is', '', $contents);


Your script is working fine. The output just isn't displayed correctly because the browser interprets the HTML in the dump as markup (you can tell from the string lengths in your var_dump output that the matches are not empty). Try:

$result = preg_match ('/(?:<head[^>]*>)(.*?)(<\/head>)/is', $contents, $matches); 
ob_start(); // Capture the result of var_dump
var_dump ($matches);
echo htmlentities(ob_get_clean()); // Escape HTML in the dump

Also, as has been said, you need to use preg_replace to replace the match with '' in order to actually remove the head.
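
Putting both points together, here is a minimal sketch that strips the head and then prints the result escaped, so it shows up as text in a browser. The helper name remove_head() is made up for illustration. The pattern is the one from the question, minus the capture groups (nothing is extracted here) and with a \b added after "head" so that a tag such as <header> is not matched; that \b is my own addition, not something from the original code.

```php
<?php
// Sample document from the question.
$contents = <<<HTML
<html>
 <head>
   <!--
     a comment within the head
     -->
 </head>
 <body>
stuff in the body
 </body>
</html>
HTML;

// remove_head() is a hypothetical helper name, not a PHP built-in.
function remove_head(string $html): string
{
    // i = case-insensitive; s = let "." match newlines so a multi-line head is covered.
    // \b (an addition to the original pattern) keeps <header> from matching.
    $result = preg_replace('/<head\b[^>]*>.*?<\/head>/is', '', $html);

    // preg_replace() returns null on error; fall back to the untouched input.
    return $result ?? $html;
}

$without_head = remove_head($contents);

// Escape before echoing so the browser shows the markup instead of rendering it.
echo '<pre>' . htmlspecialchars($without_head) . '</pre>';
```

When run from the command line, the escaping is unnecessary and a plain var_dump($without_head) is readable as is.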


php > $str=<<<EOS
<<< > <head>
<<< >    <!--
<<< >      a comment within the head
<<< >      -->
<<< >  </head>
<<< > EOS;
php > $r=preg_match('/(?:<head[^>]*>)(.*?)(<\/head>)/is',$str,$matches);
php > var_dump($r);
int(1)
php > var_dump($matches);
array(3) {
  [0]=>
  string(63) "<head>
   <!--
     a comment within the head
     -->
 </head>"
  [1]=>
  string(50) "
   <!--
     a comment within the head
     -->
 "
  [2]=>
  string(7) "</head>"
}

Do you mean to use preg_replace?

php > $r=preg_replace('/(?:<head[^>]*>)(.*?)(<\/head>)/is','',$str);
php > var_dump($r);
string(0) ""