PHP regular expression issue
Yes, I know that using regular expressions on HTML is not preferred, but I am still confused as to why this doesn't work:
I'm trying to remove the "head" from a document.
Here's the doc:
<html>
<head>
<!--
a comment within the head
-->
</head>
<body>
stuff in the body
</body>
</html>
My code:
$matches = array();
$result = preg_match ('/(?:<head[^>]*>)(.*?)(<\/head>)/is', $contents, $matches);
var_dump ($matches);
This does not actually work. Here's the output I see:
array(3) { [0]=> string(60) " " [1]=> string(47) " " [2]=> string(7) "" }
However, if I adjust the HTML doc to not have the comment
What am I missing?
Thanks!
Your regular expression looks fine, but that extracts the <head>; you want to remove the head. Try using preg_replace instead:
$without_head = preg_replace ('/(?:<head[^>]*>)(.*?)(<\/head>)/is', '', $contents);
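For reference, here is a minimal self-contained sketch of that replacement run against the document from the question. The nowdoc label HTML and the null check are my additions, not part of the original code; the pattern itself is unchanged.

```php
<?php
// Sample document from the question, held in a nowdoc so nothing is interpolated.
$contents = <<<'HTML'
<html>
<head>
<!--
a comment within the head
-->
</head>
<body>
stuff in the body
</body>
</html>
HTML;

// Same pattern as in the question: /s lets "." match newlines, /i ignores tag case.
$without_head = preg_replace('/(?:<head[^>]*>)(.*?)(<\/head>)/is', '', $contents);

// preg_replace returns null on failure (for example, if the backtrack limit is hit).
if ($without_head === null) {
    throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
}

// View the page source, or escape this, to see the tags rather than rendered HTML.
echo $without_head;
```

The result should still contain the <html> and <body> tags, with everything from <head> through </head> gone.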
Your script is working fine; it's just not displaying correctly because the browser treats the HTML in the dump as markup (you can tell by the lengths in your var_dump output). Try:
$result = preg_match ('/(?:<head[^>]*>)(.*?)(<\/head>)/is', $contents, $matches);
ob_start(); // Capture the result of var_dump
var_dump ($matches);
echo htmlentities(ob_get_clean()); // Escape HTML in the dump
Also, as has been said, you need to use preg_replace to replace the match with '' in order to actually remove the head.
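If you would rather avoid output buffering, one possible variant (a sketch, not the only way to do it) is to ask var_export for the dump as a string and escape that. The sample $contents and the <pre> wrapper are assumptions added for illustration.

```php
<?php
// Hypothetical sample input: just the head section from the question.
$contents = "<head>\n<!--\na comment within the head\n-->\n</head>";

$matches = array();
preg_match('/(?:<head[^>]*>)(.*?)(<\/head>)/is', $contents, $matches);

// With true as the second argument, var_export returns the dump instead of printing it.
$dump = var_export($matches, true);

// Escape <, > and & so the browser shows the tags instead of interpreting them.
$escaped = htmlspecialchars($dump);

echo '<pre>' . $escaped . '</pre>';
```

Unlike var_dump, var_export does not print string lengths, so keep the var_dump version if the lengths are what you want to check.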
php > $str=<<<EOS
<<< > <head>
<<< > <!--
<<< > a comment within the head
<<< > -->
<<< > </head>
<<< > EOS;
php > $r=preg_match('/(?:<head[^>]*>)(.*?)(<\/head>)/is',$str,$matches);
php > var_dump($r);
int(1)
php > var_dump($matches);
array(3) {
[0]=>
string(63) "<head>
<!--
a comment within the head
-->
</head>"
[1]=>
string(50) "
<!--
a comment within the head
-->
"
[2]=>
string(7) "</head>"
}
Do you mean to use preg_replace?
php > $r=preg_replace('/(?:<head[^>]*>)(.*?)(<\/head>)/is','',$str);
php > var_dump($r);
string(0) ""