
preg_replace causes a seg fault

When I execute the following code, I get a seg fault every time! Is this a known bug? How can I make this code work?

<?php
$doc = file_get_contents("http://prairieprogressive.com/");
$replace = array(
    "/<script([\s\S])*?<\/ ?script>/",
    "/<style([\s\S])*?<\/ ?style>/",
    "/<!--([\s\S])*?-->/",
    "/\r\n/"
);
$doc = preg_replace($replace,"",$doc);
echo $doc;
?>

The error (obviously) looks like:

[root@localhost 2.0]# php test.php
Segmentation fault (core dumped)


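Your capture groups are unnecessary, and repeating a group one character at a time, as ([\s\S])*? does, is expensive. PCRE's classic matcher recurses on each repetition of a group, so on a large page it can run out of C stack and crash PHP. Try this instead: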

$replace = array(
    "/<script.*?<\/\s?script>/s",
    "/<style.*?<\/\s?style>/s",
    "/<!--.*?-->/s",
    "/\r\n/"
);

Also, [\s\S] (whitespace or non-whitespace) matches any character at all. So just use . together with the s modifier, which makes . match newlines as well.
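A minimal sketch of this approach, run against a small hypothetical HTML string instead of the live page (the sample `$doc` is made up for illustration). The lazy `.*?` combined with the `s` modifier stops at the first closing tag, even when the block spans several lines:

```php
<?php
// Hypothetical sample page, used instead of fetching http://prairieprogressive.com/
$doc = "<html>\r\n<head><script type=\"text/javascript\">\nvar x = 1;\n</script>"
     . "<style>\nbody { color: red; }\n</style></head>\r\n"
     . "<body><!-- a\nmulti-line comment --><p>Hello</p></body></html>";

$replace = array(
    "/<script.*?<\/\s?script>/s", // lazy .*? + s: stop at the first closing tag, across newlines
    "/<style.*?<\/\s?style>/s",
    "/<!--.*?-->/s",
    "/\r\n/",                     // no dot in this pattern, so s would make no difference
);

// The patterns are applied in order; each one works on the previous result.
$clean = preg_replace($replace, "", $doc);
echo $clean, "\n"; // <html><head></head><body><p>Hello</p></body></html>
```

Each pattern matches its block without any capture group, so there is no group to re-enter per character.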


OK! It seems the problem is with the parentheses (the capture group)...

When I use

$doc = preg_replace("/<style([\s\S]*)<\/ ?style>/",'',$doc);

instead of

$doc = preg_replace("/<style([\s\S])*<\/ ?style>/",'',$doc);

it works!!
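The difference between the two patterns is where the `*` sits. The sketch below (the sample markup is hypothetical) does not reproduce the crash, which depends on input size and stack limits. It only shows that `([\s\S])*` re-enters the group once per character, so the capture ends up holding just the last character, while `([\s\S]*)` enters the group once. In PCRE versions whose matcher recurses on each group repetition, that per-character re-entry is what makes stack use grow with the length of the `<style>` block.

```php
<?php
$html = "<style>p { margin: 0 }</style>";

// Quantifier OUTSIDE the group: the group is re-entered for every character,
// so $m1[1] only keeps the last character it consumed.
preg_match('/<style([\s\S])*<\/ ?style>/', $html, $m1);

// Quantifier INSIDE the group: the group is entered once and captures the whole run.
preg_match('/<style([\s\S]*)<\/ ?style>/', $html, $m2);

echo $m1[1], "\n"; // }
echo $m2[1], "\n"; // >p { margin: 0 }
```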


This seems to be a bug.

As you mentioned in your comment, the style regex is what causes this. As a workaround, use the s modifier so that . also matches newlines:

$doc = preg_replace("/<style.*?<\/ ?style>/s",'',$doc);
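Even with the `s` workaround, `preg_replace()` can still fail on very large input if it hits a PCRE limit; in that case it returns `null` instead of a string, and `echo` silently prints nothing. A sketch of a defensive wrapper, where `strip_style_blocks` is a hypothetical helper name. Note that it only catches errors PCRE reports back to PHP; it cannot intercept a genuine segmentation fault.

```php
<?php
// Hypothetical helper: remove <style> blocks and fail loudly instead of
// silently passing a null result on.
function strip_style_blocks(string $doc): string
{
    $result = preg_replace('/<style.*?<\/ ?style>/s', '', $doc);
    if ($result === null) {
        // preg_replace() returns null on failure; preg_last_error() gives the
        // reason, e.g. PREG_BACKTRACK_LIMIT_ERROR or PREG_RECURSION_LIMIT_ERROR.
        throw new RuntimeException('preg_replace failed, PCRE error code ' . preg_last_error());
    }
    return $result;
}

echo strip_style_blocks("<p>a</p><style>\nb { }\n</style><p>c</p>"), "\n"; // <p>a</p><p>c</p>
```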


Try this (I added the u modifier for Unicode, added s so that . also matches newlines, and changed ([\s\S])*? to .*?):

<?php
$doc = file_get_contents("http://prairieprogressive.com/");
$replace = array(
    "#<script.*?</ ?script>#su",
    '#<style.*?</ ?style>#su',
    "#<!--.*?-->#su",
    "#\r\n#u"
);
);
$doc = preg_replace($replace,"",$doc);
echo $doc;
?>
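One caveat with the `u` modifier: it requires the subject to be valid UTF-8. If the fetched page uses another encoding, `preg_replace()` returns `null` and `preg_last_error()` reports `PREG_BAD_UTF8_ERROR`. A small sketch using a hypothetical ISO-8859-1 string:

```php
<?php
// "café" encoded as ISO-8859-1 (byte 0xE9), which is not valid UTF-8.
$latin1 = "caf\xE9 <script>x</script>";

$withU    = preg_replace('#<script.*?</ ?script>#su', '', $latin1);
$errorU   = preg_last_error(); // read before the next preg_* call resets it
$withoutU = preg_replace('#<script.*?</ ?script>#s', '', $latin1);

var_dump($withU === null, $errorU === PREG_BAD_UTF8_ERROR); // bool(true) bool(true)
var_dump($withoutU === "caf\xE9 ");                         // bool(true)
```

If the page's encoding is unknown, dropping `u` (the tag patterns themselves are pure ASCII) avoids this failure mode.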


What is the point of [\s\S]? It matches any whitespace or non-whitespace character, that is, any character at all. If you replace it with .*, it works just fine.

EDIT: If you want to match newlines too, use the s modifier. In my opinion, it is easier to understand than the seemingly contradictory [\s\S].
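A quick check of the three variants discussed in these answers, on a hypothetical multi-line `<style>` block: plain `.` cannot cross the newline, while `.` with `s` and the `[\s\S]` class both match.

```php
<?php
$css = "<style>\nbody { color: red; }\n</style>";

$dotOnly  = preg_match('/<style.*?<\/ ?style>/', $css);      // . stops at "\n"
$dotWithS = preg_match('/<style.*?<\/ ?style>/s', $css);     // s lets . match "\n"
$anyChar  = preg_match('/<style[\s\S]*?<\/ ?style>/', $css); // the class matches any character

var_dump($dotOnly, $dotWithS, $anyChar); // int(0) int(1) int(1)
```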
