Help with a regex expression?

I'm trying to use preg_replace to filter member comments, specifically script and img tags. If the src is from my site, I want to keep the tag; if it's from another site, I just want to show the src URL.

Regex Expression:

  <(\w+).+src=[\x22|'](?![^\x22']+mysite\.com[^\x22']+)([^\x22']+)[\x22|'].*>(?:</\1>)?

Using:

 preg_replace($pattern, $2, $comment);

Comment:

Hi look at this!
<img src="http://www.mysite.com/blah/blah/image.jpg"></img>
<img src="http://mysite.com/blah/blah/image.jpg"></img>
<img src="http://subdomain.mysite.com/blah/blah/image.jpg"/>
<img src="http://www.mysite.fakesite.com/blah/blah/image.jpg"></img>
<img src="http://www.fakesite.com/blah/blah/image.jpg"></img>
<img src="http://fakesite.com/blah/blah/image.jpg"></img>
Which one is your favorite?

Desired outcome:

Hi look at this!
<img src="http://www.mysite.com/blah/blah/image.jpg"></img>
<img src="http://mysite.com/blah/blah/image.jpg"></img>
<img src="http://subdomain.mysite.com/blah/blah/image.jpg"/>
http://www.mysite.fakesite.com/blah/blah/image.jpg   (notice that it's just the URL, because it's not from my site)
http://www.fakesite.com/blah/blah/image.jpg
http://fakesite.com/blah/blah/image.jpg
Which one is your favorite?

Does anyone see anything wrong?


I'm trying to use preg_replace to filter member comments. To filter script and img tags.

HTML Purifier is going to be the best tool for this purpose. Note, though, that you want a whitelist of acceptable tags and attributes, not a blacklist of specific harmful tags.
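A minimal sketch of such a whitelist. It assumes HTML Purifier is installed with Composer (package `ezyang/htmlpurifier`), and the allowed tag list is only an illustrative guess at what a comment might need. `HTML.Allowed`, `URI.Host` and `URI.DisableExternalResources` are HTML Purifier configuration directives; the last two ask it to drop embedded resources (such as image sources) whose host is not the configured domain or one of its subdomains.

```php
<?php
// Assumption: HTML Purifier installed via Composer (ezyang/htmlpurifier).
require_once __DIR__ . '/vendor/autoload.php';

// Illustrative input; in practice this is the member's comment.
$comment = 'Hi! <img src="http://www.fakesite.com/blah/blah/image.jpg">'
         . '<img src="http://www.mysite.com/blah/blah/image.jpg">'
         . '<script>alert(1)</script>';

$config = HTMLPurifier_Config::createDefault();

// Whitelist: only these elements/attributes survive; everything else,
// including <script>, is stripped. The list itself is an assumption.
$config->set('HTML.Allowed', 'p,br,b,i,a[href],img[src|alt]');

// Treat mysite.com (and its subdomains) as our own host and refuse
// resources embedded from any other host.
$config->set('URI.Host', 'mysite.com');
$config->set('URI.DisableExternalResources', true);

$purifier = new HTMLPurifier($config);
echo $purifier->purify($comment), "\n";
```

Unlike what the question asks for, an off-site image is expected to be removed rather than replaced by its URL, so if the URL must be shown, a DOM-based pass is still needed on top of this.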


The biggest problem I can see is that you're trying to use a regex to modify HTML.

You should use DOMDocument instead.

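$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($content);

// getElementsByTagName() returns a live list, so copy it to an array first;
// removing nodes while iterating the live list would skip some of them.
$images = iterator_to_array($dom->getElementsByTagName('img'));

foreach ($images as $element) {
    if ( ! $element->hasAttribute('src')) {
        continue;
    }

    $src = $element->getAttribute('src');

    $elementHost = parse_url($src, PHP_URL_HOST);
    $thisHost = $_SERVER['SERVER_NAME'];

    // Off-site image: replace the tag with its URL as plain text.
    if ($elementHost != $thisHost) {
        $element->parentNode->insertBefore($dom->createTextNode($src), $element);
        $element->parentNode->removeChild($element);
    }
}

// Serialize the filtered document back to HTML.
echo $dom->saveHTML();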
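The exact comparison `$elementHost != $thisHost` keeps only images whose host matches `SERVER_NAME` character for character, while the question wants `www.mysite.com`, `mysite.com` and `subdomain.mysite.com` all kept and look-alikes such as `www.mysite.fakesite.com` rejected. A minimal sketch of a suffix check that could replace that comparison; `isOwnHost()` is a hypothetical helper, and `mysite.com` is the domain from the question.

```php
<?php
// Hypothetical helper: true when $host is $siteDomain itself or any
// subdomain of it. Host names are case-insensitive, so compare lowercased.
function isOwnHost($host, $siteDomain)
{
    // parse_url() returns null when there is no host (e.g. a relative src)
    // and false for badly malformed URLs; treat both as "not ours".
    if (!is_string($host) || $host === '') {
        return false;
    }

    $host = strtolower($host);
    $siteDomain = strtolower($siteDomain);
    $suffix = '.' . $siteDomain;

    // Requiring the leading dot stops "notmysite.com" from matching.
    return $host === $siteDomain
        || substr($host, -strlen($suffix)) === $suffix;
}

$sources = [
    'http://www.mysite.com/blah/blah/image.jpg',
    'http://mysite.com/blah/blah/image.jpg',
    'http://subdomain.mysite.com/blah/blah/image.jpg',
    'http://www.mysite.fakesite.com/blah/blah/image.jpg',
    'http://www.fakesite.com/blah/blah/image.jpg',
    'http://fakesite.com/blah/blah/image.jpg',
];

foreach ($sources as $src) {
    $host = parse_url($src, PHP_URL_HOST);
    printf("%-52s %s\n", $src, isOwnHost($host, 'mysite.com') ? 'keep tag' : 'show URL');
}
```

Inside the loop above, the condition would become `! isOwnHost($elementHost, 'mysite.com')`. Relative `src` values have no host and are treated as foreign here; if your own images use relative paths, you may want to return `true` for that case instead.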


You should use the i and m modifiers:

#<(\w+).+src=[\x22|'](?![^\x22']+mysite\.com[^\x22']+)([^\x22']+)[\x22|'].*>(?:</\1>)?#im
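A minimal sketch of running the pattern above through preg_replace on the sample comment from the question. Two details worth noting: the replacement argument has to be the quoted string `'$2'` (a reference to the second capture group, the URL), because a bare `$2` is not valid PHP; and inside a character class `|` is a literal character, so `[\x22|']` matches `"`, `'` or `|`, not just the two quotes.

```php
<?php
// Pattern from the answer above; single quotes inside the PHP string are
// escaped as \' so the regex itself is unchanged.
$pattern = '#<(\w+).+src=[\x22|\'](?![^\x22\']+mysite\.com[^\x22\']+)([^\x22\']+)[\x22|\'].*>(?:</\1>)?#im';

// Sample comment from the question (nowdoc, so nothing is interpolated).
$comment = <<<'HTML'
Hi look at this!
<img src="http://www.mysite.com/blah/blah/image.jpg"></img>
<img src="http://mysite.com/blah/blah/image.jpg"></img>
<img src="http://subdomain.mysite.com/blah/blah/image.jpg"/>
<img src="http://www.mysite.fakesite.com/blah/blah/image.jpg"></img>
<img src="http://www.fakesite.com/blah/blah/image.jpg"></img>
<img src="http://fakesite.com/blah/blah/image.jpg"></img>
Which one is your favorite?
HTML;

// '$2' must be a string: it is the backreference to the captured URL.
$result = preg_replace($pattern, '$2', $comment);

echo $result, "\n";
```

On this input the three mysite.com tags are left alone and the other three lines are reduced to their bare URLs. Because `.` does not match newlines here, each tag is matched within its own line; several tags on one line would be handled differently by the greedy `.+` and `.*`.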
