Restrict fckeditor images to only those from my domain
I have a members website where we use a very locked-down version of the wonderful fckeditor for posting of member content. Recently we've started allowing smileys, which makes the members happy but has introduced a potential vulnerability in that it's now possible to insert images from other domains, as well as the smileys which are served from ours.
Everything posted goes through a preview stage, during which the posted content is sanitized, so I'm thinking I need some extra php which removes any img tag whose src indicates it doesn't come from our domain (let's say it's "xyz.com"). As pointed out by drf in the first comment, this is not as straightforward as it may initially seem.
I'm sure this would apply to others too, but I haven't had any luck finding a solution & regex is not my strong point. As always, any and all help & suggestions would be appreciated.
Some people will tell you that regular expressions are not the right tool for parsing HTML/XHTML. I am one of them. Try using a DOM parser instead:
<?php
$allowed = array('yourdomain.com', 'www.yourdomain.com');

$dom = new DOMDocument;
$dom->loadHTML(file_get_contents('input.html'));
$xpath = new DOMXPath($dom);

// Check every <img> in the document
foreach ($xpath->query('//img') as $i) {
    $src = $i->getAttribute('src');
    $url = parse_url($src);
    // A src with a host part is absolute; it must match one of our hosts.
    // Relative URLs such as "/image.jpg" have no host and are accepted.
    if (isset($url['host']) && !in_array(strtolower($url['host']), $allowed, true)) {
        // show an error
        // -- or --
        // remove the tag: $i->parentNode->removeChild($i)
        printf('[FAIL] %s' . PHP_EOL, $src);
    } else {
        printf('[PASS] %s' . PHP_EOL, $src);
    }
}
Sample input:
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
<p><img src="/image.jpg"></p>
<p><img src="http://yourdomain.com/image.jpg"></p>
<p><img src="http://www.yourdomain.com/image.jpg"></p>
<p><img src="http://otherdomain.com/image.jpg"></p>
Sample output:
[PASS] /image.jpg
[PASS] http://yourdomain.com/image.jpg
[PASS] http://www.yourdomain.com/image.jpg
[FAIL] http://otherdomain.com/image.jpg
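Since the question asks for the offending tags to be removed during the sanitizing step, here is a rough sketch of how the check above could be turned into a function that returns the cleaned markup. The function name stripForeignImages and its parameters are made up for illustration, and rejecting scheme-only values such as data: URIs is my own assumption rather than something the question requires. It has not been tested against non-ASCII content or every URL trick a browser accepts, so treat it as a starting point and not as a complete filter.

```php
<?php
// Hypothetical helper (not part of the original answer): removes every <img>
// whose src points at a host outside $allowedHosts and returns the cleaned HTML.
// $allowedHosts is assumed to contain lowercase host names.
function stripForeignImages(string $html, array $allowedHosts): string
{
    if (trim($html) === '') {
        return '';
    }

    // Keep libxml quiet about sloppy user markup, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Copy the node list into an array so removals cannot disturb the loop.
    foreach (iterator_to_array($xpath->query('//img')) as $img) {
        $parts = parse_url(trim($img->getAttribute('src')));

        $foreign = $parts === false                                 // unparseable src
            || (isset($parts['scheme']) && !isset($parts['host']))  // e.g. data: URIs (assumed unwanted)
            || (isset($parts['host'])
                && !in_array(strtolower($parts['host']), $allowedHosts, true));

        if ($foreign) {
            $img->parentNode->removeChild($img);
        }
    }

    // loadHTML() wraps the fragment in <html><body>; return only the body's children.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

In the preview stage you would call it with the posted content and your own hosts, for example stripForeignImages($content, array('xyz.com', 'www.xyz.com')), where $content stands for whatever variable holds the member's post. Relative paths such as the smileys' "/image.jpg" have no host part and are left alone.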