Selectively encoding HTML, how?
Allow me to explain my problem with a before-and-after description.
I have a comment system on a web community. Users can type anything they want into a textarea, including special characters and HTML tags. In MySQL, I store the comment body exactly as typed, without any intervention. However, upon display I use HTML entities to prevent users from messing with the HTML:
<?= nl2br(htmlentities($comment['body'], ENT_QUOTES, 'UTF-8')) ?>
This is working fine. However, I am now trying to enrich the comment system by automatically converting some links that are placed inside comments into richer objects. This concerns a photo forum and sometimes users make references to other photos by pasting in URLs in the comments:
http://www.jungledragon.com/image/12/eagle.html
Using regular expressions, I replace valid links like the one above with markup. In this case, the link is replaced with an img tag so that, instead of a link, users see a thumbnail of that image directly inline in the comment.
The replacement is working fine. However, since I am using htmlentities, the replacement markup will render as text, rather than a rendered image. No surprises here.
My question is: how can I selectively HTML-encode a comment body? I want these link replacements to stay unescaped, while everything else is escaped.
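Run htmlentities first and do the replacement afterwards. That way the user's text is already escaped when your regular expression inserts the img markup, so only the markup you generate is rendered as HTML.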
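A minimal sketch of that ordering, in case it helps. The function name render_comment, the URL pattern, and the /thumbs/ path are assumptions made up for illustration; the question does not show the real regular expression or thumbnail scheme, so adapt those to your site.

```php
<?php
// Hypothetical helper: escape the whole comment first, then swap photo
// links for thumbnails. Pattern and thumbnail path are assumptions.
function render_comment(string $body): string
{
    // 1. Escape everything, so user-supplied markup becomes inert text.
    $escaped = htmlentities($body, ENT_QUOTES, 'UTF-8');

    // 2. Replace photo links in the already-escaped text. The captured
    //    groups only allow digits, letters and hyphens, so nothing the
    //    user typed can inject markup into the generated tag.
    $pattern = '~http://www\.jungledragon\.com/image/(\d+)/([a-z0-9-]+)\.html~i';
    $linked = preg_replace_callback($pattern, function (array $m): string {
        // Assumed thumbnail URL scheme, for illustration only.
        return '<img src="/thumbs/' . $m[1] . '.jpg" alt="' . $m[2] . '">';
    }, $escaped) ?? $escaped; // fall back to the escaped text on a regex error

    // 3. Convert newlines to <br> last, as in the original display code.
    return nl2br($linked);
}

echo render_comment("<b>Nice!</b> See http://www.jungledragon.com/image/12/eagle.html");
// prints: &lt;b&gt;Nice!&lt;/b&gt; See <img src="/thumbs/12.jpg" alt="eagle">
```

Keeping the capture groups restrictive is the important design choice here: because the replacement runs after escaping, anything copied from the match into the img tag is not escaped again, so it should only ever contain characters you know are safe.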
Usually, you'd use a library to sanitize the HTML instead. A few are listed here:
http://htmlpurifier.org/comparison