Is there a good JavaScript-based HTML parsing library available?
My goal is to take HTML entered by an end user, remove certain unsafe tags like <script>, and add it to the document. Does anybody know of a good JavaScript library to sanitize HTML?
I searched around and found a few online, including John Resig's HTML parser, Erik Arvidsson's simple HTML parser, and Google's Caja Sanitizer. However, I haven't been able to find much information about whether people have had good experiences using these libraries, and I'm worried that they aren't really robust enough to handle arbitrary HTML. Would I be better off just sending the HTML to my Java server for sanitization?
You can parse HTML with jQuery, but I'm pretty sure any blacklist-based (i.e. "filtering out") approach to sanitizing is going to fail. You probably need a whitelist-based ("filtering in") approach, and ultimately you don't want to be relying on JavaScript for security anyway. In any case, for reference, you can use jQuery for DOM parsing like this:
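var htmlS = "<html>etc.etc.";
// Parse into a detached wrapper; find() also reaches nested elements, whereas .remove("script") only filters the top-level set.
var $wrapper = $("<div>").html(htmlS);
$wrapper.find("script").remove(); /* DON'T RELY ON THIS FOR SECURITY */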
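To make the "filtering in" idea concrete, here is a minimal sketch of an allowlist filter. It is not taken from any of the libraries mentioned in the question. It assumes the HTML has already been parsed into a simple tree of strings (text) and { tag, attrs, children } objects; that tree shape, the ALLOWED table, and the function names are all hypothetical and only there for illustration. Anything not explicitly listed is dropped, and the same logic would still have to run on the server to count for anything.

```javascript
// Hypothetical allowlist: tag name -> attribute names permitted on that tag.
// A Map is used so that names like "constructor" cannot match inherited properties.
const ALLOWED = new Map([
  ["p", []], ["b", []], ["i", []], ["em", []], ["strong", []],
  ["ul", []], ["li", []],
  ["a", ["href"]],
]);

// Only let through link targets that start with an explicitly permitted scheme.
// (Assumption for this sketch: relative URLs are rejected as well.)
function isSafeUrl(value) {
  return /^(https?:|mailto:)/i.test(String(value).trim());
}

// Escape text so it can never be interpreted as markup.
function escapeText(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// node is either a string (text) or { tag, attrs, children }.
function sanitize(node) {
  if (typeof node === "string") return escapeText(node);

  const tag = String(node.tag).toLowerCase();
  if (!ALLOWED.has(tag)) return ""; // unknown tag: drop it and everything inside it

  const allowedAttrs = ALLOWED.get(tag);
  let attrs = "";
  for (const [name, value] of Object.entries(node.attrs || {})) {
    const lower = name.toLowerCase();
    if (!allowedAttrs.includes(lower)) continue;          // e.g. onclick, style
    if (lower === "href" && !isSafeUrl(value)) continue;  // e.g. javascript: URLs
    attrs += " " + lower + '="' + escapeText(value) + '"';
  }

  const inner = (node.children || []).map(sanitize).join("");
  return "<" + tag + attrs + ">" + inner + "</" + tag + ">";
}

// Example input tree (hypothetical parser output).
const tree = {
  tag: "p",
  attrs: { onclick: "steal()" },
  children: [
    "Hi ",
    { tag: "script", attrs: {}, children: ["alert(1)"] },
    { tag: "a", attrs: { href: "javascript:alert(1)" }, children: ["link"] },
    { tag: "b", children: ["1 < 2"] },
  ],
};

console.log(sanitize(tree));
```

The point of the design is that the default is "deny": a tag, attribute, or URL scheme nobody thought about is removed rather than passed through, which is exactly where a blacklist tends to fail.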
Would I be better off just sending the HTML to my Java server for sanitization?
Yes.
Filtering "unsafe" input must be done server-side. Client-side filtering cannot be trusted, because the server has no way of knowing that your JavaScript actually ran: the "client" could be a web browser, or it could just as easily be a bot with a script that sends requests directly to your server.