
Best Python library to remove unsafe tags while keeping the tags I consider safe?

For example, I want to remove the "script" tag, but I want to keep the "a" tag.

So what library do you use to do this?

Also, I use jQuery CLEditor as my WYSIWYG HTML editor. Can it do this for me automatically?

Thanks


I have to do this automatically for a project of mine. The solution I have found is to use the Beautiful Soup module to extract the script tag (I also do this for style and form).

from bs4 import BeautifulSoup   # Beautiful Soup 4 (the original snippet used the old BS3 API)

soup = BeautifulSoup(html_string, 'html.parser')   # bs4 decodes HTML entities on its own

for s in soup.find_all('script'):   # find every <script> element
    s.extract()   # remove it from the tree completely

Then you can have BeautifulSoup print out or save the HTML, for example with str(soup).
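The snippet above removes only the tags you name (a blacklist). The question asks for the reverse: keep a known-safe set such as `a` and drop everything else. Here is a minimal whitelist sketch. It uses the standard library's `html.parser` instead of Beautiful Soup so it runs without third-party packages. The tag/attribute whitelist, the allowed URL schemes and the `sanitize` helper are hypothetical choices for illustration, not a vetted sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tags to keep, mapped to the attributes allowed on each.
ALLOWED = {"a": {"href", "title"}, "b": set(), "i": set(), "p": set(), "br": set()}
# Tags whose text content is dropped too, not just the tag itself.
DROP_CONTENT = {"script", "style"}
# Only absolute links with these (assumed safe) schemes survive in href.
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities arrive already decoded
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return  # disallowed tag: drop the tag itself, its text is still kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue  # drops onclick, style, etc.
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # rejects javascript: and any other non-whitelisted scheme
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED and tag != "br":
            self.out.append(f"</{tag}>")  # <br> is a void element, no closing tag

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))  # re-escape <, >, &

    # Comments, doctypes and processing instructions are not overridden, so they are dropped.


def sanitize(html_string):
    parser = WhitelistSanitizer()
    parser.feed(html_string)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


print(sanitize('<p>Hi <a href="http://x.com" onclick="evil()">link</a>'
               '<script>alert(1)</script></p>'))
# → <p>Hi <a href="http://x.com">link</a></p>
```

Limitations of this sketch: it does not balance unclosed or stray tags, and relative links are dropped because only `http://`, `https://` and `mailto:` hrefs are kept.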


I suppose BeautifulSoup should do the trick here.

Actually, here's a question with answers about exactly that: Python HTML sanitizer / scrubber / filter


Another option, designed for sanitization, is html5lib.

Whatever you do, do not rely on an editor component to do it for you: That runs on the client, so could easily be manipulated to submit invalid or malicious HTML!
