Libs for HTML sanitizing
I'm looking for an HTML sanitizer that I can call through an API to sanitize strings I get from my web app. Are there any useful, easy-to-use libraries available? Does anyone know of one or two?
I don't need anything big; it just has to be able to find unclosed tags and close them.
https://github.com/OWASP/java-html-sanitizer is now marked ready for production use.
It is a fast and easy-to-configure HTML sanitizer written in Java that lets you include HTML authored by third parties in your web application while protecting against XSS.
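You can use prepackaged policies:
PolicyFactory policy = Sanitizers.FORMATTING.and(Sanitizers.LINKS);
String safeHtml = policy.sanitize(untrustedHtml); // untrustedHtml: the string from your web app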
or the tests show how you can configure your own easily:
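import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

// Allow only <a href="https://..."> links and add rel="nofollow" to them.
PolicyFactory policy = new HtmlPolicyBuilder()
    .allowElements("a")
    .allowUrlProtocols("https")
    .allowAttributes("href").onElements("a")
    .requireRelNofollowOnLinks()
    .toFactory();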
or write custom policies to do things like changing h1s to divs with a certain class:
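PolicyFactory policy = new HtmlPolicyBuilder()
    .allowElements("h1", "p")
    .allowElements(
        new ElementPolicy() {
            public String apply(String elementName, List<String> attrs) {
                // attrs is a flat list of name/value pairs: add class="header-h1".
                attrs.add("class");
                attrs.add("header-" + elementName);
                // Emit the element as a div instead of an h1.
                return "div";
            }
        }, "h1")
    .toFactory();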
JTidy may help you.
The HTML Parser JSoup also supports sanitisation by policy: http://jsoup.org/cookbook/cleaning-html/whitelist-sanitizer
Apart from JTidy you can also take a look at:
Nekohtml
TagSoup
Getting text in HTML document
http://roberto.open-lab.com/2009/11/05/a-java-html-sanitizer-also-against-xss/
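If the only requirement is the one from the question, closing tags that were left open, the underlying idea can be sketched in plain Java with a stack. This is a rough illustration under stated assumptions, not how any of the libraries above work internally: the class name TagCloser, the method closeOpenTags, and the short list of void elements are made up for the example, it uses a simple regex instead of a real HTML parser, and it does nothing against XSS.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of any library mentioned above.
final class TagCloser {
    // Assumed (incomplete) list of void elements, which never take a closing tag.
    private static final Set<String> VOID =
            Set.of("br", "hr", "img", "input", "meta", "link");

    // Group 1 is "/" for a closing tag, group 2 is the tag name.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    public static String closeOpenTags(String html) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            boolean selfClosing = m.group().endsWith("/>");
            if (VOID.contains(name) || selfClosing) {
                continue; // nothing to close for these
            }
            if (m.group(1).isEmpty()) {
                open.push(name);              // opening tag: remember it
            } else if (name.equals(open.peek())) {
                open.pop();                   // closing tag matching the innermost open tag
            }
            // A closing tag that does not match the innermost open tag is ignored.
        }
        // Append closing tags for everything still open, innermost first.
        StringBuilder out = new StringBuilder(html);
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(closeOpenTags("<p>Hello <b>world"));
        // prints: <p>Hello <b>world</b></p>
    }
}
```

For anything that ends up in front of users, one of the libraries above is the better choice, since they parse real-world HTML properly and also filter dangerous elements and attributes.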