开发者

When to use document.implementation.createHTMLDocument?

What are some use cases and is it deprecated? As I found out at http://groups.google.com/group/envjs/browse_thread/thread/6c22d0f959666009/c389fc11537f2a97 that it's "non-standard and not supported by any modern browser".

About document.implementation at http://javascript.gakaa.com/document-implementation.aspx:

Returns a reference to the W3C DOMImplementation object, which represents, to a limited degree, the environment that makes up the document containerthe browser, for our purposes. Methods 开发者_StackOverflow社区of the object let you see which DOM modules the browser reports supporting. This object is also a gateway to creating virtual W3C Document and DocumentType objects outside of the current document tree. Thus, in Netscape 6 you can use the document.implementation property as a start to generating a nonrendered document for external XML documents. See the DOMImplementation object for details about the methods and their browser support.

Given that it provides methods (such as createHTMLDocument) for creating a non-rendered document outside of the current document tree, would it be safe to feed it untrusted third party HTML input that may contain some XSS? I ask because I would like to use createHTMLDocument for traversal purposes of third party HTML input. May that be one of the use cases?


I always use this because it doesn't make requests to images, execute scripts or affect styling:

function cleanHTML( html ) {
    var root = document.implementation.createHTMLDocument().body;

    root.innerHTML = html;

    //Manipulate the DOM here
    $(root).find("script, style, img").remove(); //jQuery is not relevant, I just didn't want to write exhausting boilerplate code just to make a point

    return root.innerHTML;
}


cleanHTML( '<div>hello</div><img src="google"><script>alert("hello");</script><style type="text/css">body {display: none !important;}</style>' );
//returns "<div>hello</div>" with the page unaffected


Yes. You can use this to load untrusted third-party content and strip it of dangerous tags and attributes before including it into your own document. There is some great research incorporating this trick, described at http://blog.kotowicz.net/2011/10/sad-state-of-dom-security-or-how-we-all.html.

The technique documented by Esailija above is insufficient, however. You also need to strip out most attributes. An attacker could set an onerror or onmouseover element to malicious JS. The style attribute can be used to include CSS that runs malicious JS. Iframe and other embed tags can also be abused. View source at https://html5sec.org/xssme/xssme2 to see a version of this technique.


Just a cleaner answer besides @Esailija and @Greg answers: This function will create another document outside the tree of current document, and clean all scripts, styles and images from the new document:

function insertDocument (myHTML) {
    var newHTMLDocument = document.implementation.createHTMLDocument().body;
    newHTMLDocument.innerHTML = myHTML;
    [].forEach.call(newHTMLDocument.querySelectorAll("script, style, img"), function(el) {el.remove(); });
    documentsList.push(newHTMLDocument);
    return $(newHTMLDocument.innerHTML);
}

This one is fantastic for making ajax requests and scraping the content will be faster :)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜