How can I get the HTML contained in the body tag of an iframe without the browser replacing characters?

I'm currently trying to get the contents of an iframe's body without any mangling of content by the browser.

I could do it by wrapping the content in a textarea, but I want to avoid that.

Using .innerHTML results in special characters such as <, >, and & being converted to &lt;, &gt;, and &amp;, respectively.
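To see why this happens: .innerHTML does not return the original source, it re-serializes the parsed DOM. When the HTML serializer writes out a text node, it escapes &, the non-breaking space, < and >, while double quotes in text are left alone (attribute values are escaped differently). The function below is only a plain-JavaScript sketch of that text-node step, not a browser API, and `escapeLikeInnerHTML` is a hypothetical name:

```javascript
// Plain-JS sketch of how the HTML serializer escapes a text node's data
// (what you see when reading body.innerHTML). Not a browser API.
function escapeLikeInnerHTML(text) {
  return text
    .replace(/&/g, "&amp;")       // must run first so later entities aren't re-escaped
    .replace(/\u00A0/g, "&nbsp;") // non-breaking space
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

console.log(escapeLikeInnerHTML('"testtext":"I am > than this & < that"'));
// prints: "testtext":"I am &gt; than this &amp; &lt; that"
```

So the characters are not lost; the DOM holds the original text, and it is the serialization step that adds the entities.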

To test, create an HTML file (test.html) containing:

{ 
 "id": 5, 
 "testtext":"I am > than this & < that", 
 "html":"<div>\"worky\"</div>" 
}

and then another page that includes that file in an iframe:

<!doctype html>
<html>
  <head>
    <script src="http://code.jquery.com/jquery-latest.js"></script>
  </head>
  <body>
    <iframe id="myIframe" name="myIframe" src="test.html"></iframe><br />
    Result:<br />
    <textarea id='result'></textarea>
    <script>
      $("#myIframe").on("load", function(){
        var iframeBody = window.frames.myIframe.document
            .getElementsByTagName("body")[0], result;
        result = iframeBody.innerHTML;
        $("#result").val(result);
      });
    </script>
  </body>
</html>

I have tried this:

result = $(iframeBody).contents().map(function(){
  return this.nodeValue ? this.nodeValue : this.innerHTML;
}).get().join("");

However, it loses the div itself: for an element node, .innerHTML returns only the element's contents, not its own tags.

EDIT:

I have a partial solution:

var iframeBody, result;
$("#myIframe").on("load", function(){
  iframeBody = window.frames.myIframe.document
    .getElementsByTagName("body")[0];
  result = $(iframeBody).contents().map(function(){
    if (this.nodeValue) {
      // text node: return the raw characters, no escaping
      return this.nodeValue;
    }
    else {
      // element node: serialize it including its own tags (like outerHTML)
      return $(this).clone().wrap('<p>').parent().html();
    }
  }).get().join("");
  $("#result").val(result);
});

However, it still escapes characters inside those HTML elements that aren't markup (for example, an & becomes &amp;). I'm not sure whether I'm OK with that.

EDIT AGAIN

Here's a little more context. I'm modifying a jQuery iframe AJAX transport so that it works without requiring a textarea in the iframe to hold the content when the content isn't HTML. For the most part it works fine without a textarea, but it mangles any special HTML characters when you retrieve the text using .innerHTML. One way to avoid the mangling is to read the text using .nodeValue, but that doesn't work when you come across an HTML element. If you return JSON that contains an HTML string for whatever reason, the transport needs to extract that JSON string exactly as it was returned within the iframe, leaving all characters intact.

For testing purposes, this jsfiddle is sufficient. Imagine that the div used in the fiddle is the body of the iframe; you can test the results in jsfiddle. The problem I'm having really has nothing to do with the iframe or its load event.

http://jsfiddle.net/P623a/2/

In that fiddle, the only issue is that the & inside the div within the JSON is converted to &amp;.
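If the only goal is to undo that particular change, one possible workaround (a hedged sketch, not the approach taken in the Solution below) is to reverse the text-node escaping on the .innerHTML result before calling JSON.parse. `unescapeSerializedText` is a hypothetical helper. It is lossy: any entity the server actually wrote (say, a literal &amp;) comes back decoded, and it cannot undo the serializer's normalization of tags and attributes (lowercased tag names, re-quoted attribute values, `<br/>` becoming `<br>`).

```javascript
// Hypothetical helper: reverses the text-node escaping applied by .innerHTML.
// A single regex pass ensures "&amp;lt;" becomes "&lt;" rather than "<".
const SERIALIZER_ENTITIES = { amp: "&", lt: "<", gt: ">", nbsp: "\u00A0" };

function unescapeSerializedText(html) {
  return html.replace(/&(amp|lt|gt|nbsp);/g, (match, name) => SERIALIZER_ENTITIES[name]);
}

// Roughly what body.innerHTML returns for the test page in the question
const serialized =
  '{ "id": 5, "testtext":"I am &gt; than this &amp; &lt; that", ' +
  '"html":"<div>\\"worky\\"</div>" }';

const data = JSON.parse(unescapeSerializedText(serialized));
console.log(data.testtext); // I am > than this & < that
console.log(data.html);     // <div>"worky"</div>
```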

Solution

I'm going to simply require that the response is served with a proper content type (application/json, a script type, or text/plain) if the response is JSON/JSONP/script and contains a DOM element. If it isn't served that way under those conditions, the error handler is triggered.

When served that way, the iframe's body contains <pre>your content</pre>, and you can read that content with .innerText while preserving the special characters.


The browser is interpreting the data in the iframe as HTML and, as far as I know, there is no way to get at the original text (à la view source).

Here are the options I can come up with:

  • Make the response valid HTML: wrap it in a document and encode the data you want, something like this:

    <!DOCTYPE html>
    <html>
    <head>
    <body>
    { 
     "id": 5, 
     "testtext":"I am &gt; than this &amp; &lt; that", 
     "html":"&lt;div&gt;\"worky\"&lt;/div&gt;" 
    }
    
  • Send your response with a MIME type that doesn’t get interpreted as HTML, like application/json or text/plain. The browser will probably build a document around it (putting the data in, say, a pre) and you can get at it the same way.
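For the first option, the server has to escape the payload before placing it in the body. A minimal sketch of that step in JavaScript (for example in a Node handler), where `escapeHtmlText` and `wrapJsonInHtml` are hypothetical names; only &, < and > are escaped because the JSON goes into element content, not into an attribute:

```javascript
// Hypothetical server-side helpers for the first option: escape the JSON text
// so the browser treats every character as text, then wrap it in a document.
function escapeHtmlText(text) {
  return text
    .replace(/&/g, "&amp;") // first, so the entities added below stay intact
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function wrapJsonInHtml(data) {
  const json = JSON.stringify(data);
  return '<!DOCTYPE html><html><head><meta charset="utf-8"></head><body>' +
    escapeHtmlText(json) +
    "</body></html>";
}

const page = wrapJsonInHtml({
  id: 5,
  testtext: "I am > than this & < that",
  html: '<div>"worky"</div>',
});
console.log(page);
```

Because the parser decodes those entities back into plain characters, the body's text content is the original JSON string.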

In either case, you can get at the innerText (or textContent, depending on browser) of the document or the nodeValue of the text node which contains your data, like this:

var iframeBody = iframe.contentDocument.body,
    json = iframeBody.textContent || iframeBody.innerText;
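A slightly fuller sketch of that last step, assuming the iframe is same-origin (for a cross-origin frame, contentDocument is null) and its document has finished loading. `readIframeJson` is a hypothetical helper; it takes the document as a parameter so it can also be tried outside a browser with a stub object:

```javascript
// Hypothetical helper: read the raw text the browser placed in the iframe's
// body (directly, or inside the <pre> generated for text/plain and
// application/json responses) and parse it as JSON.
function readIframeJson(doc) {
  const body = doc.body;
  // textContent is standard; innerText is the fallback for old IE.
  const text = body.textContent || body.innerText || "";
  return JSON.parse(text); // JSON.parse tolerates surrounding whitespace
}

// In a page: readIframeJson(document.getElementById("myIframe").contentDocument)
// Outside a browser, a stub document stands in for the real one:
const stubDoc = { body: { textContent: '{"id":5,"html":"<div>\\"worky\\"</div>"}' } };
console.log(readIframeJson(stubDoc).html); // <div>"worky"</div>
```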


The code you have in test1.html has no body tag, so you can't use .getElementsByTagName("body") if there's no body. Try:

$("#myIframe").on("load", function(){
    $("#result").val($(this).contents().text());
});


You are attaching the iframe's load event handler after the iframe tag, which already has its src. So it's quite possible that the iframe finishes loading before the handler is attached. I'm not saying this is the issue, but it will cause problems if the iframe loads quickly. You can provide an inline load event handler on the iframe tag itself.

Try this

<!doctype html>
<html>
  <head>
    <script src="http://code.jquery.com/jquery-latest.js"></script>
    <script type="text/javascript">
    function copyIframeContent(iframe){
        var iframeContent = $(iframe).contents();
        $("#result").html(iframeContent.find('body').html());
    }
    </script>
  </head>
  <body>
    <iframe id="myIframe" onload="copyIframeContent(this);" name="myIframe" src="test.html"></iframe><br />
    Result:<br />
    <textarea id='result'></textarea>
  </body>
</html>

I hope this helps you.


I think you first have to try with valid HTML if you plan to use nodeValue or anything else. You can't just assume that the browser will add the body for you; this is not HTML at all:

{ 
 "id": 5, 
 "testtext":"I am > than this & < that", 
 "html":"<div>\"worky\"</div>" 
}

It is odd to try to parse a DOM from something that is not HTML! If you want any chance of manipulating or traversing it with jQuery, you must at least wrap everything in one big container, like:

<div>
<!-- even if you don't want to use a body or html tag, everything must be wrapped here -->
</div>

I think there is a misconception about what you are trying to accomplish and how. Wouldn't it be easier to load the JSON directly (as you wrote it)? You are trying to roll a cube. If you want to parse your raw data through the DOM anyway, you can try something like this:

<div>
<p>id<span>5</span></p>
<p>testtext<span>I "am" > than this & < that</span></p>
</div>

Of course, you can't just insert HTML as plain text; how is the browser supposed to know what to do with it? Try a simple test:

var div = $('<div/>').appendTo('body').html('I "am" > than this & < that');
console.log('plainText :', div.text(), ', html :', div.html());
// works as expected...


Can you HTML-encode your JSON string before you pass it to the iframe? For example, if you change your HTML string "<div>\"worky\"</div>" to "&lt;div>\"worky\"&lt;/div>", the div HTML shows up properly. The div elements are being written to the DOM when the iframe loads, so you need to prevent the browser from parsing the HTML elements in your string.
