
jsoup: removing iframe tags

I am using jsoup 1.6.1 and have a problem when I try to remove an iframe tag from HTML. When the iframe has no body (i.e. <iframe pro=value />), the remove() method removes all the content after that tag. Here is my sample code.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String html = "<p> This is start.</p><iframe frameborder=\"0\" marginheight=\"0\" /><p> This is end</p>";
Document doc = Jsoup.parse(html, "UTF-8");
doc.select("iframe").remove();
System.out.println(doc.text());

It prints:

This is start.

But I expect:

This is start. This is end

Thanks in advance


It appears the closing tag for iframe is required. You can't use a self-closing tag:

http://msdn.microsoft.com/en-us/library/ie/ms535258(v=vs.85).aspx
http://stackoverflow.com/questions/923328/line-after-iframe-is-not-visible
http://www.w3resource.com/html/iframe/HTML-iframe-tag-and-element.php

So jsoup is following the spec: because <iframe ... /> does not count as closed, whatever follows the iframe tag is taken as its body. When you remove the iframe, the "This is end" paragraph gets removed along with it.
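If you cannot change the markup at its source, one possible workaround is to rewrite self-closed iframe tags into an explicit open/close pair before handing the string to jsoup. The sketch below is only an illustration: the class name IframeFix and the method expandSelfClosedIframes are made up for this example and are not part of jsoup, and the regex assumes attribute values do not contain a ">" character.

```java
import java.util.regex.Pattern;

class IframeFix {
    // Matches a self-closed iframe tag such as <iframe a="b" /> (case-insensitive).
    // Assumption: no attribute value contains '>'.
    private static final Pattern SELF_CLOSED_IFRAME =
            Pattern.compile("<iframe\\b([^>]*?)\\s*/>", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: rewrites <iframe ... /> as <iframe ...></iframe>.
    // Tags that already have a closing </iframe> are left untouched.
    public static String expandSelfClosedIframes(String html) {
        return SELF_CLOSED_IFRAME.matcher(html).replaceAll("<iframe$1></iframe>");
    }

    public static void main(String[] args) {
        String html = "<p> This is start.</p>"
                + "<iframe frameborder=\"0\" marginheight=\"0\" />"
                + "<p> This is end</p>";
        // The fixed string can then be passed to Jsoup.parse(...) as in the question.
        System.out.println(expandSelfClosedIframes(html));
    }
}
```

With the iframe explicitly closed, doc.select("iframe").remove() should only drop the iframe itself and leave the trailing paragraph in place. A regex is a blunt tool for HTML, so treat this as a stopgap for simple input rather than a general fix.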
