How does HtmlCleaner handle iframes in a web page?

I want to understand how HtmlCleaner handles iframes when it cleans raw HTML to produce valid XML output. One example of a page with iframes is this eBay product page.

When I print the output of HtmlCleaner for this page, I find that some iframe tags are intact while others are missing. One of the missing iframes is the one with id="d". It contains the product description, and its body has been merged into the main page.

The XML output of HtmlCleaner: http://pastebin.com/03f9gtdC
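For anyone who wants to reproduce the check, here is a minimal sketch of how the surviving iframes can be listed from the cleaned output. It does not call HtmlCleaner at all: it uses only the JDK's built-in DOM parser and XPath engine, and it assumes the cleaned output is well-formed XML without a default namespace. The class name `IframeCheck`, the method `iframeIds`, and the sample markup are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of HtmlCleaner.
class IframeCheck {

    /**
     * Returns the id attribute of every iframe element in the given XML,
     * in document order. An iframe without an id yields an empty string.
     */
    static List<String> iframeIds(String xml) {
        try {
            // Parse the cleaned output as plain (non-namespace-aware) XML.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Select every iframe element anywhere in the document.
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate("//iframe", doc, XPathConstants.NODESET);

            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample: one iframe survived, another was flattened into a div.
        String xml = "<html><body>"
                + "<iframe id=\"a\" src=\"x.html\"></iframe>"
                + "<div>description text that used to live inside an iframe</div>"
                + "</body></html>";
        System.out.println(iframeIds(xml));
    }
}
```

Comparing the returned ids against the iframe ids in the original page source shows which ones were dropped, such as the one with id="d" here.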

Could anyone kindly take a look at it, or suggest a better HTML parsing library that handles iframes gracefully? The library should also support XPath evaluation.
