Problems with Xalan using XPATH (unclosed tags)

Greetings,

I'm facing a problem with the following tech-stack: JWebUnit -> HtmlUnit -> Xalan. I'm trying to find an element by XPATH, but the HTML document is pretty malformed.

Xalan stops finding elements as soon as the XPath reaches the body element. I believe it's because the document contains two <body> tags, one of which is unclosed.

Everything works for /html/head or /html. But when I try /html/body (or /html/body[1], //body[1], or anything inside those tags) I get only null from Xalan.

Is there any way to work around this? I can't change the HTML document itself. Thank you kindly for your attention.

Best regards, Thiago


HtmlUnit must be using something to convert HTML to XML. Perhaps you can tell it to use jsoup or tagsoup, which are very tolerant of messy HTML?

You might also write code to dump the XML tree to a file, so you can see what the parser actually produced and which paths really exist in it.
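To illustrate the dump idea, here is a minimal sketch that uses only the JDK's own DOM, transformer, and XPath classes. The class name DomDump, its helper methods, and the sample document are made up for illustration. The nested <body> is only a guess at what a tolerant parser might build from the broken page, not what HtmlUnit actually produces. In the real test you would pass the DOM node you get from HtmlUnit instead of parsing a string (HtmlUnit's HtmlPage.asXml() should give a similar dump directly, if your version has it).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JWebUnit, HtmlUnit, or Xalan.
final class DomDump {

    // Parse a well-formed XML string into a DOM document (stand-in for the
    // tree that the HTML parser hands to the XPath engine).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Serialize any DOM node back to text so the real structure is visible.
    static String dump(Node node) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    // Count the nodes an XPath expression selects, relative to a context node.
    static int count(Node context, String xpath) throws Exception {
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, context, XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Assumed shape: the second <body> ended up nested inside the first.
        Document doc = parse(
                "<html><head/><body><body><p>hi</p></body></body></html>");
        System.out.println(dump(doc));
        System.out.println("/html/body matches: " + count(doc, "/html/body"));
        System.out.println("//body matches: " + count(doc, "//body"));
    }
}
```

Comparing the dump with the counts for /html/body and //body should show whether the body element is missing, duplicated, or sitting somewhere unexpected in the tree, which tells you what XPath to write against it.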
