[jsoup] Why does 1.6.x remove TD tags? Problems upgrading to 1.6.x
System.out.println(Jsoup.parseBodyFragment("<td>123</td>").html());
jsoup 1.5.2 OUTPUT:
<html>
<head></head>
<body>
<table>
<tbody>
<tr>
<td>123</td>
</tr>
</tbody>
</table>
</body>
</html>
jsoup 1.6.x (1.6.0 and 1.6.1) OUTPUT:
<html>
<head></head>
<body>
123
</body>
</html>
Why does 1.6.x remove the TD tags?
How can I get the jsoup 1.5.x output in 1.6.x?
In jsoup 1.6 I rewrote the HTML parser to implement the WHATWG HTML spec, which matches how browsers currently parse HTML.
The impact here is that in 1.5, a <td> was enough to auto-vivify a <table>; however, browsers don't actually work that way, so in 1.6 you'll need to update your HTML input to introduce the <table> tag.
For example:
System.out.println(
Jsoup.parseBodyFragment("<table><td>123</td></table>").html());
will produce:
<html>
<head></head>
<body>
<table>
<tbody>
<tr>
<td>123</td>
</tr>
</tbody>
</table>
</body>
</html>
Note that the <table><td> gets normalised to <table><tbody><tr><td>...
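If you cannot change the input at its source, one option is to add the <table> wrapper yourself before handing the fragment to jsoup. The following is only a minimal sketch: `TableFragments` and `wrapInTable` are hypothetical names, not part of jsoup, and the check is a simple prefix test on the string, assuming the fragment begins directly with a <td>, <th> or <tr> tag.

```java
import java.util.Locale;

/** Hypothetical helper (not part of jsoup): wraps bare table cells or rows in a table. */
final class TableFragments {

    private TableFragments() {}

    /**
     * Returns the fragment wrapped in <table>...</table> when it starts with a
     * <td>, <th> or <tr> tag; any other input is returned unchanged.
     */
    static String wrapInTable(String fragment) {
        String head = fragment.trim().toLowerCase(Locale.ROOT);
        boolean bareCellOrRow = startsWithTag(head, "td")
                || startsWithTag(head, "th")
                || startsWithTag(head, "tr");
        return bareCellOrRow ? "<table>" + fragment + "</table>" : fragment;
    }

    /** True if the lowercased html starts with "<name" followed by '>', '/' or whitespace. */
    private static boolean startsWithTag(String html, String name) {
        String open = "<" + name;
        if (!html.startsWith(open) || html.length() == open.length()) {
            return false;
        }
        char next = html.charAt(open.length());
        return next == '>' || next == '/' || Character.isWhitespace(next);
    }

    public static void main(String[] args) {
        // Prints: <table><td>123</td></table>
        System.out.println(wrapInTable("<td>123</td>"));
    }
}
```

The wrapped string can then be parsed as in the example above, e.g. Jsoup.parseBodyFragment(TableFragments.wrapInTable("<td>123</td>")).html(), since it is the same <table><td>123</td></table> input. Fragments that already start with <table>, or with unrelated tags such as <thead> or <p>, pass through untouched.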
Hope this helps!