How can I get content of HTML <body>

2022-12-14 14:07 问答作者：

when I have html:

<html>
<head>
</head>
<body>
 text
  <div>
  text2
    <div>
    text3
    </div>
  </div>
</body>
</html>

how can I get with DOM parser in JAVA content of body: text <div> text2 <div> text3 </div> </div> becasuse method getTextContent return:text text2 text3. - so开发者_Go百科 without tags.

It is possible with SAX, but it is possible with DOM, too?

The getTextContent is behaving as I would expect - getting the textural content of the HTML fragment. Can you check the API docs for the DOM parser and see if there's a similar method with a name like getHtmlContent?

You would need to parse the document into a DOM and serialise only the portion of the DOM you wanted. Using the DOM Level 3 LS interfaces you can serialise the outer-XML of a single node with:

LSSerializer serializer= implementation.createLSSerializer();
String html= serializer.writeToString(node);

To get the inner-XML you would need to writeToString each child node in turn (eg. into a StringBuffer).

Depending on what DOM implementation you are using there may be alternative non-standard methods. There may also be risks with serialising HTML as XML, if that's what you're doing... eg. a standard XML serialiser may output a self-closing tag for an empty tag, which can confuse browsers parsing the output as legacy-HTML.

继续阅读：dom

How can I get content of HTML <body>

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？