
How to get the first div that contains text in an HTML document?

I am using jsoup and I have HTML like this:

<html><head><style type="text/css">
</style></head>
<body><div style="font-family:times new roman,new york,times,serif;font-size:14pt">first text<br><div><br></div><div style="font-family: times new roman,new york,times,serif; font-size: 14pt;"><br><div style="font-family: times new roman,new york,times,serif; font-size: 12pt;"><font size="2" face="Tahoma"><hr size="1"><b><span style="font-weight: bold;">one:</span></b> second text<br><b><span style="font-weight: bold;">two:</span></b> third text<br><b><span style="font-weight: bold;">three:</span></b> fourth text<br><b><span style="font-weight: bold;">five:</span></b> fifth text<br></font><br>

and I want to extract the first div that contains text (the whole div), to get output like:

<div style="font-family:times new roman,new york,times,serif;font-size:14pt">first text<br></div>

One more question: how do I get the first HTML tag (in general) that contains text? The first text may be inside a <p> or a <span>.

Thanks in advance.


You can use a SAX-style HTML parser, like TagSoup.

To do this, initialize the parser with a handler that extends DefaultHandler and caches the last element visited in a member variable. The first time the characters(...) method is called with non-whitespace text, print the cached element together with that text.

See http://sax.sourceforge.net/quickstart.html for some direction on how to set up the parser.
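
A rough sketch of such a handler, in case it helps. It is not TagSoup-specific: to keep it self-contained it runs on the SAX parser bundled with the JDK, which only accepts well-formed XHTML, so for real-world HTML you would hand the same handler to TagSoup's parser instead. The class name FirstTextHandler and the find(...) helper are made up for illustration. It also keeps a stack of open elements rather than a single cached element (my assumption, not part of the description above), so that text following a closed child is still attributed to the right parent.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records the element that holds the first non-blank text.
class FirstTextHandler extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>(); // elements currently open
    private final StringBuilder buffer = new StringBuilder(); // text since the last tag
    String firstElement; // name of the element containing the first text
    String firstText;    // that text, trimmed

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
        open.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
        open.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver one text run in several calls, so only buffer here.
        if (firstElement == null) {
            buffer.append(ch, start, length);
        }
    }

    // Called at every tag boundary: decide whether the buffered run is the first text.
    private void flush() {
        if (firstElement == null && !open.isEmpty()) {
            String text = buffer.toString().trim();
            if (!text.isEmpty()) {
                firstElement = open.peek();
                firstText = text;
            }
        }
        buffer.setLength(0);
    }

    // Convenience wrapper using the JDK parser; swap in TagSoup for messy HTML.
    static FirstTextHandler find(String xhtml) {
        try {
            FirstTextHandler handler = new FirstTextHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After FirstTextHandler.find(markup), the firstElement and firstText fields hold the tag name and the text. The sketch only records the tag name; to print the whole element you would also have to keep its attributes in startElement.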


Use an HTML parser or, if you know that the HTML is XHTML, an XSLT processor.

Here is the list of open-source HTML parsers.
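
For the XHTML case, a full stylesheet may be more than you need. A possible shortcut, sketched below, is to evaluate a plain XPath expression (the selection language XSLT itself uses) with the javax.xml.xpath API that ships with the JDK. The class name FirstTextXPath and both expressions are my own illustration, and this assumes the markup is well-formed, which the sample in the question is not.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: picks the first element with a non-blank direct text node.
class FirstTextXPath {
    // First <div> that has non-blank text as a direct child.
    static final String FIRST_DIV = "(//div[text()[normalize-space()]])[1]";
    // First element of any kind under <body> that has non-blank direct text.
    static final String FIRST_ANY = "(//body//*[text()[normalize-space()]])[1]";

    // Returns the matching node, or null when nothing matches.
    static Node first(String xhtml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Node) xpath.evaluate(expression,
                    new InputSource(new StringReader(xhtml)), XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling FirstTextXPath.first(markup, FirstTextXPath.FIRST_DIV) returns the div as a DOM Node, and FIRST_ANY covers the <p> or <span> case. Note that XPath orders matches by where the element starts, so a parent with direct text of its own is picked ahead of a child inside it, even if the child's text comes first when reading.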


What about loading a temporary DOM (a DocumentFragment, http://ejohn.org/blog/dom-documentfragments/) and then using jQuery to find the div you want inside the fragment?
