How can I fetch HTML tags in an HTML document?

Hey, I want to get the tags from an HTML document.

That is, everything contained within angle brackets, with the angle brackets included. How can I do this in Java? Thanks.


<!-- Read carefully -->
<b><![CDATA[<Everything in angle brackets ("<>") is a tag?>]]></b>

... and use an HTML parser.


If you want to do it manually, iterate over the input characters and decide, for each and every < and >, whether it belongs to a tag or not. There are some rules to follow: processing instructions, comments, CDATA content, and angle brackets inside attribute values(!).

Most parsers use some switch/case pattern to evaluate each token (each character, in your case).
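A minimal sketch of that manual approach, assuming you only want start and end tags. The class and method names (TagScanner, extractTags) are hypothetical. A switch over a small state enum decides what each character means. Comments, CDATA sections, processing instructions and <!...> declarations are skipped, a quoted attribute value may contain >, and a < that is not followed by a letter (or / plus a letter) is treated as text. It does not special-case <script>/<style> contents or recover from malformed markup, which is exactly why a real parser is the safer choice.

```java
import java.util.ArrayList;
import java.util.List;

public class TagScanner {

    // Scanner states; the switch below dispatches on these.
    private enum State { TEXT, TAG, QUOTED }

    /**
     * Returns every start/end tag (angle brackets included) found in html.
     * Comments, CDATA, processing instructions and <!...> declarations are skipped.
     * Script/style contents are NOT special-cased (sketch only).
     */
    public static List<String> extractTags(String html) {
        List<String> tags = new ArrayList<>();
        State state = State.TEXT;
        int tagStart = -1;
        char quote = 0;
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            switch (state) {
                case TEXT:
                    if (c == '<') {
                        if (html.startsWith("<!--", i)) {          // comment
                            i = skipPast(html, i + 4, "-->");
                            continue;
                        }
                        if (html.startsWith("<![CDATA[", i)) {     // CDATA section
                            i = skipPast(html, i + 9, "]]>");
                            continue;
                        }
                        if (html.startsWith("<?", i) || html.startsWith("<!", i)) {
                            i = skipPast(html, i + 2, ">");         // PI or doctype
                            continue;
                        }
                        if (startsTag(html, i)) {
                            state = State.TAG;
                            tagStart = i;
                        }
                    }
                    break;
                case TAG:
                    if (c == '"' || c == '\'') {                    // attribute value opens
                        state = State.QUOTED;
                        quote = c;
                    } else if (c == '>') {                          // tag closes
                        tags.add(html.substring(tagStart, i + 1));
                        state = State.TEXT;
                    }
                    break;
                case QUOTED:
                    if (c == quote) {                               // '>' inside quotes is ignored
                        state = State.TAG;
                    }
                    break;
            }
            i++;
        }
        return tags;
    }

    // A '<' opens a tag only if followed by a letter, or by '/' and a letter.
    private static boolean startsTag(String s, int i) {
        int j = i + 1;
        if (j < s.length() && s.charAt(j) == '/') j++;
        return j < s.length() && Character.isLetter(s.charAt(j));
    }

    // Index just past the next occurrence of end at or after from, or s.length() if none.
    private static int skipPast(String s, int from, String end) {
        int k = s.indexOf(end, from);
        return k < 0 ? s.length() : k + end.length();
    }
}
```

For example, extractTags("<p class=\"a>b\">x < y</p>") returns the two tags <p class="a>b"> and </p>; the bare < in "x < y" stays text.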


I used jsoup recently. Nice API, easy to use, and no problems so far. Don't even try to parse HTML yourself. See Andreas_D's answer.
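A minimal jsoup sketch, assuming the org.jsoup:jsoup library is on the classpath; the class and method names (JsoupTags, startTags) are hypothetical. jsoup hands back a tree of elements rather than the original tag text, so this rebuilds a start tag from each element's name and attributes. The result is therefore normalized: tag names are lowercase, implied <html>, <head> and <body> elements appear, end tags are not listed, and attribute values are not re-escaped.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class JsoupTags {

    /** Rebuilds a start tag such as <a href="x"> for every element jsoup parsed. */
    public static List<String> startTags(String html) {
        Document doc = Jsoup.parse(html);
        List<String> tags = new ArrayList<>();
        for (Element el : doc.getAllElements()) {
            if (el == doc) continue; // skip the synthetic #root document node
            StringBuilder sb = new StringBuilder("<").append(el.tagName());
            for (Attribute attr : el.attributes()) {
                // Values are written back inside double quotes without escaping (sketch only).
                sb.append(' ').append(attr.getKey())
                  .append("=\"").append(attr.getValue()).append('"');
            }
            tags.add(sb.append('>').toString());
        }
        return tags;
    }
}
```

If you need the exact source text of each tag instead, the manual scan is still required; jsoup is the better fit when you actually want to work with the elements.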
