How can I fetch HTML tags in an HTML document?
Hey, I want to get the tags from an HTML document.
That is, everything contained within the angle brackets, with the angle brackets inclusive. How can I do this in Java? Thanks
<!-- Read carefully -->
<b><![CDATA[<Everything in angle brackets ("<>") is a tag?>]]></b>
... and use an HTML parser.
If you want to do it manually, iterate over the input chars and decide for each and every < and > whether it belongs to a tag element or not. There are some rules (processing instructions, comments, CDATA content, angle brackets in attribute values(!)) to follow.
Most parsers use some switch/case pattern for evaluating each token (char in your case).
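To make the manual approach concrete, here is a minimal sketch of such a switch/case scanner. The class name TagScanner and its helper methods are made up for illustration. It only covers the rules listed above (comments, CDATA, processing instructions, quoted attribute values) and ignores others that a real parser handles, such as the raw text inside script and style elements, so treat it as a demonstration and not as a replacement for a parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; a real HTML parser handles many more cases.
final class TagScanner {

    private enum State { TEXT, TAG, QUOTED }

    /** Returns every tag found in the input, angle brackets included. */
    static List<String> extractTags(String html) {
        List<String> tags = new ArrayList<>();
        State state = State.TEXT;
        char quote = 0;     // quote character that opened the current attribute value
        int tagStart = -1;  // index of the '<' that opened the current tag
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            switch (state) {
                case TEXT:
                    if (c != '<') { i++; break; }
                    if (html.startsWith("<!--", i)) {
                        i = skipPast(html, "-->", i + 4);      // comment
                    } else if (html.startsWith("<![CDATA[", i)) {
                        i = skipPast(html, "]]>", i + 9);      // CDATA section
                    } else if (html.startsWith("<?", i)) {
                        i = skipPast(html, "?>", i + 2);       // processing instruction
                    } else if (html.startsWith("<!", i)) {
                        i = skipPast(html, ">", i + 2);        // doctype or similar declaration
                    } else if (startsTag(html, i)) {
                        tagStart = i;
                        state = State.TAG;
                        i++;
                    } else {
                        i++;                                   // stray '<' in plain text
                    }
                    break;
                case TAG:
                    if (c == '"' || c == '\'') {
                        quote = c;
                        state = State.QUOTED;
                    } else if (c == '>') {
                        tags.add(html.substring(tagStart, i + 1));
                        state = State.TEXT;
                    }
                    i++;
                    break;
                case QUOTED:
                    // Angle brackets inside an attribute value do not end the tag.
                    if (c == quote) state = State.TAG;
                    i++;
                    break;
            }
        }
        return tags;
    }

    /** Index just after the next terminator, or the end of input if it never appears. */
    private static int skipPast(String s, String terminator, int from) {
        int end = s.indexOf(terminator, from);
        return end < 0 ? s.length() : end + terminator.length();
    }

    /** True if the '<' at index lt is followed by a letter, or by '/' and a letter. */
    private static boolean startsTag(String s, int lt) {
        int next = lt + 1;
        if (next < s.length() && s.charAt(next) == '/') next++;
        return next < s.length() && Character.isLetter(s.charAt(next));
    }
}
```

Calling TagScanner.extractTags on the snippet at the top of this answer should return only <b> and </b>: the comment and the CDATA content are skipped even though both contain angle brackets.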
I used jsoup recently. Nice API, easy to use and no problems so far. Don't even try to parse HTML yourself. See Andreas_D's answer.