
HTML parser for phrase and case sensitive searches, in Java

I would like to know whether there are any HTML parsers in Java that support phrase searches and case-sensitive searches. All I need is the number of hits for the searched phrase in an HTML page, with support for case sensitivity.

Thanks, Sharma


Have you tried this?

You can search the text using regular expressions.
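
A minimal sketch of the regular-expression idea, assuming the page text is already available as a String. The names `PhraseHits` and `countHits` are made up for illustration. `Pattern.quote` makes the phrase match literally, and the `CASE_INSENSITIVE` flag switches case sensitivity off when needed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhraseHits {
    // Hypothetical helper: counts non-overlapping occurrences of a literal phrase.
    public static int countHits(String text, String phrase, boolean caseSensitive) {
        // No flags means a case-sensitive match; UNICODE_CASE extends
        // case-insensitive matching beyond ASCII letters.
        int flags = caseSensitive ? 0 : (Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        // Pattern.quote escapes regex metacharacters so the phrase is matched as-is.
        Pattern pattern = Pattern.compile(Pattern.quote(phrase), flags);
        Matcher matcher = pattern.matcher(text);
        int hits = 0;
        while (matcher.find()) {
            hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Java is cool. JAVA is Cool. java is cool.";
        System.out.println(countHits(text, "is cool", true));   // prints 2
        System.out.println(countHits(text, "is cool", false));  // prints 3
    }
}
```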


Wouldn't it help to treat the HTML page as plain text and strip the HTML tags:

String noHTMLString = htmlString.replaceAll("\\<.*?\\>", "");

and then count what you need in noHTMLString? This helps when the HTML page has markup like:

this is <span>cool</span>

and you need to look for the text "is cool" (because the HTML above is transformed into the string "this is cool"). To count, you can use StringUtils from Apache Commons Lang, which has a method called countMatches. Put together, it should work like this:

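import org.apache.commons.lang3.StringUtils; // assuming Commons Lang 3; Lang 2 uses org.apache.commons.lang

String htmlString = "this is <span>cool</span>";
// Strip anything that looks like a tag, leaving "this is cool"
String noHTMLString = htmlString.replaceAll("\\<.*?\\>", "");
// countMatches is case-sensitive
int count = StringUtils.countMatches(noHTMLString, "is cool");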

I would go with that approach, or at least give it a try. It sounds simpler than parsing the HTML and then traversing it looking for the words you need.
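
If you would rather not pull in Commons Lang, here is a rough sketch of the same strip-and-count idea using only the JDK, with a switch for case sensitivity. The names `HtmlPhraseCounter`, `stripTags` and `countPhrase` are invented for this example, and I swapped the tag regex for `<[^>]*>` so that tags spanning several lines are removed too. It is still a regex hack rather than real HTML parsing, so comments, scripts and entities are not handled.

```java
import java.util.Locale;

public class HtmlPhraseCounter {
    // Removes anything that looks like a tag; [^>] also matches line breaks inside a tag.
    public static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    // Hypothetical helper: counts non-overlapping hits of phrase in the visible text.
    public static int countPhrase(String html, String phrase, boolean caseSensitive) {
        if (phrase.isEmpty()) {
            return 0; // an empty phrase has no meaningful hit count
        }
        String text = stripTags(html);
        if (!caseSensitive) {
            // Lower-case both sides so the comparison ignores case.
            text = text.toLowerCase(Locale.ROOT);
            phrase = phrase.toLowerCase(Locale.ROOT);
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(phrase, from)) != -1) {
            count++;
            from += phrase.length(); // continue searching after this hit
        }
        return count;
    }

    public static void main(String[] args) {
        String html = "this is <span>cool</span>, and this IS <b>Cool</b>";
        System.out.println(countPhrase(html, "is cool", true));   // prints 1
        System.out.println(countPhrase(html, "is cool", false));  // prints 2
    }
}
```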
