How to find the frequency of a phrase (multiple-token string) inside a document in Java?

2023-03-27 21:57 Q&A author:

I want to find the frequency of a multi-token string, or phrase, inside a document. I am not looking for single-word (single-term) frequency: the search string will always consist of multiple terms, and the number of terms is dynamic ...

Example: finding the frequency of "words with friends" inside a document.

Any help/pointer will be much appreciated.

Thanks Debjani

You can read the document line by line using a BufferedReader, and then use String.split to count how many times the phrase occurs on each line:

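int count = 0;
String strLine;
// split() takes a regex, so quote the phrase (java.util.regex.Pattern) to match it literally.
String phrase = Pattern.quote("words with friends");
while ((strLine = br.readLine()) != null) {
    // Limit -1 keeps trailing empty strings, so a match at the end of a line is counted.
    count += strLine.split(phrase, -1).length - 1;
}
return count;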

EDIT: If you want to perform a case-insensitive search, you can use a compiled Pattern instead:

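Pattern myPattern = Pattern.compile("words with friends", Pattern.CASE_INSENSITIVE);
int count = 0;
String strLine;
while ((strLine = br.readLine()) != null) {
    // Limit -1 keeps trailing empty strings, so a match at the end of a line is counted.
    count += myPattern.split(strLine, -1).length - 1;
}
return count;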
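
The snippets above leave out the reader setup and exception handling, so here is a minimal self-contained sketch of the same line-by-line idea. The class name LinePhraseCounter and the method countPhrase are made up for illustration, and the sketch assumes the phrase is non-empty and never spans a line break.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
final class LinePhraseCounter {

    // Counts non-overlapping, case-insensitive occurrences of phrase, one line at a time.
    static int countPhrase(Reader source, String phrase) {
        if (phrase.isEmpty()) {
            throw new IllegalArgumentException("phrase must not be empty");
        }
        // Pattern.quote makes regex metacharacters in the phrase match literally.
        Pattern pattern = Pattern.compile(Pattern.quote(phrase), Pattern.CASE_INSENSITIVE);
        int count = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                // Limit -1 keeps trailing empty strings, so matches at line end count.
                count += pattern.split(line, -1).length - 1;
            }
        } catch (IOException e) {
            // Rethrown unchecked to keep the sketch short; real code may prefer to propagate it.
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        String doc = "I play Words With Friends daily.\nwords with friends";
        System.out.println(countPhrase(new StringReader(doc), "words with friends"));
    }
}
```

Because each line is examined separately, an occurrence that is broken across two lines is not counted by this approach.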

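Why not use a regex? The java.util.regex classes are designed for this sort of task: build a Pattern for the phrase, then call Matcher.find() in a loop and count the matches.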

http://download.oracle.com/javase/1.5.0/docs/api/java/util/regex/Matcher.html
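
As a rough sketch of the Matcher-based suggestion, the code below counts matches over the whole document text rather than line by line. The class PhraseFrequency and its count method are hypothetical names; the sketch assumes the document fits in memory as a String and that the tokens of the phrase may be separated by any run of whitespace, including a line break.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
final class PhraseFrequency {

    // Counts non-overlapping, case-insensitive occurrences of a multi-token phrase.
    static int count(String text, String phrase) {
        String trimmed = phrase.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("phrase must contain at least one token");
        }
        // Works for any number of tokens: quote each one and join them with \s+.
        StringBuilder regex = new StringBuilder();
        for (String token : trimmed.split("\\s+")) {
            if (regex.length() > 0) {
                regex.append("\\s+");
            }
            regex.append(Pattern.quote(token));
        }
        Pattern pattern = Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String doc = "Words with\nfriends is fun. I like words  with friends.";
        System.out.println(count(doc, "words with friends"));
    }
}
```

This matches the phrase as a substring, so it would also count it inside longer words; adding \b word boundaries around the pattern is one possible refinement if whole-word matching is needed.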
