Term frequency using a Java program
I have a set of documents. I want to know the frequency count of each word in each document (i.e. the term frequency) using a Java program. Thanks in advance. I know how to find the frequency count for each word. My question is about how to get the unique words in each document from the list of documents.
You can split each document on whitespace and punctuation, go through the resulting array, and count the frequency of each word (a Map<String, Integer> would really help you with this). The key set of that map is then the set of unique words in the document.
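As a rough sketch of that approach, something like the following could work. The class name TermFrequency, the method names, and the sample documents are made up for illustration. It assumes each document is already loaded as a String, that words should be compared case-insensitively, and that any run of characters other than letters and digits separates words (so "don't" would become "don" and "t"); adjust the regular expression if that does not suit your data.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TermFrequency {

    // Counts how often each word occurs in one document.
    // The keys of the returned map are that document's unique words.
    public static Map<String, Integer> countTerms(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // Assumption: anything that is not a letter or digit separates words.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (!token.isEmpty()) { // a leading separator produces one empty token
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Builds one frequency map per document, keyed by a (hypothetical) document name.
    public static Map<String, Map<String, Integer>> countTermsPerDocument(
            Map<String, String> documents) {
        Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            result.put(doc.getKey(), countTerms(doc.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample data, invented for this example.
        Map<String, String> documents = new LinkedHashMap<>();
        documents.put("doc1", "The cat sat on the mat.");
        documents.put("doc2", "The dog chased the cat!");

        Map<String, Map<String, Integer>> frequencies = countTermsPerDocument(documents);
        for (Map.Entry<String, Map<String, Integer>> entry : frequencies.entrySet()) {
            // keySet() holds the unique words, the values hold their counts.
            System.out.println(entry.getKey() + " unique words: " + entry.getValue().keySet());
            System.out.println(entry.getKey() + " frequencies:  " + entry.getValue());
        }
    }
}
```

A TreeMap is used only so the words print in sorted order; a HashMap should do just as well if ordering does not matter.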
Resources:
- Java - faster data structure to count word frequency?
On the same topic:
- How to count words in java
If this is more than a one-time problem, you should consider using Lucene to index your documents. Then this post would help you answer your question.