Java: I've created a list of Word objects holding each word and its frequency, but I'm having trouble updating the frequency
I'm working on a project which has a dictionary of words and I'm extracting them and adding them to an ArrayList as word objects. I have a class called Word as below.
How do I access these Word objects to update the frequency? As part of this project, each unique word should appear only once, with its frequency increased by the number of occurrences in the dictionary.
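class Word
{
    private String word;
    private int freq;

    Word(String word)
    {
        this.word = word;
        this.freq = 0;
    }

    public String getWord() {
        return word;
    }

    public int getFreq() {
        return freq;
    }

    // Despite its name, this increments the frequency by one.
    public void setFreq() {
        freq = freq + 1;
    }
}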
This is how I am adding the Word objects to the ArrayList. I think it's OK?
String pattern = "[^a-zA-Z\\s]";
String strippedString = line.replaceAll(pattern, "");
line = strippedString.toLowerCase();
StringTokenizer st = new StringTokenizer(line);
while (st.hasMoreTokens())
{
    String newWord = st.nextToken();
    word.add(new Word(newWord));
    count++;
}
Instead of an ArrayList, use a Bag (such as the Bag interface in Apache Commons Collections). It keeps the counts for you.
Use a map from each word to its Word object. In principle a HashSet would be enough, but a HashSet uses a HashMap internally anyway, and a map lets you look up the existing Word so you can update it. The following code also increases the frequency of words that have already been inserted.
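// Declare the map once, outside the loop that reads lines, so counts accumulate.
Map<String, Word> wordsMap = new HashMap<String, Word>();
String pattern = "[^a-zA-Z\\s]";
String strippedString = line.replaceAll(pattern, "");
line = strippedString.toLowerCase();
StringTokenizer st = new StringTokenizer(line);
while (st.hasMoreTokens())
{
    String newWord = st.nextToken();
    Word existingWord = wordsMap.get(newWord);
    if (existingWord == null) {
        // First time this word is seen: create and store its Word object.
        existingWord = new Word(newWord);
        wordsMap.put(newWord, existingWord);
    }
    // Count every occurrence, including the first (a new Word starts at freq 0).
    existingWord.setFreq();
    count++;
}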
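For reference, here is a minimal self-contained sketch of the same map-based idea that can be compiled on its own. The names WordCountDemo and countWords are made up for this example, and the nested Word class is only a stand-in for the one in the question. It uses a LinkedHashMap so that words come back in first-seen order; a plain HashMap would work just as well if order does not matter.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical demo class; the name is not from the original post.
class WordCountDemo {

    // Minimal stand-in for the Word class from the question.
    static class Word {
        private final String word;
        private int freq;

        Word(String word) {
            this.word = word;
            this.freq = 0;
        }

        String getWord() { return word; }
        int getFreq() { return freq; }
        void setFreq() { freq = freq + 1; } // increments by one, as in the question
    }

    // Counts the words in all given lines, keeping one Word object per distinct word.
    static Map<String, Word> countWords(String... lines) {
        Map<String, Word> wordsMap = new LinkedHashMap<String, Word>();
        for (String line : lines) {
            // Same normalisation as the question: drop non-letters, lower-case.
            String stripped = line.replaceAll("[^a-zA-Z\\s]", "").toLowerCase();
            StringTokenizer st = new StringTokenizer(stripped);
            while (st.hasMoreTokens()) {
                String token = st.nextToken();
                Word w = wordsMap.get(token);
                if (w == null) {
                    w = new Word(token);
                    wordsMap.put(token, w);
                }
                w.setFreq(); // count every occurrence, including the first
            }
        }
        return wordsMap;
    }

    public static void main(String[] args) {
        Map<String, Word> counts = countWords("The cat sat.", "The CAT ran!");
        for (Word w : counts.values()) {
            System.out.println(w.getWord() + " " + w.getFreq());
        }
    }
}
```

Because the map is created once and shared across all lines, a word that appears on several lines still ends up with a single Word object.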
I would solve the problem with the following code:
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Word {
    private final String word;
    private int frequency;

    public Word(String word) {
        this.word = word;
        this.frequency = 0;
    }

    public String getWord() {
        return word;
    }

    public int getFrequency() {
        return frequency;
    }

    public void increaseFrequency() {
        frequency++;
    }
I didn't call this method setFrequency because it is not a real setter method. A real setter method takes exactly one parameter.
    public static List<Word> histogram(String sentence) {
First, compute the frequency of the individual words.
        String[] words = sentence.split("\\W+");
        Map<String, Word> histo = new HashMap<String, Word>();
        for (String word : words) {
            Word w = histo.get(word);
            if (w == null) {
                w = new Word(word);
                histo.put(word, w);
            }
            w.increaseFrequency();
        }
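Then, sort the words so that words with higher frequency appear first. If the frequency is the same, the words are ordered by String.compareTo, which is alphabetical except that it compares Unicode values, so uppercase letters sort before lowercase ones.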
        List<Word> ordered = new ArrayList<Word>(histo.values());
        Collections.sort(ordered, new Comparator<Word>() {
            public int compare(Word a, Word b) {
                int fa = a.getFrequency();
                int fb = b.getFrequency();
                if (fa < fb)
                    return 1;
                if (fa > fb)
                    return -1;
                return a.getWord().compareTo(b.getWord());
            }
        });
        return ordered;
    }
Finally, test the code with a simple example.
    public static void main(String[] args) {
        List<Word> freq = histogram("a brown cat eats a white cat.");
        for (Word word : freq) {
            System.out.printf("%4d %s\n", word.getFrequency(), word.getWord());
        }
    }
}
You can use a Multiset of String from Google Collections (now part of Guava) instead of the Word class.
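Multiset comes from a third-party library, so as a rough illustration of the same "count per element" idea using only the standard library, here is a sketch based on Map.merge (Java 8 or later). This is not the Multiset API itself; the names CountWithMap and count are invented for this example, and a TreeMap is used only so that the words come out in sorted order.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; a standard-library stand-in for a multiset of strings.
class CountWithMap {

    // Returns each distinct lower-cased word mapped to its number of occurrences.
    static Map<String, Integer> count(String sentence) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String token : sentence.toLowerCase().split("[^a-z]+")) {
            if (token.isEmpty()) {
                continue; // split yields a leading empty string if the text starts with a non-letter
            }
            // Insert 1 for a new word, otherwise add 1 to the existing count.
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("a brown cat eats a white cat."));
    }
}
```

With this approach no Word class is needed at all: the String is the key and the Integer is the frequency.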