
Java: I've created a list of Word objects that hold a word and its frequency, but I'm having trouble updating the frequency

I'm working on a project which has a dictionary of words and I'm extracting them and adding them to an ArrayList as word objects. I have a class called Word as below.

What I'm wondering is how to access these Word objects to update the frequency. As part of this project, each unique word should be stored only once, and its frequency should be increased by the number of occurrences in the dictionary.

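class Word
{
  // Fields implied by the constructor and getters below.
  private String word;
  private int freq;

  Word(String word)
  {
    this.word = word;
    this.freq = 0;
  }

  public String getWord() {
    return word;
  }

  public int getFreq() {
    return freq;
  }

  // Despite the name, this increments the frequency by one.
  public void setFreq() {
    freq = freq + 1;
  }
}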

This is how I am adding the word objects to the ArrayList...I think it's ok?

String pattern = "[^a-zA-Z\\s]";
String strippedString = line.replaceAll(pattern, "");
line = strippedString.toLowerCase();
StringTokenizer st = new StringTokenizer(line);
while (st.hasMoreTokens())
{
    String newWord = st.nextToken();
    word.add(new Word(newWord));
    count++;
}


Instead of an ArrayList, use a Bag (for example, the Bag interface in Apache Commons Collections). It keeps the counts for you.
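
To illustrate what a bag does without pulling in a library, here is a minimal sketch built on a plain HashMap. The class name SimpleBag and its methods add, getCount, uniqueSize and fromLine are hypothetical names made up for this example; they are not the API of any Bag library. The tokenizing step reuses the pattern from the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical stand-in for a bag: one entry per unique word, plus a count.
public class SimpleBag {

    private final Map<String, Integer> counts = new HashMap<>();

    // Adds one occurrence; Map.merge inserts 1 or adds 1 to the existing count.
    void add(String item) {
        counts.merge(item, 1, Integer::sum);
    }

    // Number of occurrences seen so far (0 if the word was never added).
    int getCount(String item) {
        return counts.getOrDefault(item, 0);
    }

    // Number of distinct words.
    int uniqueSize() {
        return counts.size();
    }

    // Strips non-letters and lowercases the line, as in the question, then counts tokens.
    static SimpleBag fromLine(String line) {
        String cleaned = line.replaceAll("[^a-zA-Z\\s]", "").toLowerCase();
        SimpleBag bag = new SimpleBag();
        StringTokenizer st = new StringTokenizer(cleaned);
        while (st.hasMoreTokens()) {
            bag.add(st.nextToken());
        }
        return bag;
    }

    public static void main(String[] args) {
        SimpleBag bag = SimpleBag.fromLine("A brown cat eats a white cat.");
        System.out.println(bag.getCount("cat")); // prints 2
        System.out.println(bag.uniqueSize());    // prints 5
    }
}
```

A real Bag implementation offers more (iteration, removal, and so on), but the core idea is the same: the collection stores each word once and tracks how often it was added.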


Use a map with the word string as the key and the Word object as the value. A HashSet might look sufficient, but a set gives you no direct way to fetch the stored Word so that you can update it, and internally a HashSet uses a HashMap anyway. The following code also shows how to increase the frequency of a word that has already been inserted.

Map<String, Word> wordsMap = new HashMap<String, Word>();

String pattern = "[^a-zA-Z\\s]";
String strippedString = line.replaceAll(pattern, "");
line = strippedString.toLowerCase();
StringTokenizer st = new StringTokenizer(line);
while (st.hasMoreTokens())
{
    String newWord = st.nextToken();
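    Word existingWord = wordsMap.get(newWord);
    if (existingWord == null) {
        // First occurrence: create the Word (its freq starts at 0) and store it.
        existingWord = new Word(newWord);
        wordsMap.put(newWord, existingWord);
    }
    // Count every occurrence, including the first one.
    existingWord.setFreq();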

    count++;
}


I would solve the problem with the following code:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Word {

  private final String word;
  private int frequency;

  public Word(String word) {
    this.word = word;
    this.frequency = 0;
  }

  public String getWord() {
    return word;
  }

  public int getFrequency() {
    return frequency;
  }

  public void increaseFrequency() {
    frequency++;
  }

I didn't call this method setFrequency because it is not a real setter. A real setter takes exactly one parameter, the new value.

  public static List<Word> histogram(String sentence) {

First, compute the frequency of the individual words.

    String[] words = sentence.split("\\W+");
    Map<String, Word> histo = new HashMap<String, Word>();
    for (String word : words) {
      Word w = histo.get(word);
      if (w == null) {
        w = new Word(word);
        histo.put(word, w);
      }
      w.increaseFrequency();
    }

Then, sort the words so that words with higher frequency appear first. Words with the same frequency are sorted almost alphabetically: String.compareTo compares Unicode values, so uppercase letters sort before lowercase ones.

    List<Word> ordered = new ArrayList<Word>(histo.values());
    Collections.sort(ordered, new Comparator<Word>() {
      public int compare(Word a, Word b) {
        int fa = a.getFrequency();
        int fb = b.getFrequency();
        if (fa < fb)
          return 1;
        if (fa > fb)
          return -1;
        return a.getWord().compareTo(b.getWord());
      }
    });

    return ordered;
  }

Finally, test the code with a simple example.

  public static void main(String[] args) {
    List<Word> freq = histogram("a brown cat eats a white cat.");
    for (Word word : freq) {
      System.out.printf("%4d %s\n", word.getFrequency(), word.getWord());
    }
  }
}
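
On Java 8 or later, the ordering used by the comparator above (highest frequency first, ties broken by String.compareTo) can also be expressed as a comparator chain. This is only a sketch: FrequencyOrder, its nested Entry class and orderedWords are hypothetical names for this example, with Entry standing in for the Word class so the block compiles on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class FrequencyOrder {

    // Hypothetical stand-in for the Word class, reduced to what the sort needs.
    static final class Entry {
        private final String word;
        private final int frequency;

        Entry(String word, int frequency) {
            this.word = word;
            this.frequency = frequency;
        }

        String getWord() { return word; }
        int getFrequency() { return frequency; }
    }

    // Highest frequency first; equal frequencies fall back to String.compareTo.
    static final Comparator<Entry> BY_FREQUENCY_DESC =
            Comparator.comparingInt(Entry::getFrequency)
                      .reversed()
                      .thenComparing(Entry::getWord);

    // Returns the words in sorted order without modifying the input list.
    static List<String> orderedWords(List<Entry> entries) {
        List<Entry> copy = new ArrayList<>(entries);
        copy.sort(BY_FREQUENCY_DESC);
        List<String> result = new ArrayList<>();
        for (Entry e : copy) {
            result.add(e.getWord());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry("white", 1), new Entry("cat", 2),
                new Entry("a", 2), new Entry("brown", 1));
        System.out.println(orderedWords(entries)); // prints [a, cat, brown, white]
    }
}
```

The chain reads in the same order as the prose description, which makes it harder to get the sign of the frequency comparison wrong than with hand-written 1 and -1 return values.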


You can use a Multiset<String> from Google Collections (now part of Guava) instead of the Word class.
