开发者

Using a generic Pair class and a Splaytree to count and store words and their frequencies in Java

I'm implementing a splaytree to hold words and their frequencies and chose to create a Pair class that would hold each word-frequency (key-value) pair. That is, each node of the splaytree holds a pair of the Pair class. The Pair class looks like this:

public class SplayEntry<K, V> implements Comparable<SplayEntry<K, V>>{

public K word;
public V frequency;

public SplayEntry(K word, V frequency) {
    this.word = word;
    this.frequency = frequency;
}
getters, setters, hashCode, equals, compareTo etc...

The Splaytree:

public class SplayTree<AnyType extends Comparable<? super AnyType>> {

public SplayTree( )
{
    nullNode = new BinaryNode<AnyType>( null );
    nullNode.left = nullNode.right = nullNode;
    root = nullNode;
}

And has BinaryNode class.

What I'm having trouble with is how to, for every word and frequency pair put it into the tree and also开发者_StackOverflow check whether the pair already exists and if so up the frequency by one. I read in a text file line by line and split each line into words then do a countWords() method that right now is a mess:

    public void countWords(String line) {
    line = line.toLowerCase();
    String[] words = line.split("\\P{L}+");
    SplayEntry<String, Integer> entry = new SplayEntry<String, Integer>(null, null);
    for (int i = 0, n = words.length; i < n; i++) {
        Integer occurances = 0;
        entry.setWord(words[i]);
        entry.setFrequency(occurances);

        if (tree.contains(entry.equals(entry)) && entry.getFrequency() == 0) {
            occurances = 1;

        } else {
            int value = occurances.intValue();
            occurances = new Integer(value + 1);
            entry.setFrequency(occurances);
        }

        entry = new SplayEntry<String, Integer>(words[i], occurances);
        tree.insert(entry);
    }
}

I know this isn't really working and I need help in figuring out how I should instantiate the SplayEntry class and in what order? I also want the method to, for every word in the words array, check whether it exists in a SplayEntry which is inside the tree (contains) and if the word is a new word then the frequency will be 1, else, the frequency will be +1. finally I just add the new SplayEntry into the Splaytree and let that put it in an appropriate node.

Right now I've just confused myself by working on the same piece of code for way too many hours than should be necessary, I would very much appreciate some pointers that can lead me in the right direction!

Please tell me if I've not made myself clear.


I suggest using a standard implementation of a splay tree, i.e. without the counters, and having a separate HashMap for frequencies. This does not sacrifice complexity, since operations on a splay tree are O(log n), while operations on a HashMap are O(1). To preserve encapsulation and invariants, you can put both within a larger class that exposes the required operations.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜