
What is the input order of keys in the reduce() method?

I have a simple use case: for an input file, I need to calculate each word's percentage of the total word count. For example, if word1 occurs 10 times, word2 occurs 5 times, and the file has 100 words in total, I need to output word1 = 10%, word2 = 5%, and so on. Whenever I encounter a word, I call context.write(word, 1) in map(), and in reduce() I sum the individual counts. But to calculate the percentage I also need the total number of words, which I am calculating as well.

So, before reduce() receives the keys for word1 or word2, I need it to receive the total-word-count key, so that the percentage can be calculated for every word. But reduce() receives this total key only after some other keys, so I cannot calculate the percentages.

I also tried setting the total in the configuration from map() using context.getConfiguration().setFloat("total count", count); but in reduce() I cannot read this value back from the configuration. It simply returns null.

Any suggestions are welcome.

Thank you.


You need to first digest your document, like this:

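import java.util.HashMap;
import java.util.Map;

class WordCounter {
    // Number of occurrences of each distinct word.
    Map<String, Integer> totals = new HashMap<String, Integer>();
    // Total number of words seen so far.
    int wordCount;

    void digest(String document) {
        // Split on runs of non-word characters ("\\W+"); "\\w+" would discard the words themselves.
        for (String word : document.split("\\W+")) {
            if (word.isEmpty())
                continue; // a leading separator produces an empty first token
            wordCount++;
            Integer count = totals.get(word);
            if (count == null)
                totals.put(word, 1);
            else
                totals.put(word, count + 1);
        }
    }
}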

Then you can do a second pass over your document doing what you like with the info you've collected, perhaps using something like this method on every word:

String decorateWithPercent(String word) {
    // Multiply by 100.0 first so the division is floating-point, not integer.
    return word + " (" + (100.0 * totals.get(word) / wordCount) + "%)";
}

Or to print the frequencies, something like:

void printFrequencies() {
    // The loop variable is named "entry" so it does not shadow the wordCount field.
    for (Map.Entry<String, Integer> entry : totals.entrySet()) {
        System.out.println(entry.getKey() + " " + entry.getValue());
    }
}
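
Putting the two passes together, here is a minimal self-contained sketch in plain Java (not Hadoop). The class name WordPercentages and its methods countWords and toPercentages are hypothetical names chosen for illustration, and it assumes words are separated by non-word characters:

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class for illustration only.
class WordPercentages {
    // First pass: count how often each word occurs.
    static Map<String, Integer> countWords(String document) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String word : document.split("\\W+")) {
            if (word.isEmpty()) {
                continue; // split() yields "" when the text starts with a separator
            }
            totals.merge(word, 1, Integer::sum);
        }
        return totals;
    }

    // Second pass: convert each count into a percentage of the grand total.
    static Map<String, Double> toPercentages(Map<String, Integer> totals) {
        int wordCount = 0;
        for (int count : totals.values()) {
            wordCount += count;
        }
        Map<String, Double> percentages = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : totals.entrySet()) {
            // 100.0 forces floating-point arithmetic.
            percentages.put(entry.getKey(), 100.0 * entry.getValue() / wordCount);
        }
        return percentages;
    }

    public static void main(String[] args) {
        Map<String, Double> result = toPercentages(countWords("a b a c a b a a"));
        for (Map.Entry<String, Double> entry : result.entrySet()) {
            System.out.println(String.format(Locale.ROOT, "%s = %.1f%%",
                    entry.getKey(), entry.getValue()));
        }
    }
}
```

The percentages can only be computed once the grand total is known, which is why the counting pass has to finish before the percentage pass starts.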