What is the input order for keys in the reduce() method?
I have a simple use case. For my input file I need to calculate each word's share of the total number of words. For example, if word1 occurs 10 times, word2 occurs 5 times, and the file contains 100 words in total, then I need to display word1 = 10%, word2 = 5%, and so on. Whenever I encounter a word I call context.write(word,1) in map(), and in reduce() I sum up the individual counts. But to calculate the percentage I also need the total number of words, so I am calculating that as well.
So before reduce() receives the keys for word1 or word2, I need it to receive the total word count key, so that the percentage can be calculated for each and every word. But reduce() receives this total key after some other keys, so I cannot calculate the percentage for those.
I also tried to set the total in the configuration from map() using context.getConfiguration().setFloat("total count",count); but in reduce() I am not able to read this value back from the configuration. It simply returns null.
Any suggestions are welcome.
Thank you.
You need to first digest your document, like this:
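import java.util.HashMap;
import java.util.Map;

class WordCounter {
    // Number of occurrences of each distinct word seen so far.
    Map<String, Integer> totals = new HashMap<String, Integer>();
    // Total number of words seen so far.
    int wordCount;

    void digest(String document) {
        // Split on runs of non-word characters ("\\W+"); "\\w+" would throw the words away.
        for (String word : document.split("\\W+")) {
            if (word.isEmpty())
                continue; // a leading separator produces one empty token
            wordCount++;
            Integer count = totals.get(word);
            if (count == null)
                totals.put(word, 1);
            else
                totals.put(word, count + 1);
        }
    }
}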
Then you can do a second pass over your document doing what you like with the info you've collected, perhaps using something like this method on every word:
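String decorateWithPercent(String word) {
    // Multiply by 100.0 first so the division is done in floating point;
    // plain int / int would truncate to 0 for almost every word.
    return word + " (" + (100.0 * totals.get(word) / wordCount) + "%)";
}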
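For reference, the two passes can be put together in one small program. This is only a sketch under some assumptions: the class name WordPercentages and its method names are made up for illustration, words are taken to be runs of word characters, and the whole document is assumed to fit in memory.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class WordPercentages {
    // First pass: count each word, keeping first-seen order.
    static Map<String, Integer> countWords(String document) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String word : document.split("\\W+")) {
            if (word.isEmpty()) {
                continue; // split() yields "" when the text starts with a separator
            }
            totals.merge(word, 1, Integer::sum);
        }
        return totals;
    }

    // Second pass: convert each count into a percentage of all words.
    static Map<String, Double> percentages(Map<String, Integer> totals) {
        int wordCount = 0;
        for (int count : totals.values()) {
            wordCount += count;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : totals.entrySet()) {
            // 100.0 forces floating-point division.
            result.put(entry.getKey(), 100.0 * entry.getValue() / wordCount);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> result = percentages(countWords("a b a c a b a a"));
        for (Map.Entry<String, Double> entry : result.entrySet()) {
            System.out.println(String.format(Locale.ROOT, "%s = %.1f%%",
                    entry.getKey(), entry.getValue()));
        }
    }
}
```

The percentages are only computed after every word has been counted, which is the point of doing two passes: the total is not known until the first pass has finished.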
Or to print the frequencies, something like:
void printFrequencies() {
    // The loop variable is named entry so it does not shadow the wordCount field.
    for (Map.Entry<String, Integer> entry : totals.entrySet()) {
        System.out.println(entry.getKey() + " " + entry.getValue());
    }
}
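As for the ordering question itself: within one reducer, Hadoop delivers keys in sorted order, so one possible workaround is to emit the running total under a marker key that sorts before every word. The sketch below only simulates that idea in plain Java with a TreeMap and does not use any Hadoop classes. The class name, the method names, and the marker key "!total" are all made up for illustration, and it assumes a single reducer and words made of letters, digits, and underscores (all of which sort after '!').

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortedKeySimulation {
    // Hypothetical marker key: '!' sorts before ASCII letters, digits and '_'.
    static final String TOTAL_KEY = "!total";

    // Stand-in for map() plus the shuffle: every word contributes 1 to its
    // own key and 1 to the marker key. TreeMap keeps the keys sorted.
    static TreeMap<String, Integer> mapAndShuffle(String document) {
        TreeMap<String, Integer> grouped = new TreeMap<>();
        for (String word : document.split("\\W+")) {
            if (word.isEmpty()) {
                continue;
            }
            grouped.merge(word, 1, Integer::sum);
            grouped.merge(TOTAL_KEY, 1, Integer::sum);
        }
        return grouped;
    }

    // Stand-in for a single reducer: keys arrive in sorted order, so the
    // marker key is seen first and the total is known before any word.
    static List<String> reduce(TreeMap<String, Integer> grouped) {
        List<String> output = new ArrayList<>();
        int total = 0;
        for (Map.Entry<String, Integer> entry : grouped.entrySet()) {
            if (entry.getKey().equals(TOTAL_KEY)) {
                total = entry.getValue();
                continue;
            }
            output.add(entry.getKey() + " " + (100.0 * entry.getValue() / total) + "%");
        }
        return output;
    }

    public static void main(String[] args) {
        for (String line : reduce(mapAndShuffle("x y x x"))) {
            System.out.println(line);
        }
    }
}
```

With more than one reducer the marker key would reach only one of them, so this trick on its own is not enough there. Setting a value on the configuration from inside map() does not help either, because changes made to the configuration inside a running task are not passed on to other tasks.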