mongodb map reduce value.count

In MongoDB, I have the following map function:

var map = function() {
    // otherdata is a placeholder field read from the document; please ignore it
    emit( this.username, {count: 1, otherdata: this.otherdata} );
}

and the following reduce function:

var reduce = function(key, values) {
    var total = 0; // running sum for this call
    values.forEach(function(value){
        total += value.count; //note this line
    });
    return {count: total, otherdata: values[0].otherdata}; //please ignore otherdata
}

The problem is with the line noted:

total += value.count;

In my dataset, the reduce function is called 9 times, and the expected count in the map-reduce result is 8908.

With the line above, the result is correctly 8908.

But if I change the line to:

total += 1;

the result is only 909, about 1/9 of the expected result.

I also tried print(value.count), and it printed 1.

What explains this behavior?


Short answer: value.count is not always equal to 1.

Long answer: this is the expected behavior of map-reduce. The reduce function aggregates the results of the map function, but it does so in small groups, producing intermediate results (subtotals in your case). The reduce function is then run again on these intermediate results as if they were direct output of the map function, and so on until only one result is left for each key: that is the final result.

It can be seen as a pyramid of intermediate results:

emit(...)-|
          |- reduce -> |
emit(...)-|            |
          |            |- reduce ->|
emit(...)-|            |           |
          |            |           |
emit(...)-|- reduce -> |           |
          |                        |-> reduce = final result
emit(...)-|                        |
                                   |
emit(...)--- reduce -------------->|
                                   |
emit(...)-----------------reduce ->|
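The pyramid can be mimicked in plain JavaScript to see why `total += 1` undercounts. This is a sketch, not MongoDB's actual scheduler: the helper `simulateMapReduce`, the reducer names, and the fixed batch size of 3 are all hypothetical, chosen only to force a few re-reduce stages.

```javascript
// Correct reducer: sums the counts, so it works on values straight from
// emit() ({count: 1}) and on its own earlier outputs ({count: n}) alike.
function reduceByCount(key, values) {
  var total = 0;
  values.forEach(function (value) {
    total += value.count;
  });
  return { count: total };
}

// Buggy reducer: counts how many values it received, which is only
// correct if every value came straight from emit().
function reduceByOne(key, values) {
  var total = 0;
  values.forEach(function () {
    total += 1;
  });
  return { count: total };
}

// Hypothetical re-reduce schedule: reduce the values in batches of
// `batchSize` (must be >= 2), then feed the batch results back into
// reduce until a single value is left.
function simulateMapReduce(values, reduce, batchSize) {
  var current = values;
  while (current.length > 1) {
    var next = [];
    for (var i = 0; i < current.length; i += batchSize) {
      next.push(reduce("someUser", current.slice(i, i + batchSize)));
    }
    current = next;
  }
  return current[0];
}

// 10 documents for the same username, each emitting {count: 1}.
var emitted = [];
for (var i = 0; i < 10; i++) emitted.push({ count: 1 });

console.log(simulateMapReduce(emitted, reduceByCount, 3)); // { count: 10 }
console.log(simulateMapReduce(emitted, reduceByOne, 3));   // { count: 2 }
```

With batches of 3, the summing reducer goes from ten 1s to subtotals 3, 3, 3, 1, then 9, 1, then 10. The `+= 1` reducer discards those subtotals at every stage and only counts how many values it was handed, ending at 2.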

The number of reduce calls and the inputs each one receives are unpredictable and meant to remain hidden. That's why your reduce function must return data of the same type (same schema) as its input.
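One way to check the "same schema" rule is to feed a reducer its own output. In this plain-JavaScript sketch (all function names are hypothetical), `reduceReturnsNumber` returns a bare number instead of `{count: ...}`. It is correct in a single pass, but once its output is re-reduced, `value.count` is `undefined` on a number and the sum becomes `NaN`.

```javascript
// Returns the same shape it receives: {count: n}.
function reduceSameShape(key, values) {
  var total = 0;
  values.forEach(function (value) { total += value.count; });
  return { count: total };
}

// Returns a bare number, a different shape than the emitted {count: 1}.
function reduceReturnsNumber(key, values) {
  var total = 0;
  values.forEach(function (value) { total += value.count; });
  return total;
}

// Re-reduce check: reducing two partial results must give the same answer
// as reducing all the original values in a single call.
function isReReduceSafe(reduce, partA, partB) {
  var direct = reduce("k", partA.concat(partB));
  var staged = reduce("k", [reduce("k", partA), reduce("k", partB)]);
  return JSON.stringify(direct) === JSON.stringify(staged);
}

var a = [{ count: 1 }, { count: 1 }];
var b = [{ count: 1 }, { count: 1 }, { count: 1 }];
console.log(isReReduceSafe(reduceSameShape, a, b));     // true
console.log(isReReduceSafe(reduceReturnsNumber, a, b)); // false
```

Passing this check on a few inputs is a sanity test, not a proof. MongoDB's map-reduce documentation also asks that reduce be associative, commutative and idempotent, which a test like this can probe but not guarantee.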


The reduce function is called not only on the original emitted values but also on its own output, until there is a final result. So it must be able to handle intermediate results such as [{count: 5}, {count: 3}, {count: 4}] coming out of an earlier stage.
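Applying both versions of the line from the question to that intermediate array makes the difference concrete. This plain-JavaScript sketch uses hypothetical function names; the array is the one from the paragraph above.

```javascript
// Intermediate results produced by an earlier reduce stage.
var intermediate = [{ count: 5 }, { count: 3 }, { count: 4 }];

// total += value.count: adds up the subtotals.
function reduceByCount(key, values) {
  var total = 0;
  values.forEach(function (value) { total += value.count; });
  return { count: total };
}

// total += 1: counts the subtotals instead of adding them.
function reduceByOne(key, values) {
  var total = 0;
  values.forEach(function () { total += 1; });
  return { count: total };
}

console.log(reduceByCount("someUser", intermediate)); // { count: 12 }
console.log(reduceByOne("someUser", intermediate));   // { count: 3 }
```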
