开发者

Efficiently count percentage of occurrence in MongoDB

So, I'm tinkering with MongoDB, and I'm trying to get the count() aggregation queryi开发者_Go百科ng to scale properly, to allow me to easily calculate the percentage of occurrence of certain values in the document across the collection.

I have a document with a structure like:

{
    foo : 'bar',
    moo : 'cow',
    values : {
        alpha : true,
        beta : false,
        gamma : false,
        delta : true ... (many more)
    }
}

Now, I have several thousand of these documents, and I want to efficiently calculate the percentage of true (or the percentage of false) of all the values in the values object (and in my case, there are ~50). ie, what percentage of the time alpha is true, beta is true, etc.

I started naively with count(), but it seems like it only allows one query at a time, so that led me to do this (using the PHP Mongo class, but its basically just a regular count() function:

 $array_of_keys = array('alpha', 'beta', 'gamma', 'delta'...);
 for($i=0;$i<count($array_of_keys);$i++){
    $array_of_keys = [...]
    for($i=0;$i<count($array_of_keys);$i++){

$false  = intval($collection->count(array($array_of_keys[$i]=>false)));
$true  = intval($collection->count(array($array_of_keys[$i]=>true)));
}

But even with a very small number of records (around 100), this took 9 seconds.

What's the best approach for this?


Here is a simple MapReduce that will do what you want:

map = function() {
    for (var key in this.values){
        emit(key, {count:1, trues: (this.values[key] ? 1 : 0)});
    }
}

reduce = function(key, values){
    var out = values[0];
    for (var i=1; i < values.length; i++){
        out.count += values[i].count;
        out.trues += values[i].trues;
    }
    return out;
}

finalize = function(key, value){
    value.ratio = value.trues / value.count;
    return value;
}

db.runCommand({mapReduce:'collection',
               map:map,
               reduce:reduce,
               finalize:finalize,
               out:'counts'
               })

db.counts.findOne({_id:'alpha'})
{_id: 'alpha', value: {count: 100, trues: 52, ratio: 0.52}}

You could also do an upsert like this when you insert into your main collection which will give you a real-time view into your data:

for (var key in this.values){
    db.counts.update({_id:key},
                     {$inc:{count:1, trues: (this.values[key] ? 1 : 0)}},
                     true);
}

In fact, you could even combine these methods. Do a one-time MapReduce batch job to populate the counts collection and then use upserts to keep it up to date.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜