Efficiently count percentage of occurrence in MongoDB
So, I'm tinkering with MongoDB, and I'm trying to get the count()
aggregation queryi开发者_Go百科ng to scale properly, to allow me to easily calculate the percentage of occurrence of certain values in the document across the collection.
I have a document with a structure like:
{
foo : 'bar',
moo : 'cow',
values : {
alpha : true,
beta : false,
gamma : false,
delta : true ... (many more)
}
}
Now, I have several thousand of these documents, and I want to efficiently calculate the percentage of true (or the percentage of false) of all the values in the values
object (and in my case, there are ~50). ie, what percentage of the time alpha is true, beta is true, etc.
I started naively with count()
, but it seems like it only allows one query at a time, so that led me to do this (using the PHP Mongo class, but its basically just a regular count()
function:
$array_of_keys = array('alpha', 'beta', 'gamma', 'delta'...);
for($i=0;$i<count($array_of_keys);$i++){
$array_of_keys = [...]
for($i=0;$i<count($array_of_keys);$i++){
$false = intval($collection->count(array($array_of_keys[$i]=>false)));
$true = intval($collection->count(array($array_of_keys[$i]=>true)));
}
But even with a very small number of records (around 100), this took 9 seconds.
What's the best approach for this?
Here is a simple MapReduce that will do what you want:
map = function() {
for (var key in this.values){
emit(key, {count:1, trues: (this.values[key] ? 1 : 0)});
}
}
reduce = function(key, values){
var out = values[0];
for (var i=1; i < values.length; i++){
out.count += values[i].count;
out.trues += values[i].trues;
}
return out;
}
finalize = function(key, value){
value.ratio = value.trues / value.count;
return value;
}
db.runCommand({mapReduce:'collection',
map:map,
reduce:reduce,
finalize:finalize,
out:'counts'
})
db.counts.findOne({_id:'alpha'})
{_id: 'alpha', value: {count: 100, trues: 52, ratio: 0.52}}
You could also do an upsert like this when you insert into your main collection which will give you a real-time view into your data:
for (var key in this.values){
db.counts.update({_id:key},
{$inc:{count:1, trues: (this.values[key] ? 1 : 0)}},
true);
}
In fact, you could even combine these methods. Do a one-time MapReduce batch job to populate the counts collection and then use upserts to keep it up to date.
精彩评论