How do I pass parameters to MongoDB map/reduce in Java?
I have some data like this:
{id: 1, text: "This is a sentence about dogs", indices: ["sentence", "dogs"]}
{id: 2, text: "This sentence is about cats and dogs", indices: ["sentence", "cats", "dogs"]}
The indices are key terms that I extracted manually from the text. I want to search on them and order the results by the number of matching indices. For this example, I would like to pass "cats" and "dogs" and get both objects returned, with id=2 first because it has score=2.
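To make the scoring I'm after concrete, here is a minimal plain-Java sketch with no MongoDB involved. The class and method names (`ScoreSketch`, `score`) are made up for illustration; it just counts how many of a document's indices appear in the search terms, which is the number I want computed on the server.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of any MongoDB API: it only illustrates
// the score I would like each document to end up with.
class ScoreSketch {
    // Counts how many entries of indices also appear in targetTerms.
    static int score(List<String> indices, List<String> targetTerms) {
        Set<String> targets = new HashSet<>(targetTerms);
        int score = 0;
        for (String index : indices) {
            if (targets.contains(index)) {
                score++;
            }
        }
        return score;
    }

    public static void main(String[] args) {
        List<String> targetTerms = Arrays.asList("cats", "dogs");
        // id=1 from the sample data: only "dogs" matches
        System.out.println(score(Arrays.asList("sentence", "dogs"), targetTerms));
        // id=2 from the sample data: "cats" and "dogs" both match
        System.out.println(score(Arrays.asList("sentence", "cats", "dogs"), targetTerms));
    }
}
```

With the two sample documents this gives 1 for id=1 and 2 for id=2, so id=2 should sort first.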
I first tried to use the DBCollection.group function:
public DBObject group(DBObject key,
                      DBObject cond,
                      DBObject initial,
                      String reduce,
                      String finalize)
But I don't see a way to send parameters. I tried:
key: {id: true},
cond: {indices: {$in: ['cats', 'dogs']}},
initial: {score: 0},
reduce: function(doc, out){ out.score++; }
but obviously this will just return a count of 1 for each of the 2 objects.
I realised that I could send the keyword parameters as part of the initial config of the reduced object.
final List<String> targetTerms = Arrays.asList("dogs", "cats");
final Datastore ds = ….
final DBCollection coll = ds.getCollection(Example.class);
BasicDBObject key = new BasicDBObject("_id", true);
BasicDBObject cond = new BasicDBObject();
cond.append("indices", new BasicDBObject("$in", targetTerms));
BasicDBObject initial = new BasicDBObject();
initial.append("score", 0);
initial.append("targetTerms", targetTerms);
String reduce = "function (obj, prev) { " +
"  for (var i = 0; i < prev.targetTerms.length; i++) {" +
"    var targetTerm = prev.targetTerms[i];" +
"    for (var j = 0; j < obj.indices.length; j++) {" +
"      var index = obj.indices[j];" +
"      if (targetTerm === index) prev.score++;" +
"    }" +
"  }" +
"}";
String fn = null;
final BasicDBList group = (BasicDBList) coll.group(key, cond, initial, reduce, fn);
I get results like this:
{ "_id" : { "$oid" : "4dcfe16c05a063bb07ccbb7b"} , "score" : 1.0 , "targetTerms" : [ "virtual" , "library"]}
{ "_id" : { "$oid" : "4dcfe17d05a063bb07ccbb83"} , "score" : 2.0 , "targetTerms" : [ "virtual" , "library"]}
This got me the score values that I wanted, and I am able to narrow down the entries to be processed with more specific conditional rules.
So I have a few questions:
- Is this a good way to send "parameters" to the group action's reduce function?
- Is there a way to sort (and perhaps limit) the output inside MongoDB before returning to the client?
- Will this break on sharded MongoDB instances?
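For the sort/limit question, the fallback I have in mind if it cannot be done on the server is to sort on the client. This is only a sketch: plain `Map` rows stand in for the `DBObject` entries that group returns, and `SortGroupedResults`, `topByScore` and `row` are hypothetical names of my own.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Client-side fallback sketch: each Map stands in for one grouped result row.
class SortGroupedResults {
    // Returns at most `limit` rows, highest "score" first.
    static List<Map<String, Object>> topByScore(List<Map<String, Object>> rows, int limit) {
        List<Map<String, Object>> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingDouble(
                (Map<String, Object> r) -> ((Number) r.get("score")).doubleValue()).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    // Builds a stand-in row with the two fields used here.
    static Map<String, Object> row(String id, double score) {
        Map<String, Object> r = new HashMap<>();
        r.put("_id", id);
        r.put("score", score);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("4dcfe16c05a063bb07ccbb7b", 1.0));
        rows.add(row("4dcfe17d05a063bb07ccbb83", 2.0));
        for (Map<String, Object> r : topByScore(rows, 10)) {
            System.out.println(r.get("_id") + " score=" + r.get("score"));
        }
    }
}
```

This works, but it means pulling every grouped row back to the client first, which is what I would like to avoid.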