
How do I pass parameters to MongoDB map/reduce in Java?

I have some data like this:

{id: 1, text: "This is a sentence about dogs", indices: ["sentence", "dogs"]}
{id: 2, text: "This sentence is about cats and dogs", indices: ["sentence", "cats", "dogs"]}

The key terms in indices were extracted from the text manually. I want to search by terms and order the results by the number of matching indices. For this example, I would like to pass "cats" and "dogs" and get both objects returned, with id=2 first because its score is 2.
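To make the desired ranking concrete, here is a minimal sketch in plain Java with no MongoDB involved. The class IndexScoreSketch and its methods score and rank are hypothetical names used only for illustration; the sketch assumes the documents are already in memory as a map from id to indices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class, not part of any MongoDB API.
public class IndexScoreSketch {

    // Counts how many of a document's indices appear in the search terms.
    static int score(List<String> indices, Set<String> targetTerms) {
        int score = 0;
        for (String index : indices) {
            if (targetTerms.contains(index)) {
                score++;
            }
        }
        return score;
    }

    // Returns the ids of documents with at least one match, highest score first.
    static List<Integer> rank(Map<Integer, List<String>> docs, Set<String> targetTerms) {
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> entry : docs.entrySet()) {
            if (score(entry.getValue(), targetTerms) > 0) {
                ids.add(entry.getKey());
            }
        }
        ids.sort((a, b) -> Integer.compare(
                score(docs.get(b), targetTerms),
                score(docs.get(a), targetTerms)));
        return ids;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> docs = new LinkedHashMap<>();
        docs.put(1, Arrays.asList("sentence", "dogs"));
        docs.put(2, Arrays.asList("sentence", "cats", "dogs"));
        Set<String> terms = new HashSet<>(Arrays.asList("cats", "dogs"));
        // id 2 matches both terms (score 2), id 1 matches only "dogs" (score 1).
        System.out.println(rank(docs, terms)); // prints [2, 1]
    }
}
```

The question is how to get the same scoring done inside MongoDB rather than on the client.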

I first tried to use the DBCollection.group function:

public DBObject group(DBObject key, DBObject cond, DBObject initial, String reduce, String finalize)

But I don't see a way to send parameters. I tried:

key: {id: true},
cond: {indices: {$in: ['cats', 'dogs']}},
initial: {score: 0},
reduce: function(doc, out) { out.score++; }

but obviously this will just return a count of 1 for each of the 2 objects.

I then realised that I could pass the keyword parameters as part of the initial document that the reduce function accumulates into:

final List<String> targetTerms = Arrays.asList("dogs", "cats");
final Datastore ds = ….
final DBCollection coll = ds.getCollection(Example.class);
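// Group by _id so that every matching document gets its own result row.
BasicDBObject key = new BasicDBObject("_id", true);
// Only consider documents that contain at least one of the target terms.
BasicDBObject cond = new BasicDBObject();
cond.append("indices", new BasicDBObject("$in", targetTerms));
// The initial document carries the score and, as a "parameter", the target terms.
BasicDBObject initial = new BasicDBObject();
initial.append("score", 0);
initial.append("targetTerms", targetTerms);
// Increment the score once for every index that equals a target term.
String reduce = "function (obj, prev) { " +
        "  for (var i in prev.targetTerms) {" +
        "    var targetTerm = prev.targetTerms[i];" +
        "    for (var j in obj.indices) {" +
        "      var index = obj.indices[j];" +
        "      if (targetTerm === index) prev.score++;" +
        "    }" +
        "  }" +
        "}";
String fn = null; // no finalize function
final BasicDBList group = (BasicDBList) coll.group(key, cond, initial, reduce, fn);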

I get results like this:

{ "_id" : { "$oid" : "4dcfe16c05a063bb07ccbb7b"} , "score" : 1.0 , "targetTerms" : [ "virtual" , "library"]}
{ "_id" : { "$oid" : "4dcfe17d05a063bb07ccbb83"} , "score" : 2.0 , "targetTerms" : [ "virtual" , "library"]}

This got me the score values that I wanted, and I am able to narrow down the entries to be processed with more specific conditional rules.
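Once the grouped rows are back on the client, they can at least be ordered and trimmed there. A minimal sketch, assuming each row has been copied into a plain Map with a numeric "score" field; the class GroupResultSortSketch and the method topByScore are hypothetical names, and no driver API is used:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; rows stand in for the grouped results.
public class GroupResultSortSketch {

    // Returns at most `limit` rows, ordered by "score" from highest to lowest.
    static List<Map<String, Object>> topByScore(List<Map<String, Object>> rows, int limit) {
        List<Map<String, Object>> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingDouble(
                (Map<String, Object> row) -> ((Number) row.get("score")).doubleValue())
                .reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    // Builds a row with the two fields this sketch cares about.
    static Map<String, Object> row(String id, double score) {
        Map<String, Object> row = new HashMap<>();
        row.put("_id", id);
        row.put("score", score);
        return row;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("4dcfe16c05a063bb07ccbb7b", 1.0));
        rows.add(row("4dcfe17d05a063bb07ccbb83", 2.0));
        // The row with score 2.0 comes first.
        for (Map<String, Object> r : topByScore(rows, 10)) {
            System.out.println(r.get("_id") + " " + r.get("score"));
        }
    }
}
```

This still ships every grouped row to the client before sorting, which is what I would like to avoid.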

So I have a few questions:

  1. Is this a good way to send "parameters" to the group action's reduce function?
  2. Is there a way to sort (and perhaps limit) the output inside MongoDB before returning it to the client?
  3. Will this break on sharded MongoDB instances?