Querying documents containing two tags with CouchDB?

Consider the following documents in a CouchDB:

{
  "name":"Foo1",
  "tags":["tag1", "tag2", "tag3"],
  "otherTags":["otherTag1", "otherTag2"]
}

{
  "name":"Foo2",
  "tags":["tag2", "tag3", "tag4"],
  "otherTags":["otherTag2", "otherTag3"]
}

{
  "name":"Foo3",
  "tags":["tag3", "tag4", "tag5"],
  "otherTags":["otherTag3", "otherTag4"]
}

I'd like to query all documents that contain ALL (not any!) tags given as the key.

For example, if I request using '["tag2", "tag3"]' I'd like to retrieve Foo1 and Foo2.

I'm currently doing this by querying by tag, first for "tag2", then for "tag3", and intersecting the two result sets manually afterwards.

This seems to be awfully inefficient and I assume that there must be a better way.

My second question - but they are quite related, I think - would be:

How would I query for all documents that contain "tag2" AND "tag3" AND "otherTag3"?

I hope a question like this hasn't been asked/answered before. I searched for it and didn't find one.


Do you have a maximum number of:

  • Tags per document, and
  • Tags allowed in the query?

If so, you have an upper bound on the number of tag combinations to be indexed. For example, with a maximum of 5 tags per document, and 5 tags allowed in the AND query, you could simply output every 1-, 2-, 3-, 4-, and 5-tag combination into your index, for a maximum of 1 (five-tag combo) + 5 (four-tag combos) + 10 (three-tag combos) + 10 (two-tag combos) + 5 (one-tag combos) = 31 rows in the view for that document.

That may be acceptable to you, considering that it's quite a powerful query. The disk usage may be acceptable too, especially if you simply emit(tags, {_id: doc._id}) to minimize data in the view; you can use ?include_docs=true to get the full document later. The final thing to remember is to always emit the key array sorted, and always query it sorted the same way, because you are emitting only tag combinations, not permutations.
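A rough sketch of what such a map function could look like, assuming the 5-tag limit from the example above. The names tagCombinations and mapDoc are made up for illustration, and emit is passed in as a parameter only so the sketch can be run outside CouchDB; inside a design document the map function takes just doc, emit is provided by the view server, and the helper would have to be defined inside the map function body.

```javascript
// Hypothetical helper: every non-empty combination of at most maxSize tags.
// The tags are sorted first, so each combination comes out in sorted order.
function tagCombinations(tags, maxSize) {
  var sorted = tags.slice().sort();
  var result = [];
  function extend(start, combo) {
    if (combo.length > 0) result.push(combo);
    if (combo.length === maxSize) return;
    for (var i = start; i < sorted.length; i++) {
      extend(i + 1, combo.concat([sorted[i]]));
    }
  }
  extend(0, []);
  return result;
}

// Sketch of the map step. The limit of 5 is the assumption from the example.
function mapDoc(doc, emit) {
  if (!Array.isArray(doc.tags)) return;
  tagCombinations(doc.tags, 5).forEach(function (combo) {
    // Small value to keep the view compact; use ?include_docs=true for the rest.
    emit(combo, { _id: doc._id });
  });
}
```

With a view like this, the request for "tag2" AND "tag3" becomes a single lookup with key=["tag2","tag3"] (sorted, and URL-encoded in the actual request).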

That will only get you so far, however; it does not scale up indefinitely. For full-blown arbitrary AND queries, you will indeed be required to split into multiple queries, or else look into CouchDB-Lucene.
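If you do split into multiple queries, the client-side step is an intersection of the document ids returned by each single-tag query. A minimal sketch, assuming each query result has already been fetched and reduced to its rows array (view rows carry the document id in their id field); the function name intersectIds is made up for illustration:

```javascript
// rowsPerTag: one array of view rows per queried tag, e.g. [{ id: "Foo1" }, ...].
// Returns the ids that appear in every one of those arrays.
function intersectIds(rowsPerTag) {
  if (rowsPerTag.length === 0) return [];
  var sets = rowsPerTag.map(function (rows) {
    return new Set(rows.map(function (row) { return row.id; }));
  });
  return Array.from(sets[0]).filter(function (id) {
    return sets.every(function (set) { return set.has(id); });
  });
}
```

The same step covers a query such as "tag2" AND "tag3" AND "otherTag3", as long as there is one row list per requested tag, whichever view it came from.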
