开发者

How to improve query performance with operators like $nin, $in for Mongodb

I have a reasonably large dataset of over 3 million documents that have tags similar to StackOverflow that uses tags for each question. The schema that I use for storing the tags is as follows:

{"id": 12345, "tags":["ta开发者_运维技巧g1", "tag2", "tag3"]}, {"id": 12346, "tags":["tag2", "tag3"]}

I have a multi-key index created on tags field. When I am performing queries using $in or $nin operators to find the intersection, union of the tags, the performance is around 7 seconds on a server class machine. Is there anything that I can do to improve the speed of query search?

EDIT 1:

Here is the explain plan as requested. What I observed is that the queries returned much faster after I restarted my server and just ran just the mongodb server. The queries performed much faster(< 50ms). I suspect the indexes were not cached in memory, although I had ample unused ram available and my index (800MB) could easily fit in memory.

db.tagsCollection.find( { "tags" : { $in : ['tag1', 'tag2'], $nin : ['tag4', ' tag5', 'tag6', 'tag7'] } } ).explain(); { "cursor" : "BtreeCursor tags_1 multi", "nscanned" : 6145193, "nscannedObjects" : 6145192, "n" : 969386, "millis" : 19640, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "tags" : [ [ "tag1", "tag1" ], [ "tag2", "tag2" ] ] } }

Note


This is what I thought of as an optimization ( though you might need to test it )

  1. Instead of storing tags,store a small key which identifies all the tags particular document has. say for post#125 the tags are : PHP, MongoDb , database .

    a) clean the tags like convert all of them to small case etc and then sort them alphabetically . current tags will be : database,mongodb,php

    b) Have a seperate collection which stores integer to tag mapping :

    { "_id" : 1 , "t" : "mongodb" }

    { "_id" : 2 , "t" : "php" } and so on store all the possible tags for your website

    c) to store a document, create the tag key using tags to number map from previous collection. so curent database,mongodb,php will become something like 1-12-2

    d) store your document like :

    { "id" : 12345 , "tags" : [1,12,3] }

QUERYING : The use of integers instead of strings on an indexed field would reduce the index size by great extent, and also make querying faster as compared to a string index. Not sure about amount of performance gain, but still worth a try to compare to your current implementation.


Check the size of your multi-key tags index using db.col.stats(). If it doesn't fit in RAM then you might be disk-bound and incurring some disk IO cost. If the index fits entirely in memory then I'm not sure what else you can do, apart from throw more hardware at it, unless you can optimise the queries themselves.

Do you need to search through all the data, or can you query a subset that's filtered by another indexed field? Or can you eliminate the $nin queries, which will tend to be slower because the have to iterate every tag, where as $in only has to iterate until it finds a match.


If you want performance to be super fast and dont have space contraints, I would suggest to have separate collection of tags with video id array and have an index on tag name.


Here is another suggestion but I've had not a chance to test it.

{ 
    tags:{ 
        items:[ 'a', 'b', 'c' ],
        mixed:{
            a:1, // hash value for a tag
            b:2, // hash value for b tag
            c:3  // hash value for c tag
        }
    }
}

and search query is

db.demo.find({ 'tags.mixed.a':1, 'tags.mixed.b':2  })

if possible have to create compound index for tags.mixed

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜