
How can I reduce the size of a MongoDB database by averaging out old data?

I have a MongoDB database for measurements that stores one document per measurement. Each document looks like:

{
 timestamp : 123
 value     : 123
 meta1     : something
 meta2     : something
}

I get measurements from a number of sources every second, so the database grows large quickly. I'd like to keep recent data at the frequency it was read in, but periodically average out older data to save space and make the database a bit faster.
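A minimal sketch of what "averaging out" could mean, written in JavaScript to match the Mongo shell: readings are grouped per source and per fixed time bucket, and each group is replaced by a single averaged document. The `downsample` function is hypothetical, and the sketch assumes `timestamp` is in seconds and that `meta1`/`meta2` identify the source.

```javascript
// Hypothetical sketch: average raw readings into fixed-width time buckets.
// Assumes `timestamp` is in seconds and that meta1/meta2 identify the source
// (field names taken from the question).
function downsample(docs, bucketSeconds) {
  const groups = new Map();
  for (const doc of docs) {
    // Start of the bucket this reading falls into.
    const bucketStart = doc.timestamp - (doc.timestamp % bucketSeconds);
    const key = JSON.stringify([doc.meta1, doc.meta2, bucketStart]);
    let g = groups.get(key);
    if (!g) {
      g = { timestamp: bucketStart, meta1: doc.meta1, meta2: doc.meta2, sum: 0, count: 0 };
      groups.set(key, g);
    }
    g.sum += doc.value;
    g.count += 1;
  }
  // One averaged document per (source, bucket); `count` is kept so the
  // averages can be combined correctly later.
  return [...groups.values()].map(({ sum, ...rest }) => ({ ...rest, value: sum / rest.count }));
}
```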

1. What's the best approach in Mongo?

2. Is there a better database for this, considering that the schema differs between measurements, so a fixed format wouldn't work very well? RRD is also not an option, as I need dynamic query capabilities.


1. What's the best approach in Mongo?
Use capped collections for use cases such as logging. Another approach is to create a background process that moves old data out of the collection.
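One way such a background process could do the averaging on the server is an aggregation pipeline run periodically. This is a sketch, not a tested job: `buildCompactionPipeline`, the collection names and the bucket width are hypothetical, and the `$merge` stage requires MongoDB 4.2 or later.

```javascript
// Hypothetical sketch: build an aggregation pipeline that averages raw
// readings older than `cutoff` into per-bucket documents and upserts them
// into a separate collection. Names are placeholders.
function buildCompactionPipeline(cutoff, bucketSeconds, targetCollection) {
  return [
    // Only touch data older than the cutoff (same units as `timestamp`).
    { $match: { timestamp: { $lt: cutoff } } },
    {
      $group: {
        _id: {
          meta1: "$meta1",
          meta2: "$meta2",
          // Round the timestamp down to the start of its bucket.
          bucket: { $subtract: ["$timestamp", { $mod: ["$timestamp", bucketSeconds] }] },
        },
        value: { $avg: "$value" },
        count: { $sum: 1 },
      },
    },
    // Write the averaged documents into the archive collection.
    { $merge: { into: targetCollection, whenMatched: "replace", whenNotMatched: "insert" } },
  ];
}
```

In mongosh the pipeline would be passed to something like `db.measurements.aggregate(...)`, followed by `db.measurements.deleteMany({ timestamp: { $lt: cutoff } })` to drop the raw documents. Keeping `cutoff` a multiple of the bucket width avoids splitting one bucket across two runs.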

2. Is there a better database for this, considering that the schema differs between measurements, so a fixed format wouldn't work very well? RRD is also not an option, as I need dynamic query capabilities.
MongoDB is a good fit here.

Update: Another approach is to store each data item twice: first in a capped collection (use this collection for querying), and second in another collection (or even a separate log database) used only for logging your events.
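The dual-write idea can be modeled in memory to show the behavior a capped collection provides: a fixed-size store that evicts its oldest documents in insertion order, next to a full append-only history. `DualStore` is a hypothetical illustration, not a MongoDB API; in MongoDB the capped side would be created with `db.createCollection()` using the `capped`, `size` and `max` options, with limits you choose.

```javascript
// Hypothetical in-memory model of "store each item twice":
// `recent` mimics a capped collection (fixed size, oldest entries evicted
// in insertion order) and `log` keeps the full history.
class DualStore {
  constructor(maxRecent) {
    this.maxRecent = maxRecent;
    this.recent = [];
    this.log = [];
  }

  insert(doc) {
    this.log.push(doc);
    this.recent.push(doc);
    // A capped collection drops its oldest documents once it is full.
    if (this.recent.length > this.maxRecent) {
      this.recent.shift();
    }
  }
}
```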


Thanks for the input.

I think I'm going to try using buckets for different timeframes. I'll create three stores corresponding to, say, 1 sec, 1 min and 15 min, and then manage the aggregation with a manual job that runs every so often to compact/average out the values, delete data that's no longer needed, etc.
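A sketch of the cascading rollup for the 1 sec / 1 min / 15 min scheme, assuming a single series with `timestamp` in seconds. The `rollup` and `mean` names are hypothetical. Storing a sum and a count per bucket, rather than only the average, is a design choice that keeps coarser averages exact when buckets contain different numbers of readings.

```javascript
// Hypothetical sketch: roll points up into coarser buckets.
// Each point carries `sum` and `count`, so averaging already-averaged
// buckets stays exact (a plain mean of means is wrong when buckets hold
// different numbers of readings).
function rollup(points, bucketSeconds) {
  const out = new Map();
  for (const p of points) {
    const start = p.timestamp - (p.timestamp % bucketSeconds);
    const b = out.get(start) || { timestamp: start, sum: 0, count: 0 };
    b.sum += p.sum;
    b.count += p.count;
    out.set(start, b);
  }
  return [...out.values()].sort((a, b) => a.timestamp - b.timestamp);
}

// Average value of a rolled-up point.
const mean = (p) => p.sum / p.count;
```

For example, 1-sec readings of 1 and 3 (seconds 0 and 1) and 8 (second 60) roll up into minutes with means 2 and 8; rolling those minutes into one 15-min bucket gives 12 / 3 = 4, whereas a plain mean of the two minute averages would give 5. After each rollup, the job would delete finer-grained points older than that tier's retention window.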


  1. I'm not sure about the best approach, but a simple one would be to have a cron job that removes all documents older than a given timestamp (your_time = now - some_time).

    // your_time = now - some_time, in the same units as the timestamp field
    db.docs.deleteMany({ timestamp: { $lte: your_time } })

  2. Given that you need a schemaless database that allows dynamic queries, MongoDB seems to be a good fit.
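The cron-job approach from item 1 could compute `your_time` like this before running the delete. `buildRetentionFilter` is hypothetical, and the sketch assumes `timestamp` holds Unix time in seconds; if it is stored in milliseconds, the arithmetic must change accordingly.

```javascript
// Hypothetical helper for the cron-job approach: compute
// your_time = now - some_time and build the delete filter.
// Assumes `timestamp` holds Unix time in seconds.
function buildRetentionFilter(nowSeconds, retentionSeconds) {
  const yourTime = nowSeconds - retentionSeconds;
  return { timestamp: { $lte: yourTime } };
}

// Example: keep one day of raw data.
const filter = buildRetentionFilter(Math.floor(Date.now() / 1000), 24 * 60 * 60);
// In mongosh this filter would be passed to db.docs.deleteMany(filter).
```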
