Atomicity in Map/Reducing over new records (MongoDB)
Here's the situation: I've got a MongoDB cluster and a web app which does a pretty intensive Map/Reduce query. This query runs periodically (every 5 minutes) in a cron job, and the results are stored (using the merge output option) into a collection.
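For context, here's a minimal sketch of what that cron job might look like in the mongo shell. The events collection, the event_totals output collection, and the trivial map/reduce pair are all hypothetical stand-ins for the real (more intensive) query:

// Hypothetical map/reduce pair: count records per userId.
var map = function () {
  emit(this.userId, 1);
};
var reduce = function (key, values) {
  return Array.sum(values);
};

// Run over the whole collection; merge overwrites matching keys
// in the output collection with the newly computed values.
db.events.mapReduce(map, reduce, { out: { merge: "event_totals" } });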
What works: Currently, the query runs over every record in the collection. That collection is slowly growing into the millions of records, and each run takes a little longer than the last.
The obvious solution is to run the Map/Reduce only over new records, and use the reduce function over the old stored values to calculate the correct result. MongoDB is great: it lets you specify a reduce output option instead of merge to do just that.
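Using the same hypothetical map/reduce pair as above, switching the output mode is a one-line change. With reduce, any key that already exists in the output collection gets re-reduced together with the new result instead of being overwritten:

db.events.mapReduce(map, reduce, {
  query: { /* somehow match only the new records */ },
  out: { reduce: "event_totals" }
});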
What I can't figure out: How to correctly perform the M/R only over the new records in the initial collection. I see two potential solutions, neither of which is good. Ideas?
- I could flag records that have been processed. The problem is: how do I flag exactly the same records that I just M/R'd over?
- I could query for the matching records first, pass the list of ids as an $in: [id1, id2, ...] clause to the Map/Reduce query, and then send an update to set my flag using the same $in (see the sketch after this list). But that's really inelegant, and I don't know how it will perform when the list of records is huge.
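To make that second idea concrete, here's a sketch of the $in approach against the same hypothetical collections; note that the ids array lives in shell memory, which is exactly what worries me for a huge batch:

// Collect the ids of everything not yet processed.
var ids = db.events.find({ processed: { $exists: false } }, { _id: 1 })
                   .toArray()
                   .map(function (doc) { return doc._id; });

// Map/reduce over exactly those records.
db.events.mapReduce(map, reduce, {
  query: { _id: { $in: ids } },
  out: { reduce: "event_totals" }
});

// Flag the same records as processed.
db.events.update({ _id: { $in: ids } },
                 { $set: { processed: true } },
                 { multi: true });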
tl;dr: How do I select only the new records in a Map/Reduce query that reduces its result into a collection?
A kind soul on the #mongodb IRC channel helped me figure this one out. A simple solution is to add a state-machine field and do the following (in pseudo-code):
set {state:'processing'} where {state:{$exists:false}}
mapreduce {...} where {state:'processing'}
set {state:'done'} where {state:'processing'}
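In real mongo shell syntax (still with the hypothetical collections and map/reduce pair from above), that looks roughly like this:

// 1. Claim every record that hasn't been touched yet.
db.events.update({ state: { $exists: false } },
                 { $set: { state: "processing" } },
                 { multi: true });

// 2. Map/reduce only the claimed records, folding results into the output.
db.events.mapReduce(map, reduce, {
  query: { state: "processing" },
  out: { reduce: "event_totals" }
});

// 3. Mark the claimed records as done.
db.events.update({ state: "processing" },
                 { $set: { state: "done" } },
                 { multi: true });

The trick is that step 1 freezes the set of records to process: anything inserted after that update runs simply has no state field yet, so it is untouched by this batch and gets picked up by the next cron run.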
Now, this is suboptimal because it wastes a lot of disk space on a collection with millions of records. But the real question is, why did I not think of this sooner?