Finding changes in MongoDB database

2023-01-03 00:24 问答作者：

I'm designing a MongoDB database that works with a script that periodically polls a resource and gets back a response which is stored in the database. Right now my database has one collection with four fields , id, name, timestamp and data.

I need to be able to find out which names had changes in the data field between script runs, and which did not. In pseudocode,

if(data[name][timestamp]==data[name][timestamp+1]) //data has not changed
store data in collection 1
else //data has changed between script runs for this name
store data in collection 2

Is there a query that can do this without iterating and running javascript over each item in the collection? There are millions of documents, so this would be pretty slow.

Should I create a new collection named timestamp for every time the script runs? Would 开发者_JS百科that make it faster/more organized? Is there a better schema that could be used?

The script runs once a day so I won't run into a namespace limitation any time soon.

OK, this is a neat question b/c the short is basically: you will have to iterate and run javascript over each item.

The part where this gets "neat" is that this isn't really different from what an SQL solution would have to do. I mean, you're basically joining a table to itself where x.1=x.1 and y.1=y.2. Even if the relational DB can handle such a beast, it's definitely not going to be fast with millions of entries.

So the truth is, you're doing this right way. Here are the extra details I would use to make this cleaner.

Ensure that you have an index on Name/Timestamp.
Run a db.mycollection.find().foreach() across the data set.
Foreach entry you're going to a) Perform comparison. b) Save appropriately. c) Update a flag indicating that this record has been processed.
On future loads you should be able to add a query to your find. db.mycollection.find({flag:{$exists:false}}).foreach()
Use db.eval() to help with speed.

The reason for the "Name/Timestamp" index is that you're going to be looking up each "successor" by "Name/Timestamp", so you want to be quick here.

The reason for the "processed" flag is that you should never have to re-run the same item. If given timestamp 'n' you find 'n+1', then that's the only 'n+1' you're going to have.

Honestly, if you're only running this once / day, it's quite likely that the speed will be just fine, especially if you only running on new records. Just assume that it's going to take several minutes.

继续阅读：database-design javascript mongodb

Finding changes in MongoDB database

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？