mongo schema (embedding vs reference) [duplicate]

2023-04-12 09:53 Q&A author:

This question already has answers here: MongoDB relationships: embed or reference? (10 answers) Closed 8 years ago.

Let's assume that I am designing a service like Foursquare that tracks user check-ins based on a user's location. I am using MongoDB as the backend.

The premise here is that a user can check-in to a location, so collections in the schema might look like this:

db.places.find()
{ "_id" : ObjectId("4e6a5a58a43a59e451d69351"), "address" : { "street" : "2020 Lombard St", "city" : "San Francisco", "state" : "CA" }, "latlong" : [ 37.800274, -122.434914 ], "name" : "Marina Sushi", "timezone" : "America/Los_Angeles" }
{ "_id" : ObjectId("4e6a59c3a43a59e451d69350"), "address" : { "street" : "246 Kearny St", "city" : "San Francisco", "state" : "CA" }, "latlong" : [ 37.79054, -122.40361 ], "name" : "Rickhouse", "timezone" : "America/Los_Angeles" }

db.users.find()
{ "_id" : ObjectId("4e936bc1da06d5e081544b8b"), "_class" : "com.gosociety.server.common.model.User", "email" : "goso@gosociety.com", "password" : "asdfasdf"}

So in the above collections, we have places and users. A user can "check-in" to a place, so when a user checks in, we'll keep a record of that in the database. A check-in would consist of: the time of the check-in (UTC), a note (150 characters), and whether it was sent to the user's Facebook feed or not (boolean).
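To make the check-in record concrete, here is a minimal sketch of what one such document could look like, written as plain JavaScript. The helper `buildCheckin` and the field names (`user_id`, `place_id`, `timestamp`, `note`, `sentToFacebook`) are assumptions for illustration, as is treating 150 characters as an upper bound on the note. IDs are shown as strings; in the mongo shell they would be ObjectId values.

```javascript
// Hypothetical helper: the field names are assumptions, not part of the
// schema shown in the question.
const MAX_NOTE_LENGTH = 150; // assumed to be an upper bound

function buildCheckin(userId, placeId, note, sentToFacebook, when = new Date()) {
  if (typeof note !== "string" || note.length > MAX_NOTE_LENGTH) {
    throw new RangeError(
      `note must be a string of at most ${MAX_NOTE_LENGTH} characters`
    );
  }
  return {
    user_id: userId,
    place_id: placeId,
    timestamp: when, // a JavaScript Date is an absolute instant (UTC-based)
    note: note,
    sentToFacebook: Boolean(sentToFacebook),
  };
}

const example = buildCheckin(
  "4e936bc1da06d5e081544b8b", // user _id from the question
  "4e6a5a58a43a59e451d69351", // place _id from the question (Marina Sushi)
  "Great omakase",
  false,
  new Date("2011-10-10T19:30:00Z")
);
console.log(JSON.stringify(example));
```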

Based on the description, I could think of two alternatives for schema design in Mongo:

1. Create a checkin collection, and store the Mongo-generated reference ID of each check-in in a check-ins [] array in both the User collection and the Places collection. This way it would be easy to determine aggregate statistics per user and per venue.
2. Don't create a checkin collection, but update both the Place and User data with the same check-in information.
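The two alternatives can be sketched as plain JavaScript objects to show what ends up stored where. The function names `referenceCheckin` and `embedCheckin`, the `checkins` array name, and the check-in field names are all hypothetical; this only models the document shapes and does not talk to a database.

```javascript
// Alternative 1: the check-in lives in its own collection; the user and
// place documents hold only its id in a checkins array.
function referenceCheckin(user, place, checkin) {
  return {
    checkin: checkin,
    user: { ...user, checkins: [...(user.checkins || []), checkin._id] },
    place: { ...place, checkins: [...(place.checkins || []), checkin._id] },
  };
}

// Alternative 2: no checkin collection; the same check-in data is copied
// into both the user document and the place document.
function embedCheckin(user, place, checkin) {
  const { _id, ...data } = checkin; // the embedded copies carry no id
  return {
    user: {
      ...user,
      checkins: [...(user.checkins || []), { ...data, place_id: place._id }],
    },
    place: {
      ...place,
      checkins: [...(place.checkins || []), { ...data, user_id: user._id }],
    },
  };
}

const user = { _id: "u1", email: "goso@gosociety.com" };
const place = { _id: "p1", name: "Rickhouse" };
const checkin = { _id: "c1", note: "hi", sentToFacebook: false };

const referenced = referenceCheckin(user, place, checkin);
const embedded = embedCheckin(user, place, checkin);
console.log(JSON.stringify(referenced.user));
console.log(JSON.stringify(embedded.user));
```

In both shapes the arrays inside the user and place documents gain one element per check-in; the difference is whether each element is an id or a full copy of the check-in.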

I believe I read in the Mongo documentation that aggregation should be used directly if the data being aggregated is almost never displayed without the object containing the aggregate info. If we follow the method that the Foursquare app uses, it shows a user's total check-ins only when we view their profile, and a place's check-in stats only when we view the place details.

Any suggestions here would be much appreciated.

Thanks.

Personally I would go with a separate collection, mainly for the purpose of keeping your user/place objects small, since you can have an unbounded number of checkins per user/place. If you put an index on user_id/timestamp and place_id/timestamp in your checkins collection, then queries for a particular user or place will be efficient. A second benefit to using a separate collection is that MongoDB won't have to keep moving your user or place object when it grows too large. Instead, it will just keep appending to the checkins collection, which should be quite efficient (tens of thousands of inserts per second per shard).

I should also mention that I would not store the checkin IDs in either the place or the user document, since you get the same performance benefit from having an index on place_id or user_id in the checkins document.
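As a rough sketch of the indexing described above: the function below takes any object that exposes a `createIndex(keys)` method, which is the case for `db.checkins` in the mongo shell. The function names, the descending sort on `timestamp`, and the stub collection used to make the example runnable on its own are assumptions.

```javascript
// `checkins` is any object with a createIndex(keys) method, for example
// db.checkins in the mongo shell. Descending timestamp order is an
// assumption (newest check-ins first).
function ensureCheckinIndexes(checkins) {
  return [
    checkins.createIndex({ user_id: 1, timestamp: -1 }),
    checkins.createIndex({ place_id: 1, timestamp: -1 }),
  ];
}

// Query filter for all check-ins of one user; it would typically be paired
// with a sort on { timestamp: -1 } so the compound index covers both.
function checkinsByUser(userId) {
  return { user_id: userId };
}

// Stand-in collection that only records the requested index keys, so the
// sketch runs without a database.
const recorded = [];
const stubCollection = {
  createIndex: (keys) => {
    recorded.push(keys);
    return Object.keys(keys).join("_");
  },
};

ensureCheckinIndexes(stubCollection);
console.log(JSON.stringify(recorded));
console.log(JSON.stringify(checkinsByUser("4e936bc1da06d5e081544b8b")));
```

With these two indexes in place, neither the user nor the place document needs an array of check-in IDs; the lookup goes through the checkins collection directly.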

I agree with Rick, though you may want to store aggregate data about checkins in your places/users documents (e.g. totalCheckinCount) for quick retrieval.

This is safe with respect to the growth/movement issue that Rick highlighted, since simple aggregate data stays at a constant size, O(1), unlike storing the actual checkins themselves, which would of course grow at O(n), where n is the number of checkins.
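One way to maintain such a counter is MongoDB's `$inc` update operator. The sketch below only builds the filter and update documents for a given check-in; the helper name `counterUpdatesFor` and the check-in field names `user_id` and `place_id` are assumptions, while `totalCheckinCount` is the name suggested above.

```javascript
// Builds the counter updates that would accompany one new check-in.
// Each entry maps to a call such as db.users.updateOne(filter, update).
function counterUpdatesFor(checkin) {
  const bump = { $inc: { totalCheckinCount: 1 } };
  return [
    { collection: "users", filter: { _id: checkin.user_id }, update: bump },
    { collection: "places", filter: { _id: checkin.place_id }, update: bump },
  ];
}

const updates = counterUpdatesFor({
  user_id: "4e936bc1da06d5e081544b8b",
  place_id: "4e6a59c3a43a59e451d69350",
});
console.log(JSON.stringify(updates, null, 2));
```

Note that the insert into the checkins collection and the two counter updates are separate writes, so the counters should be treated as a cache that can be recomputed from the checkins collection if they ever drift.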
