mongo schema (embedding vs reference) [duplicate]
Let's assume that I am designing a service like Foursquare that tracks user check-ins based on the user's location. I am using MongoDB as the backend.
The premise here is that a user can check-in to a location, so collections in the schema might look like this:
db.places.find()
{ "_id" : ObjectId("4e6a5a58a43a59e451d69351"), "address" : { "street" : "2020 Lombard St", "city" : "San Francisco", "state" : "CA" }, "latlong" : [ 37.800274, -122.434914 ], "name" : "Marina Sushi", "timezone" : "America/Los_Angeles" }
{ "_id" : ObjectId("4e6a59c3a43a59e451d69350"), "address" : { "street" : "246 Kearny St", "city" : "San Francisco", "state" : "CA" }, "latlong" : [ 37.79054, -122.40361 ], "name" : "Rickhouse", "timezone" : "America/Los_Angeles" }
db.users.find()
{ "_id" : ObjectId("4e936bc1da06d5e081544b8b"), "_class" : "com.gosociety.server.common.model.User", "email" : "goso@gosociety.com", "password" : "asdfasdf"}
So in the above collections we have places and users. When a user checks in to a place, we keep a record of it in the database. A check-in consists of the time of the check-in (UTC), a note (up to 150 characters), and a boolean indicating whether it was posted to the user's Facebook feed.
Based on the description, I could think of two alternatives for schema design in Mongo:
1. Create a checkins collection, and store each check-in's Mongo-generated _id in a checkins [] array in both the User document and the Place document. This way it would be easy to determine aggregate statistics per user and per venue.
2. Don't create a checkins collection, but update both the Place and User documents with the same check-in information.
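To make the two options concrete, here is a rough sketch of the document shapes as plain JavaScript objects. The field names (ts, note, facebook, checkins) and the string id are placeholders made up for illustration; nothing here talks to a real database.

```javascript
// Hypothetical check-in record built from the fields described above.
// A JavaScript Date holds a UTC instant, which matches the "time (UTC)" requirement.
function makeCheckin(note, sentToFacebook) {
  if (note.length > 150) {
    throw new Error("note is limited to 150 characters");
  }
  return { ts: new Date(), note: note, facebook: sentToFacebook };
}

// Option 1: the check-in is its own document; user and place only hold its id.
function referenceShapes(checkinId, checkin) {
  return {
    checkin: Object.assign({ _id: checkinId }, checkin),
    user: { checkins: [checkinId] },
    place: { checkins: [checkinId] },
  };
}

// Option 2: no checkins collection; the same data is copied into both documents.
function embeddedShapes(checkin) {
  return {
    user: { checkins: [Object.assign({}, checkin)] },
    place: { checkins: [Object.assign({}, checkin)] },
  };
}

const sample = makeCheckin("Great sushi", true);
console.log(JSON.stringify(referenceShapes("checkin-1", sample), null, 2));
console.log(JSON.stringify(embeddedShapes(sample), null, 2));
```

In both shapes the checkins array on the user and place documents grows with every check-in.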
I believe I read in the Mongo documentation that aggregation should be used directly if the data being aggregated is almost never displayed without the object containing the aggregate info. If we follow what the Foursquare app does, it shows a user's total check-ins only when we view their profile, and a place's check-in stats only when we view the place details.
Any suggestions here would be much appreciated.
Thanks.
Personally I would go with a separate collection, mainly to keep your user/place objects small, since you can have an unbounded number of checkins per user/place. If you put an index on user_id/timestamp and another on place_id/timestamp in your checkins collection, then queries for a particular user or place will be efficient. A second benefit of a separate collection is that MongoDB won't have to keep moving your user or place object when it grows too large. Instead, it will just keep appending to the checkins collection, which should be quite efficient (tens of thousands of inserts per second per shard).
I should also mention that I would not store the checkin IDs in either the place or the user document, since you get the same performance benefit from having an index on place_id or user_id in the checkins collection.
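A minimal sketch of what that could look like, assuming the check-in documents carry user_id, place_id and timestamp fields. The descending order on timestamp is my own choice so that "most recent first" queries line up with the index. The helper names are hypothetical, and checkins stands for a collection handle such as db.checkins; createIndex, find, sort and limit are the usual collection and cursor calls.

```javascript
// One compound index per lookup path, as suggested above.
// Field names user_id, place_id and timestamp are assumed.
const CHECKIN_INDEXES = [
  { user_id: 1, timestamp: -1 },
  { place_id: 1, timestamp: -1 },
];

// `checkins` is a collection handle, e.g. db.checkins in the mongo shell.
function ensureCheckinIndexes(checkins) {
  return CHECKIN_INDEXES.map((spec) => checkins.createIndex(spec));
}

// Most recent check-ins for one user; can be served by the user_id/timestamp index.
function recentCheckinsForUser(checkins, userId, limit) {
  return checkins.find({ user_id: userId }).sort({ timestamp: -1 }).limit(limit);
}

// Most recent check-ins for one place; can be served by the place_id/timestamp index.
function recentCheckinsForPlace(checkins, placeId, limit) {
  return checkins.find({ place_id: placeId }).sort({ timestamp: -1 }).limit(limit);
}
```

With this layout the user and place documents never change size when someone checks in, and no list of checkin IDs has to be maintained on them.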
I agree with Rick, though you may want to store aggregate data about checkins in your places/users documents (e.g. totalCheckinCount) for quick retrieval.
This is safe with respect to the growth/movement issue that Rick highlighted, since simple aggregate data stays at O(1) size, unlike the actual checkins themselves, which would of course grow at O(n), where n is the number of checkins.
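As a rough sketch of the counter idea: each check-in becomes one insert into the checkins collection plus a $inc on a totalCheckinCount field in the user and place documents. The helper names and the check-in field names (user_id, place_id, timestamp, note, facebook) are assumptions for illustration; insertOne, updateOne and $inc are standard MongoDB operations.

```javascript
// Build the three writes for one check-in as plain data.
// Field names are assumed; totalCheckinCount is the counter suggested above.
function planCheckin(userId, placeId, note, sentToFacebook) {
  const bump = { $inc: { totalCheckinCount: 1 } };
  return {
    insert: {
      user_id: userId,
      place_id: placeId,
      timestamp: new Date(),
      note: note,
      facebook: sentToFacebook,
    },
    userUpdate: { filter: { _id: userId }, update: bump },
    placeUpdate: { filter: { _id: placeId }, update: bump },
  };
}

// Apply the plan. `checkins`, `users` and `places` are collection handles
// (for example db.checkins, db.users and db.places).
async function recordCheckin(checkins, users, places, plan) {
  await checkins.insertOne(plan.insert);
  await users.updateOne(plan.userUpdate.filter, plan.userUpdate.update);
  await places.updateOne(plan.placeUpdate.filter, plan.placeUpdate.update);
}
```

Note that these are three separate writes, so a counter can drift if one of them fails; if that matters, the count can always be recomputed from the checkins collection.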