MongoDB Many-to-Many Association

2022-12-21 11:25 Q&A author:

How would you do a many-to-many association with MongoDB?

For example, let's say you have a Users table and a Roles table. Users have many roles, and roles have many users. In SQL land you would create a UserRoles table.

Users:
    Id
    Name

Roles:
    Id
    Name

UserRoles:
    UserId
    RoleId

How is the same sort of relationship handled in MongoDB?

Depending on your query needs, you can put everything in the user document:

{name:"Joe"
,roles:["Admin","User","Engineer"]
}

To get all the Engineers, use:

db.users.find( { roles : "Engineer" } );
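The query works because MongoDB matches an equality condition on an array field when any element of the array equals the value. A small in-memory sketch of that rule, using plain JavaScript objects in place of the collection (the data and the helper name `findByRole` are illustrative assumptions), shows which documents `{ roles: "Engineer" }` would select:

```javascript
// In-memory stand-in for the users collection (illustrative data).
const users = [
  { name: "Joe", roles: ["Admin", "User", "Engineer"] },
  { name: "Ann", roles: ["User"] },
  { name: "Bob", roles: ["Engineer"] },
];

// Mimics the array-matching rule behind find({ roles: role }):
// a document matches if its roles array contains the value.
function findByRole(collection, role) {
  return collection.filter((doc) => Array.isArray(doc.roles) && doc.roles.includes(role));
}

console.log(findByRole(users, "Engineer").map((u) => u.name)); // [ 'Joe', 'Bob' ]
```

On a real collection, an index on the `roles` field (for example `createIndex({ roles: 1 })`) is a multikey index and can serve this query without scanning every user.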

If you want to maintain the roles in separate documents, you can include each role document's _id in the roles array instead of the name:

{name:"Joe"
,roles:["4b5783300334000000000aa9","5783300334000000000aa943","6c6793300334001000000006"]
}

and set up the roles like:

{_id:"6c6793300334001000000006"
,rolename:"Engineer"
}
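With references instead of names, finding all Engineers takes two lookups: resolve the role name to its `_id`, then find the users whose `roles` array contains that `_id`. The sketch below does both steps on in-memory arrays; in the shell the equivalents would be a `findOne` on the roles collection followed by a `find` on the users collection. The sample data and the helper name `usersInRole` are assumptions for illustration.

```javascript
// Illustrative role documents and user documents that reference them by _id.
const roles = [
  { _id: "4b5783300334000000000aa9", rolename: "Admin" },
  { _id: "6c6793300334001000000006", rolename: "Engineer" },
];
const users = [
  { name: "Joe", roles: ["4b5783300334000000000aa9", "6c6793300334001000000006"] },
  { name: "Ann", roles: ["4b5783300334000000000aa9"] },
];

function usersInRole(rolename) {
  // Step 1: resolve the role name to its _id (like findOne({ rolename })).
  const role = roles.find((r) => r.rolename === rolename);
  if (!role) return [];
  // Step 2: users whose roles array contains that _id (like find({ roles: role._id })).
  return users.filter((u) => u.roles.includes(role._id));
}

console.log(usersInRole("Engineer").map((u) => u.name)); // [ 'Joe' ]
```

The price of keeping roles in their own documents is this extra round trip; in return, renaming a role touches a single document.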

Instead of modeling according to our years of experience with RDBMSs, I have found it much easier to model document-repository solutions with MongoDB, Redis, and other NoSQL data stores by optimizing for the read use cases, while being considerate of the atomic write operations that the write use cases need to support.

For instance, the use cases of a "Users in Roles" domain are as follows:

Role - Create, Read, Update, Delete, List Users, Add User, Remove User, Clear All Users, Index of User or similar to support "Is User In Role" (operations like a container + its own metadata).
User - Create, Read, Update, Delete (CRUD operations like a free-standing entity)

This can be modeled as the following document templates:

User: { _id: UniqueId, name: string, roles: string[] }
    Indexes: unique: [ name ]
Role: { _id: UniqueId, name: string, users: string[] }
    Indexes: unique: [ name ]

To support the high-frequency uses, such as Role-related features of the User entity, User.roles is intentionally denormalized: the membership is stored on the User, and Role.users holds a duplicate copy of it.
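Because the membership is stored twice, the two copies must agree. A sketch of that invariant, together with the "Is User In Role" check answered from the Role side in a single read, using in-memory objects shaped like the templates above (`isUserInRole`, `checkConsistency`, and the sample data are hypothetical):

```javascript
// Documents shaped like the User and Role templates (illustrative data).
const users = [
  { _id: "u1", name: "joe", roles: ["admin", "engineer"] },
  { _id: "u2", name: "ann", roles: ["engineer"] },
];
const roles = [
  { _id: "r1", name: "admin", users: ["joe"] },
  { _id: "r2", name: "engineer", users: ["joe", "ann"] },
];

// "Is User In Role", answered from the Role document alone.
function isUserInRole(roleName, userName) {
  const role = roles.find((r) => r.name === roleName);
  return role !== undefined && role.users.includes(userName);
}

// Lists every membership recorded on one side but not the other (empty when consistent).
function checkConsistency() {
  const problems = [];
  for (const u of users) {
    for (const roleName of u.roles) {
      const role = roles.find((r) => r.name === roleName);
      if (!role || !role.users.includes(u.name)) problems.push(`${u.name} -> ${roleName} missing on role`);
    }
  }
  for (const r of roles) {
    for (const userName of r.users) {
      const user = users.find((u) => u.name === userName);
      if (!user || !user.roles.includes(r.name)) problems.push(`${r.name} -> ${userName} missing on user`);
    }
  }
  return problems;
}

console.log(isUserInRole("admin", "ann")); // false
console.log(checkConsistency()); // []
```

A check like this is only a diagnostic; keeping the two copies in step is the job of the write operations discussed next.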

In case it is not readily apparent from the text, this is the type of thinking that is encouraged when using document repositories.

I hope that this helps bridge the gap with regard to the read side of the operations.

For the write side, what is encouraged is to model according to atomic writes. For instance, if the document structures require acquiring a lock, updating one document, then another, and possibly more documents, then releasing the lock, the model has likely failed. Just because we can build distributed locks doesn't mean that we are supposed to use them.

For the User in Roles model, the operations that stretch our lock-free, atomic-write approach are adding or removing a User from a Role. In either case, a successful operation results in exactly one User document and one Role document being updated. If something fails, cleanup is easy. This is one reason the Unit of Work pattern comes up so often where document repositories are used.
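A sketch of the add operation as two single-document updates with a cleanup step, assuming an in-memory store in place of the two collections. The update documents use MongoDB's real `$addToSet` and `$pull` operators (in the shell each call would be an `updateOne({ name }, update)`), but `applyUpdate`, `addUserToRole`, and the data are hypothetical.

```javascript
// In-memory stand-ins for the users and roles collections (illustrative data).
const db = {
  users: [{ name: "joe", roles: [] }],
  roles: [{ name: "engineer", users: [] }],
};

// Mimics updateOne({ name }, update) for a single-field $addToSet or $pull.
// Returns true if the document changed; throws if no document matches.
function applyUpdate(collection, name, update) {
  const doc = db[collection].find((d) => d.name === name);
  if (!doc) throw new Error(`${collection}: no document named ${name}`);
  if (update.$addToSet) {
    const [field, value] = Object.entries(update.$addToSet)[0];
    if (doc[field].includes(value)) return false;
    doc[field].push(value);
    return true;
  }
  if (update.$pull) {
    const [field, value] = Object.entries(update.$pull)[0];
    const before = doc[field].length;
    doc[field] = doc[field].filter((v) => v !== value);
    return doc[field].length !== before;
  }
  return false;
}

// Two single-document writes. If the second fails, undo the first
// (only if it actually changed something) and rethrow.
function addUserToRole(userName, roleName) {
  const changed = applyUpdate("users", userName, { $addToSet: { roles: roleName } });
  try {
    applyUpdate("roles", roleName, { $addToSet: { users: userName } });
  } catch (err) {
    if (changed) applyUpdate("users", userName, { $pull: { roles: roleName } });
    throw err;
  }
}

addUserToRole("joe", "engineer");
console.log(db.users[0].roles, db.roles[0].users); // [ 'engineer' ] [ 'joe' ]
```

Removing a user from a role is the mirror image: `$pull` on both documents, with `$addToSet` as the compensating step.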

The operation that really stretches this approach is clearing a Role, which results in many User updates to remove Role.name from each User.roles array. The clear operation is therefore generally discouraged, but if needed it can be implemented by ordering the operations:

1. Get the list of user names from Role.users.
2. Iterate over the user names from step 1 and remove the role name from each User.roles.
3. Clear Role.users.

In the case of an issue, which is most likely to occur within step 2, a rollback is easy as the same set of user names from step 1 can be used to recover or continue.
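The three ordered steps can be sketched as follows, again on in-memory stand-ins for the collections (`clearRole` and the data are hypothetical). Because Role.users is cleared last and removing an absent role is a no-op, a failure part-way through step 2 can be recovered by simply running `clearRole` again.

```javascript
// In-memory stand-ins for the users and roles collections (illustrative data).
const db = {
  users: [
    { name: "joe", roles: ["admin", "engineer"] },
    { name: "ann", roles: ["engineer"] },
  ],
  roles: [{ name: "engineer", users: ["joe", "ann"] }],
};

function clearRole(roleName) {
  const role = db.roles.find((r) => r.name === roleName);
  if (!role) return;
  // Step 1: copy the user names before changing anything.
  const userNames = [...role.users];
  // Step 2: remove the role from each user
  // (like updateOne({ name }, { $pull: { roles: roleName } })); repeating this is harmless.
  for (const userName of userNames) {
    const user = db.users.find((u) => u.name === userName);
    if (user) user.roles = user.roles.filter((r) => r !== roleName);
  }
  // Step 3: only now clear Role.users
  // (like updateOne({ name: roleName }, { $set: { users: [] } })).
  role.users = [];
}

clearRole("engineer");
console.log(db.users.map((u) => u.roles), db.roles[0].users);
```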

I've just stumbled upon this question and, although it's an old one, I thought it would be useful to add a couple of possibilities not mentioned in the answers given. Also, things have moved on a bit in the last few years, so it is worth emphasising that SQL and NoSQL are moving closer to each other.

One of the commenters brought up the wise cautionary attitude that “if data is relational, use relational”. However, that comment only makes sense in the relational world, where schemas always come before the application.

RELATIONAL WORLD: Structure data > Write application to get it
NOSQL WORLD: Design application > Structure data accordingly

Even if data is relational, NoSQL is still an option. For example, one-to-many relationships are no problem at all and are widely covered in the MongoDB docs.

A 2015 SOLUTION TO A 2010 PROBLEM

Since this question was posted, there have been serious attempts at bringing NoSQL closer to SQL. The team led by Yannis Papakonstantinou at the University of California, San Diego has been working on FORWARD, an implementation of SQL++ which could soon be the solution to persistent problems like the one posted here.

At a more practical level, the release of Couchbase 4.0 has meant that, for the first time, you can do native JOINs in NoSQL. They use their own N1QL. This is an example of a JOIN from their tutorials:

SELECT usr.personal_details, orders 
        FROM users_with_orders usr 
            USE KEYS "Elinor_33313792" 
                JOIN orders_with_users orders 
                    ON KEYS ARRAY s.order_id FOR s IN usr.shipped_order_history END

N1QL allows for most, if not all, SQL operations, including aggregation, filtering, etc.
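To make the semantics of that key-based join concrete, here is a rough in-memory equivalent in JavaScript: the `ARRAY ... FOR ... IN ... END` expression builds a list of order keys, each key is looked up directly in the orders keyspace, and each key that is found yields one result row (keys with no matching document produce no row). Only the keyspace names, the user key, and the field names come from the query; the document contents are illustrative assumptions.

```javascript
// Keyspaces modeled as Maps from document key to document (contents are illustrative).
const usersWithOrders = new Map([
  ["Elinor_33313792", {
    personal_details: { name: "Elinor" },
    shipped_order_history: [{ order_id: "order-1" }, { order_id: "order-2" }, { order_id: "order-3" }],
  }],
]);
const ordersWithUsers = new Map([
  ["order-1", { order_id: "order-1", total: 10 }],
  ["order-2", { order_id: "order-2", total: 25 }],
  // "order-3" is deliberately absent: an unmatched key produces no row.
]);

// USE KEYS selects the user document by key; ON KEYS ARRAY ... END gives the order keys;
// every key found in orders_with_users yields one joined row.
function joinUserOrders(userKey) {
  const usr = usersWithOrders.get(userKey);
  if (!usr) return [];
  const orderKeys = usr.shipped_order_history.map((s) => s.order_id);
  const rows = [];
  for (const key of orderKeys) {
    const orders = ordersWithUsers.get(key);
    if (orders !== undefined) rows.push({ personal_details: usr.personal_details, orders });
  }
  return rows;
}

console.log(joinUserOrders("Elinor_33313792").length); // 2
```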

THE NOT-SO-NEW HYBRID SOLUTION

If MongoDB is still the only option, then I'd like to go back to my point that the application should take precedence over the structure of data. None of the answers mention hybrid embedding, whereby most queried data is embedded in the document/object, and references are kept for a minority of cases.

Example: can information (other than the role name) wait? Could bootstrapping the application be faster by not requesting anything that the user doesn't need yet?

This could be the case if the user logs in and needs to see all the options for all the roles s/he belongs to. However, the user is an "Engineer" and the options for this role are rarely used. This means the application only needs to show the options for an engineer in case s/he wants to click on them.

This can be achieved with a document which tells the application at the start (1) which roles the user belongs to and (2) where to get information about an event linked to a particular role.

   {_id: ObjectId(),
    roles: [["Engineer", ObjectId()],
            ["Administrator", ObjectId()]]
   }

Or, even better, index the name field in the roles collection, and you may not need to embed the ObjectId() either.
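The idea of returning only role names at login and fetching a role's details only when the user asks for them can be sketched like this. `fetchRoleDetails` stands in for a query on the roles collection by its indexed name field; it, the `options` field, and the sample data are assumptions. A small cache keeps a role from being fetched twice.

```javascript
// Login-time document: only the role names the user belongs to.
const sessionUser = { name: "joe", roles: ["Engineer", "Administrator"] };

// Hypothetical stand-in for a lookup on the roles collection by its indexed name field.
const rolesCollection = new Map([
  ["Engineer", { name: "Engineer", options: ["deploy", "debug"] }],
  ["Administrator", { name: "Administrator", options: ["manage users"] }],
]);
let fetchCount = 0;
function fetchRoleDetails(roleName) {
  fetchCount++;
  return rolesCollection.get(roleName);
}

// Fetch a role's details on first use and reuse them for later clicks.
const roleCache = new Map();
function getRoleOptions(roleName) {
  if (!sessionUser.roles.includes(roleName)) return [];
  if (!roleCache.has(roleName)) roleCache.set(roleName, fetchRoleDetails(roleName));
  const role = roleCache.get(roleName);
  return role ? role.options : [];
}

console.log(getRoleOptions("Engineer"), fetchCount); // [ 'deploy', 'debug' ] 1
```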

Another example: is information about ALL the roles requested ALL the time?

It could also be the case that the user logs in to the dashboard and 90% of the time performs tasks linked to the "Engineer" role. Hybrid embedding could embed that particular role in full and keep only references for the rest.

{_id: ObjectId(),
  roles: [{name: "Engineer",
           property1: "value1",
           property2: "value2"
          },
          ["Administrator", ObjectId()]
         ]
}

Being schemaless is not just a characteristic of NoSQL; it can be an advantage in this case. It's perfectly valid to nest different types of objects in the "roles" property of a user object.
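Because the roles array mixes fully embedded role objects with `[name, id]` reference pairs, code that reads it has to tell the two shapes apart. A sketch of that, assuming exactly the two shapes shown above (`describeRole` is a hypothetical helper and the ids are placeholders):

```javascript
const user = {
  _id: "user-1",
  roles: [
    { name: "Engineer", property1: "value1", property2: "value2" }, // embedded in full
    ["Administrator", "role-id-placeholder"], // reference only: [name, id]
  ],
};

// Normalizes either shape to { name, embedded, ref }.
function describeRole(entry) {
  if (Array.isArray(entry)) {
    const [name, ref] = entry;
    return { name, embedded: null, ref }; // details must be fetched via ref
  }
  return { name: entry.name, embedded: entry, ref: null }; // details already present
}

console.log(user.roles.map(describeRole).map((r) => r.name)); // [ 'Engineer', 'Administrator' ]
```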

There are two approaches that can be used:

1st approach

Add a reference to each role in the user document's roles array:

{
  _id: ObjectId('312xczc324vdfd4353ds4r32'),
  user: 'faizanfareed',
  roles: [
    {
      // Consider removing roleName: renaming a role would then require
      // updating it in every user document. If roles are never renamed, ignore this.
      roleName: 'admin',
      roleId: ObjectId('casd324vfdg65765745435v')
    },
    {
      roleName: 'engineer',
      roleId: ObjectId('casd324vfdvxcv7454rtr35vvvvbre')
    }
  ]
}

Depending on query requirements, we can also add user reference IDs to the role document's users array:

{
  roleName: 'admin',
  users: [{ userId: ObjectId('312xczc324vdfd4353ds4r32') } /* , ... */]
}

However, adding user IDs to the role document can push it past MongoDB's 16 MB document size limit, which is not good at all. We can use this approach only if the number of users per role is bounded, so that the role document stays under the limit. If it is not required, we can store role IDs in the user documents only.
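To put the 16 MB limit in numbers, the sketch below estimates the BSON size of a `users` array of `{ userId: ObjectId }` entries from the BSON encoding rules: an array is encoded as a document whose keys are the indexes "0", "1", …, each element costs a type byte plus its key as a null-terminated string plus its value, and an ObjectId takes 12 bytes. It ignores the role's other fields and the surrounding document, so treat the result as an optimistic upper bound.

```javascript
const BSON_MAX_DOCUMENT_BYTES = 16 * 1024 * 1024; // MongoDB's 16 MiB document limit

// { userId: ObjectId } as an embedded document:
// int32 length + (type byte + "userId" + NUL + 12-byte ObjectId) + trailing NUL.
const EMBEDDED_USER_REF_BYTES = 4 + (1 + "userId".length + 1 + 12) + 1; // 25

// Cost of array element i: type byte + decimal index key + NUL + value.
function elementBytes(i) {
  return 1 + String(i).length + 1 + EMBEDDED_USER_REF_BYTES;
}

// Size of the array value holding n user references.
function usersArrayBsonBytes(n) {
  let bytes = 4 + 1; // int32 length prefix + trailing NUL
  for (let i = 0; i < n; i++) bytes += elementBytes(i);
  return bytes;
}

// Largest number of references whose array alone still fits under the limit.
function maxUsersUnderLimit(limit = BSON_MAX_DOCUMENT_BYTES) {
  let bytes = 5;
  let n = 0;
  while (bytes + elementBytes(n) <= limit) {
    bytes += elementBytes(n);
    n++;
  }
  return n;
}

console.log(maxUsersUnderLimit());
```

Under these assumptions the limit is reached at roughly half a million users per role. That is comfortable for a small system but not for an unbounded one, which is why the two-sided array only works when the number of users per role is bounded.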

2nd approach, which is the traditional one

Create a new collection in which each document contains the IDs of both a user and a role.

{
  _id: ObjectId('mnvctcyu8678hjygtuyoe'),
  userId: ObjectId('312xczc324vdfd4353ds4r32'),
  roleId: ObjectId('casd324vfdg65765745435v')
}

Documents will not exceed the size limit, but reads are harder in this approach, since getting a user's roles needs a second query (or a join) through the new collection.
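The extra read cost is a second lookup: first the junction documents for a user, then the roles they point to. A sketch with in-memory arrays (ids are placeholders and `rolesForUser` is a hypothetical helper); against a real database the same shape maps to a `find` on the junction collection followed by a `find` with `$in` on the roles collection, or to an aggregation `$lookup`.

```javascript
// Junction documents: one per (user, role) pair (illustrative data).
const userRoles = [
  { _id: "ur1", userId: "u1", roleId: "r1" },
  { _id: "ur2", userId: "u1", roleId: "r2" },
  { _id: "ur3", userId: "u2", roleId: "r2" },
];
const roles = [
  { _id: "r1", roleName: "admin" },
  { _id: "r2", roleName: "engineer" },
];

function rolesForUser(userId) {
  // Query 1: junction documents for this user (like find({ userId })).
  const roleIds = userRoles.filter((ur) => ur.userId === userId).map((ur) => ur.roleId);
  // Query 2: the referenced roles (like find({ _id: { $in: roleIds } })).
  return roles.filter((r) => roleIds.includes(r._id));
}

console.log(rolesForUser("u1").map((r) => r.roleName)); // [ 'admin', 'engineer' ]
```

A unique compound index on `{ userId: 1, roleId: 1 }` in the junction collection would also prevent the same pair from being inserted twice.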

Based on your requirements, go with the 1st or the 2nd approach.

Final comment on this: go with the 1st approach and add only roleId to the user document's array, because the number of roles is small compared with the number of users, so a user document will not exceed 16 MB.

When employee and company are both entity objects, try the following schema:

employee {
   // put the contracts on the employee
   contracts: [ item1, item2, item3, ... ]
}

company {
   // and duplicate them on the company
   contracts: [ item1, item2, item3, ... ]
}
