
How can I avoid duplicating data in a document database like RavenDB?

Given that document databases, such as RavenDB, are non-relational, how do you avoid duplicating data that multiple documents have in common? How do you maintain that data if it's okay to duplicate it?


With a document database you have to duplicate your data to some degree. What that degree is will depend on your system and use cases.

For example, if we have simple Blog and User aggregates, we could set them up as:

  public class User
  {
    public string Id { get; set; }
    public string Name { get; set; }
    public string Username { get; set; }
    public string Password { get; set; }
  }

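  public class Blog
  {
    public string Id { get; set; }
    public string Title { get; set; }

    // Denormalized copy of the owning user's Id and Name, stored inside the Blog document.
    public BlogUser BlogUser { get; set; }
  }

  // Declared next to Blog rather than inside it: C# does not allow a property
  // to share its name with a type nested in the same class.
  public class BlogUser
  {
    public string Id { get; set; }
    public string Name { get; set; }
  }

In this example the Blog class embeds a BlogUser object that holds the Id and Name of the User aggregate associated with the blog. I have included only these fields because they are the only ones the Blog class is interested in; it doesn't need to know the user's username or password when the blog is being displayed.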

These embedded classes are going to depend on your system's use cases, so you have to design them carefully. The general idea is to design aggregates that can be loaded from the database with a single read and that contain all the data required to display or manipulate them.

This then leads to the question of what happens when the User.Name gets updated.

With most document databases you would have to load all the instances of Blog which belong to the updated User and update the Blog.BlogUser.Name field and save them all back to the database.

RavenDB is slightly different, as it supports set-based updates: you can run a single update against RavenDB that changes the BlogUser.Name property of the user's blogs without having to load and update them all individually.

The code for doing the update in RavenDB the manual way, for all of the user's blogs, would be:

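  public void UpdateBlogUser(User user)
  {
    // Load every blog that embeds a copy of this user.
    var blogs = session.Query<Blog>("blogsByUserId")
                  .Where(b => b.BlogUser.Id == user.Id)
                  .ToList();

    // Overwrite the denormalized name on each loaded blog.
    foreach (var blog in blogs)
      blog.BlogUser.Name = user.Name;

    session.SaveChanges();
  }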

I've added in the SaveChanges just as an example. The RavenDB Client uses the Unit of Work pattern and so this should really happen somewhere outside of this method.


There's no one "right" answer to your question IMHO. It truly depends on how mutable the data you're duplicating is.

Take a look at the RavenDB documentation for lots of answers about document DB design vs. relational, but specifically check out the "Associations Management" section of the Document Structure Design Considerations document. In short, document DBs reference other documents by ID when they don't want to embed shared data in a document. These IDs are not like FKs: it is entirely up to the application to ensure their integrity and to resolve them.
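To make the reference-by-ID idea concrete, here is a minimal sketch. It deliberately does not use the RavenDB client: InMemoryStore, ReferenceDemo and the UserId property are hypothetical names invented for this illustration, with a dictionary standing in for the database. The Blog stores only the user's ID, and the application resolves it and copes with an ID that points nowhere.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical document types for this sketch: the blog keeps only the user's ID.
public class User
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Blog
{
    public string Id { get; set; }
    public string Title { get; set; }
    public string UserId { get; set; }   // reference by ID, not an embedded copy
}

// Stand-in for a document store, keyed by document ID (an assumption, not a RavenDB API).
public class InMemoryStore
{
    private readonly Dictionary<string, object> _docs = new Dictionary<string, object>();

    public void Store(string id, object doc) => _docs[id] = doc;

    // Returns null when the ID does not resolve; nothing enforces that it does.
    public T Load<T>(string id) where T : class =>
        _docs.TryGetValue(id, out var doc) ? doc as T : null;
}

public static class ReferenceDemo
{
    // The application resolves the reference and handles a dangling ID itself.
    public static string DescribeBlog(InMemoryStore store, string blogId)
    {
        var blog = store.Load<Blog>(blogId);
        if (blog == null) return "(no such blog)";

        var author = store.Load<User>(blog.UserId);
        var authorName = author != null ? author.Name : "(unknown author)";
        return blog.Title + " by " + authorName;
    }
}
```

The trade-off against embedding is visible here: renaming the user needs no change to any Blog, but displaying a blog costs a second lookup, and a missing user has to be handled in application code. RavenDB's client has an Include feature for fetching a referenced document in the same round trip, which is not shown in this sketch.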
