O/R mapping: Single complex query vs. multiple simple queries

I'm curious about how the result set of an SQL query is transported from the server to the client.

Most O/R mappers support both eager and lazy loading, and both have their pros and cons. For example, Entity Framework 4 (.NET) has wonderful eager load support.

However, let's assume we have a model like this:

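public class BlogPost
{
    public int Id { get; set; }                         // key used by the query below
    public string Body { get; set; }                    // potentially very large text
    public ICollection<Comment> Comments { get; set; }  // eagerly loaded via Include
}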

...

and a query like this:

// Fetches a single post together with its comments (eager load).
var post = context
          .Posts
          .Include(p => p.Comments)
          .Where(p => p.Id == 1)
          .First();

This will result in a single SQL query, where all the data for the "Post" is repeated on each row, once for every "Comment".

Let's say we have 100 comments on a specific post and Post.Body is a massive piece of text. Surely this can't be good? Or is the data somehow compressed when sent to the client, thus minimizing the overhead of repeating data on each row?

What is the best way to determine if one such query is more efficient than just two simple queries (one for getting the post and one for getting its comments)?

Benchmarking this on a dev environment is pretty pointless, since there are multiple factors here:

  • CPU load on the SQL server
  • Network load
  • CPU load on the app server (materializing objects)

Ideas on this?

[Edit] Clarification:

Two queries would be something like this:

sql

select * from post where postid = 123

result

id, topic, body, etc...

sql

select * from comment where postid = 123

result

id, postid, commenttext, etc...

The first query would yield one row and the second query would yield as many rows as there are comments.

With a single query there would be as many rows as there are comments for the specific post, but with all the post data repeated on each row.

result

p.id , p.topic, __p.body__, c.id, c.postid, c.commenttext

p.body would be repeated on each row, thus making the result set extremely large (assuming that p.body contains a lot of data, that is ;-)
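The repetition described above can be reproduced on a small scale. The following is only a sketch: it uses an in-memory SQLite database from Python rather than Entity Framework and SQL Server, the table layout, body size and comment count are assumptions made for illustration, and counting characters is a rough proxy that ignores any protocol-level framing or compression.

```python
import sqlite3

# In-memory database with an assumed schema mirroring the post/comment example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, topic TEXT, body TEXT);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, postid INTEGER, commenttext TEXT);
""")

BODY_SIZE = 10_000    # assumed size of post.body in characters
COMMENT_COUNT = 100   # assumed number of comments on the post

conn.execute("INSERT INTO post VALUES (123, 'topic', ?)", ("x" * BODY_SIZE,))
conn.executemany(
    "INSERT INTO comment (postid, commenttext) VALUES (123, ?)",
    [(f"comment {i}",) for i in range(COMMENT_COUNT)],
)


def text_chars(rows):
    """Total characters across all text columns: a rough proxy for payload size."""
    return sum(len(v) for row in rows for v in row if isinstance(v, str))


# Single query: post columns are repeated on every comment row.
joined = conn.execute("""
    SELECT p.id, p.topic, p.body, c.id, c.postid, c.commenttext
    FROM post p JOIN comment c ON c.postid = p.id
    WHERE p.id = 123
""").fetchall()

# Two queries: the post body is fetched exactly once.
post_rows = conn.execute("SELECT * FROM post WHERE id = 123").fetchall()
comment_rows = conn.execute("SELECT * FROM comment WHERE postid = 123").fetchall()

joined_chars = text_chars(joined)
split_chars = text_chars(post_rows) + text_chars(comment_rows)

print("single query:", len(joined), "rows,", joined_chars, "chars")
print("two queries: ", len(post_rows) + len(comment_rows), "rows,", split_chars, "chars")
```

With these assumed sizes the joined result carries the body once per comment, so the difference between the two totals grows with both the body size and the number of comments.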


I think it really comes down to the following:

  • How many posts are there?
  • How complex is it to get the comments of a post?

If you have to load several million posts, a single query will be better even if each post has several comments, because the accumulated round-trip time of the extra queries will be much worse than the time needed to transfer the additional data.
So, I think you need to have a sharp eye ;-)
Also, I don't think that benchmarking in the dev environment is pointless, because it at least shows how the two approaches compare relative to each other.
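The trade-off between round trips and extra payload can be put into rough numbers. This is a back-of-the-envelope sketch only: the function `transfer_seconds` and every figure in it (1 ms round trip, roughly 100 MB/s of bandwidth, the row sizes) are assumptions chosen for illustration, and it deliberately ignores CPU load on the SQL server and object materialization on the app server.

```python
def transfer_seconds(round_trips, payload_bytes, rtt_s, bandwidth_bytes_per_s):
    """Very rough cost model: latency per round trip plus time to move the bytes."""
    return round_trips * rtt_s + payload_bytes / bandwidth_bytes_per_s


RTT = 0.001            # assumed 1 ms round trip between app server and SQL server
BANDWIDTH = 100e6      # assumed ~100 MB/s of usable bandwidth


def compare(body_bytes, comment_bytes, comment_count):
    """Return (single_query_seconds, two_queries_seconds) for one post."""
    # Single joined query: the body travels once per comment row.
    single = transfer_seconds(
        1, comment_count * (body_bytes + comment_bytes), RTT, BANDWIDTH
    )
    # Two queries: one extra round trip, but the body travels only once.
    two = transfer_seconds(
        2, body_bytes + comment_count * comment_bytes, RTT, BANDWIDTH
    )
    return single, two


# Large body, many comments: the repeated body dominates.
big_single, big_two = compare(body_bytes=10_000, comment_bytes=100, comment_count=100)

# Small body, few comments: the extra round trip dominates.
small_single, small_two = compare(body_bytes=100, comment_bytes=100, comment_count=5)

print(f"large post: single={big_single:.4f}s  two={big_two:.4f}s")
print(f"small post: single={small_single:.5f}s  two={small_two:.5f}s")
```

Under these assumptions the two-query approach wins for the large post and the single query wins for the small one, which is why measuring with realistic data sizes and latency matters more than the absolute numbers here.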


Having a single query that returns a lot of rows is almost always faster than a lot of queries returning just a single row.

In your case though, retrieving the post first and then all of its comments (with a single query) is probably more efficient than getting everything in one query.
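To make the "one query with many rows versus many queries with one row" distinction concrete, here is a small sketch that counts the statements issued in each case. It again uses an in-memory SQLite database from Python with an assumed `comment` table; the helper `run` and the counter are hypothetical, and since SQLite has no network hop, the statement count only stands in for the number of round trips a real client would make.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comment (id INTEGER PRIMARY KEY, postid INTEGER, commenttext TEXT)"
)
conn.executemany(
    "INSERT INTO comment (postid, commenttext) VALUES (?, ?)",
    [(123, f"comment {i}") for i in range(100)],
)

statements = [0]  # mutable counter: each statement stands in for one round trip


def run(sql, params=()):
    """Execute a statement, count it, and return all rows."""
    statements[0] += 1
    return conn.execute(sql, params).fetchall()


# Many small queries: one to find the ids, then one per comment.
ids = [row[0] for row in run("SELECT id FROM comment WHERE postid = ? ORDER BY id", (123,))]
one_by_one = [run("SELECT * FROM comment WHERE id = ?", (cid,))[0] for cid in ids]
many_count = statements[0]

# One query returning all rows for the post.
statements[0] = 0
batched = run("SELECT * FROM comment WHERE postid = ? ORDER BY id", (123,))
single_count = statements[0]

print("row-at-a-time statements:", many_count)
print("batched statements:      ", single_count)
```

Both approaches return the same rows; only the number of statements differs, and over a real network each of those statements would pay the round-trip latency.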
