Entity Framework Performance Issue
I am running into an interesting performance issue with Entity Framework. I am using Code First.
Here is the structure of my entities:
A Book can have many Reviews. A Review is associated with a single Book. A Review can have one or many Comments. A Comment is associated with one Review.
public class Book
{
public int BookId { get; set; }
// ...
public ICollection<Review> Reviews { get; set; }
}
public class Review
{
public int ReviewId { get; set; }
public int BookId { get; set; }
public Book Book { get; set; }
public ICollection<Comment> Comments { get; set; }
}
public class Comment
{
public int CommentId { get; set; }
public int ReviewId { get; set; }
public Review Review { get; set; }
}
I populated my database with a lot of data and added the proper indexes. I am trying to retrieve a single book that has 10,000 reviews on it using this query:
var bookAndReviews = db.Books.Where(b => b.BookId == id)
.Include(b => b.Reviews)
.FirstOrDefault();
This particular book has 10,000 reviews. This query takes around 4 seconds to run. Running the exact same query directly against SQL Server (captured via SQL Profiler) returns in no time at all. Using the same query with a SqlDataAdapter and custom objects, the data comes back in under 500 milliseconds.
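The ADO.NET version looked roughly like this (just a sketch; the table name, connection string, and mapping are simplified from what I actually ran):
var reviews = new List<Review>();
var table = new DataTable();

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "SELECT ReviewId, BookId FROM Reviews WHERE BookId = @id", connection))
{
    command.Parameters.AddWithValue("@id", id);
    new SqlDataAdapter(command).Fill(table); // the adapter opens and closes the connection itself
}

foreach (DataRow row in table.Rows)
{
    reviews.Add(new Review
    {
        ReviewId = (int)row["ReviewId"],
        BookId = (int)row["BookId"]
    });
}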
Using ANTS Performance Profiler, it looks like the bulk of the time is being spent on a few different things:
The Equals method is being called 50 million times.
Does anyone know why Equals would need to be called 50 million times, and how I could improve the performance here?
Why is Equals called 50M times?
That sounds quite suspicious: you have 10,000 reviews and 50,000,000 calls to Equals. I suspect this is caused by the identity map that EF implements internally. The identity map ensures that each entity with a unique key is tracked by the context only once, so if the context already holds an instance with the same key as a record loaded from the database, it will not materialize a new instance and instead reuses the existing one. Now, how could that line up with those numbers? My terrifying guess:
=============================================
1st record read      | 0 comparisons
2nd record read      | 1 comparison
3rd record read      | 2 comparisons
...
10,000th record read | 9,999 comparisons
That means that each new record is compared with every existing record in the identity map. We can compute the total number of comparisons as the sum of an arithmetic sequence:
a(n) = a(n-1) + 1
Sum(n) = (n / 2) * (a(1) + a(n))
Sum(10,000) = 5,000 * (0 + 9,999) = 49,995,000 ≈ 50,000,000
I hope I didn't make a mistake in my assumptions or calculation. Wait, actually I hope I did, because this doesn't look good.
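You can reproduce that count with a trivial simulation (plain C#, nothing EF-specific, assuming the identity map does a linear scan over the keys it already tracks):
// Toy model of an identity map implemented as a linear scan:
// every newly materialized entity is compared against all entities tracked so far.
var trackedKeys = new List<int>();
long equalsCalls = 0;

for (int key = 0; key < 10000; key++)    // 10,000 reviews
{
    foreach (var existing in trackedKeys)
    {
        equalsCalls++;                    // one Equals call per tracked entry
        if (existing.Equals(key)) break;  // never matches - all keys are unique
    }
    trackedKeys.Add(key);
}

Console.WriteLine(equalsCalls);           // 49,995,000 - roughly the 50M you observed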
Try turning off change tracking, which should hopefully also turn off the identity map checks.
It can be tricky. Start with:
var bookAndReviews = db.Books.Where(b => b.BookId == id)
.Include(b => b.Reviews)
.AsNoTracking()
.FirstOrDefault();
But there is a big chance that your navigation property will not be populated (because it is handled by change tracking). In that case, use this approach:
var book = db.Books.Where(b => b.BookId == id).AsNoTracking().FirstOrDefault();
book.Reviews = db.Reviews.Where(r => r.BookId == id).AsNoTracking().ToList();
Anyway, can you see what object type is being passed to Equals? It should only be comparing primary keys, and even 50M integer comparisons should not be such a problem.
As a side note, EF is slow; that is a well-known fact. It also uses reflection internally when materializing entities, so simply reading 10,000 records can take "some time". Unless you have already done so, you can also turn off dynamic proxy creation (db.Configuration.ProxyCreationEnabled = false).
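For example (a sketch against the EF 4.1+ DbContext API; BookContext is just a placeholder for your own context class):
public class BookContext : DbContext
{
    public BookContext()
    {
        // Materialize plain POCO instances instead of dynamic proxies.
        this.Configuration.ProxyCreationEnabled = false;
        // Optional: if you don't rely on lazy loading anywhere, turn it off too.
        this.Configuration.LazyLoadingEnabled = false;
    }

    public DbSet<Book> Books { get; set; }
    public DbSet<Review> Reviews { get; set; }
    public DbSet<Comment> Comments { get; set; }
}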
I know this sounds lame, but have you tried the other way around, e.g.:
var reviewsAndBooks = db.Reviews.Where(r => r.Book.BookId == id)
.Include(r => r.Book);
I have sometimes noticed better performance from EF when you approach your queries this way (but I haven't had the time to figure out why).