开发者

performance of LINQ queries against the SQL equivalent

开发者_如何转开发I’m currently having a debate with someone at work regarding the performance of LINQ queries against the SQL equivalent.

Has anyone done/seen any scientific testing on this?

If not, anecdotal evidence of where you’ve had to replace LINQ with a SQL query for performance reasons will help me make my case.


I have a slightly different take on this; when profiling (with our shiny profiler) we noticed that LINQ (to SQL in this case) was doing a reasonable job generating TSQL for basic queries, and the operation was running very fast at the DB server (0.5ms etc) - however, the actual query operation was taking MUCH longer (like 20ms+ for the same 0.5ms query, in some cases). So where was the time? You might think "query translation", but no; we also have a lot of ExecuteQuery<T> code (i.e. where you write the TSQL by hand) and this was doing exactly the same thing. It turned out that somewhere between the materializer and the identity map vast amounts of time was being lost.

So; we wrote our own materializer that was pretty-much a drop-in replacement for ExecuteQuery - and thus was born dapper.

On more of the LINQ side, it generally does OK at generating TSQL for simple queries, but for anything complex I usually trust hand-coded TSQL a lot more. To take a case as a sample, I had a complex query involving groups, skips and takes. It didn't perform well. When I wrote it by hand with ROW_NUMBER() etc the same results took 4% of the "stats IO" and total time.

My current opinion on LINQ is that the ORM tools make data mutation a breeze, but for query I tend to use dapper. Which is ironic since the Q in LINQ is "query".


LINQ as in LINQ2SQL or EF needs to have a generalized ruleset on how to convert the LINQ queries to SQL and with this introduces alevel of abstraction. This abstraction will sometimes result in statements that are less-than-optimal. Writing your SQL statements by hand allows you to tweak it for your specific case.
This leads to the conclusion that the speed of SQL is more likely to be faster, especially in complex scenarios. But this doesn't mean that every LINQ query is slower than its SQL equivalent.
My experience with EF 1 in combination with Oracle supports this.


One of the more obvious examples where SQL query is better is update/delete batching. There is no facility built into LINQ-To-SQL to handle multiple updates or deletes, except to process the records one by one, which generates individual queries for each record.

var myRecords = myContext.Books.Where(b => b.Author = "Bob");
foreach (var rec in myRecords)
    myContext.Books.DeleteOnSubmit(rec);
myContext.SubmitChanges()  // generates delete statement for each record.

In the general case, LINQ is optimized to generate very efficient queries, and can sometimes generate even better queries than the intuitive way to write it with SQL. However, there will always be exceptions where the LINQ-to-SQL statements will not be as efficient as a SQL query that can be written by hand.


LinqToSql is a layer on top of ADO.Net. So if you used ADO.Net with Sql queries you'll be "faster" than LinqToSql (by not doing those extra things that LinqToSql does).

This same argument (when performance is the only criteria) can be made against .Net and for Machine code. .Net IL is ultimately JIT'ed to Machine and if you just wrote Machine in the first place, you'd be faster.

Here's a couple scenarios where it matters:

  • If your queries are low IO with large amounts of data returned for materialization, LinqToSql will take much more time than a raw sql. On the other hand, you end up with Customer instances instead of DataRows.
  • If your queries are highly dynamic, such that you never issue the same sql text twice, you'll always incur query translation costs. Sql queries will be faster (esp if you don't count the construction of the sql text).

Here's a couple scenarios where it just doesn't matter:

  • If your queries are database-IO heavy (report style), you won't notice the extra client-side CPU cost of LinqToSql's abstraction.
  • If your queries are repetative (same query with different parameters), much of the cost of LinqToSql's abstraction can be mitigated by using CompiledQuery. <-- I think most systems fit this scenario.


Please check other question: LINQ-to-SQL vs stored procedures?

In my opinion, linq is good enough to use in almost every aspect. For some rare cases when you are very good at sql and really want to improve performance then go for raw sql instead.


When you run a LINQ query, LINQ generates T-SQL code to perform your query. You can run the SQL Server Profiler to get an idea of what the query looks like. I've done it in the past and the T-SQL generated looks pretty clean.

I can't speak directly of performance as I've not run any numbers, but one thing I have seen is that unless you use a rowversion column in your table for the LINQ queries, you'll end up with every column in the table being included in the WHERE clause, which is done that way as part of LINQ's built-in optimistic concurrency checking. Using the rowversion column allows LINQ to use only that column for this checking and makes for cleaner and more performant T-SQL.


Eearlier this year I submitted a bug report against the performance of Linq2Sql when inserting many records in a single transaction set. Each individual insert took progressively longer and longer to process because Linq2Sql had to check the relationships -- which the DB server was going to check anyway.

You can see the bug report at https://connect.microsoft.com/VisualStudio/feedback/details/637841/linq-to-sql-shows-progressively-worse-performance-as-the-transaction-set-grows

Click on show details and I included some performance numbers.

Interestingly enough, Microsoft's answer was that they weren't planning on improving Linq2Sql performance and that I should use ADO.Net which is what I had already decided to do.

My bottom line at this point is that I love Linq2Sql for queries but would not use it for inserts or updates. I also like the fact that Linq2Sql will populate the records from my foreign keys for me without coding separate queries.

As for complicated queries - I'm about 50% using Linq2Sql and about 50% stored procedures. I never do inline SQL because I prefer using methods that are strongly typed. So for simple queries, the simplicity of Linq2Sql beats ADO.Net 100% of the time.


Generally, in my experience the problem with EF4.1 or LINQtoSQL isn't LINQ, it's the coder.

Developers generally don't know how to code efficient queries or how to code efficient and scalable database access (i.e. they know VB or C# well, but not SQL) and this shows in how they construct a query. At that point, it doesn't matter if they're using LINQ or not.

Where this really shows up is when applications work fine in development (dev database has a few 100,000 rows of data) but perform badly in production (millions of rows of data). Developers immediately point out that it worked fine in development, so it must be something wrong with the database and not their code. This is how wars are started between developers and DBA's.

While I really like LINQ, you're often better off moving queries into Stored Procs so that you can tune and tweak your data access without having to do a full Dev-Test-QA-Release cycle on your application for every minor update.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜