
Which is faster: filtering a DataSet in-memory or returning a result set from SQL Server?

I'm working on a pretty large table (800k records and climbing) and I'd like to filter it. The thing is, the table is stored in SQL Server. So I was wondering: would a

SELECT * FROM table WHERE condition1 = 1

query be faster than loading the table into a typed DataSet, using DataRow.Find(), and then sending all those rows to another DataTable?
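The trade-off can be sketched in miniature with Python's sqlite3 module (the question's context is C#/ADO.NET, but the principle carries over; the table and column names here are made up):

```python
import sqlite3

# Hypothetical table standing in for the real 800k-row table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, flagged INTEGER)")
conn.executemany("INSERT INTO records VALUES (?, ?)",
                 [(i, i % 10 == 0) for i in range(100_000)])

# Option 1: let the database do the filtering -- only matching rows come back.
filtered = conn.execute(
    "SELECT id FROM records WHERE flagged = 1").fetchall()

# Option 2: pull everything and filter in application memory
# (the DataSet / DataRow.Find approach).
all_rows = conn.execute("SELECT id, flagged FROM records").fetchall()
filtered_in_memory = [(rid,) for rid, flagged in all_rows if flagged]

assert filtered == filtered_in_memory  # same rows, very different data volume
```

Both routes produce identical results; the difference at 800k rows is dominated by how much data has to cross the wire and be materialized in application memory.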

I'm guessing yes, but I'll ask anyway.


As long as your SQL Server is not paging because of RAM starvation, SQL Server should always be faster than loading the whole table over the network and then filtering locally...


You should definitely do it in SQL Server because:

  1. Filtering is one of the things a database server is designed to do and is good at. If the condition always involves the same columns, you might want to consider adding indexes to speed up the lookup even more.
  2. Performance-wise, it is always faster to transfer less data.
  3. Your memory consumption is going to explode when you try to load the whole thing into memory, especially when the table grows unbounded. It might work fine now but can bite you unexpectedly later.
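Point 1 can be illustrated with a small sketch in Python's sqlite3 (table and index names are hypothetical): adding an index on the filtered column changes the query plan from a full table scan to an index lookup. The exact plan text varies by SQLite version, so the comments below are only indicative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?)",
                 [(i, "open" if i % 2 else "closed") for i in range(1000)])

def plan(sql):
    # Return the detail column of the first query-plan step.
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[-1]

before = plan("SELECT * FROM records WHERE status = 'open'")  # e.g. "SCAN records"
conn.execute("CREATE INDEX idx_status ON records(status)")
after = plan("SELECT * FROM records WHERE status = 'open'")   # e.g. "... USING ... INDEX idx_status ..."

print(before)
print(after)
```

The same idea applies to SQL Server: an index on the column(s) in the WHERE clause lets the engine seek directly to the matching rows instead of scanning the table.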


The only exception is if you are going to run this query over and over again. In that case, it might make sense to cache the data locally and then filter it ... but even then, you should only do so if performance tests indicate a significant saving.


I don't think there is a catch-all answer here; you should test and measure the performance for your particular circumstances.

When doing multiple complicated queries, with joins across many tables etc., we have found that it can be significantly faster to execute one large query into a dataset and then filter it as needed in memory. A big part of the performance gain for us comes from avoiding multiple round trips to the database, which can become a bottleneck.

There are some SQL constructs, e.g. WHERE LocationId IN (1, 2, 3, 4, 5, 6), that can perform far worse in SQL than the equivalent dataset.LocationIds.Where(id => myValues.Contains(id)) in C# against large datasets, so in that situation, if you measure it, you may find that in-memory performs best.
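Whether the in-memory route actually wins is something only measurement can settle; this sketch (Python/sqlite3, with made-up table and column names) just shows the two equivalent formulations side by side:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stores (id INTEGER PRIMARY KEY, location_id INTEGER)")
conn.executemany("INSERT INTO stores VALUES (?, ?)",
                 [(i, i % 50) for i in range(10_000)])

wanted = {1, 2, 3, 4, 5, 6}

# SQL-side: a parameterized IN (...) list.
placeholders = ", ".join("?" * len(wanted))
sql_rows = conn.execute(
    f"SELECT id FROM stores WHERE location_id IN ({placeholders})",
    sorted(wanted)).fetchall()

# In-memory: fetch once, then filter with set containment
# (the Where(id => myValues.Contains(id)) analogue).
all_rows = conn.execute("SELECT id, location_id FROM stores").fetchall()
mem_rows = [(sid,) for sid, loc in all_rows if loc in wanted]

assert sorted(sql_rows) == sorted(mem_rows)
```

Both give the same rows; which one is faster depends on the size of the IN list, the indexes available, and how much data the in-memory route has to pull down first.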

A SELECT with simple WHERE conditions would almost certainly be faster in SQL, though, which is why you should evaluate each scenario and test and measure where needed.


If your filter can be written as a stored procedure (you can use parameters for the values in the SELECT), it can be faster because the first time you call the stored procedure, its execution plan is compiled and cached; subsequent calls reuse the cached plan with no wait for plan compilation. (In your example, however, the plan-compilation time is probably minuscule, and note that SQL Server also caches plans for parameterized ad-hoc queries.)


I get that this question is old, but the answers are all missing an important point: a database can use the filter criteria, along with table indexes, to read less data from disk. For example, if you were to do SELECT * FROM X WHERE ID = 7 (assuming ID is a primary key here), the database will only load a single row from disk. If you instead filtered the data in your application, the database would need to read the entire table off disk.

Assuming you are I/O-bound, this is going to give you significantly better performance. Of course, not all filter criteria can be pushed down this way; it depends on the database server, the indexes, etc.
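The keyed single-row read is visible in the query plan. A sketch with Python's sqlite3 (table name and contents are made up; the exact plan wording varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO x VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(1000)])

# The plan shows a direct primary-key seek, not a scan of all 1000 rows.
detail = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM x WHERE id = 7").fetchone()[-1]
print(detail)  # e.g. "SEARCH x USING INTEGER PRIMARY KEY (rowid=?)"

row = conn.execute("SELECT * FROM x WHERE id = 7").fetchone()
assert row == (7, "row 7")
```

Only one row is ever touched; filtering the same table in application code would first require reading all of it.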
