
Which is faster: filtering a DataSet in-memory or returning a result set from SQL Server?

I'm working on a pretty large table (800k records and climbing) and I'd like to filter it. The thing is, the table is stored in SQL Server. So I was wondering: would a

SELECT * FROM table WHERE condition1 = 1

query be faster than loading the table into a typed DataSet, using DataRow.Find(), and then sending all those rows to another DataTable?
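The trade-off can be sketched in miniature with Python's sqlite3 module (the question's context is C#/ADO.NET, but the principle carries over; the table and column names here are made up):

```python
import sqlite3

# Hypothetical table standing in for the real 800k-row table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, flagged INTEGER)")
conn.executemany("INSERT INTO records VALUES (?, ?)",
                 [(i, i % 10 == 0) for i in range(100_000)])

# Option 1: let the database do the filtering -- only matching rows come back.
filtered = conn.execute(
    "SELECT id FROM records WHERE flagged = 1").fetchall()

# Option 2: pull everything and filter in application memory
# (the DataSet / DataRow.Find approach).
all_rows = conn.execute("SELECT id, flagged FROM records").fetchall()
filtered_in_memory = [(rid,) for rid, flagged in all_rows if flagged]

assert filtered == filtered_in_memory  # same rows, very different data volume
```

Both routes produce identical results; the difference at 800k rows is dominated by how much data has to cross the wire and be materialized in application memory.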

I'm guessing yes, but I'll ask anyway.


As long as your SQL Server is not paging because of RAM starvation, SQL Server should always be faster than loading the whole table over the network and then filtering locally...


You should definitely do it in SQL Server because:

  1. Filtering is one of the things a database server is designed to do and is good at. If the condition always involves the same columns, you might want to consider adding indexes to speed up the lookup even more.
  2. Performance-wise, it is always faster to transfer less data.
  3. Your memory consumption is going to explode when you try to load the whole thing into memory, especially when the table grows unbounded. It might work fine now but can bite you unexpectedly later.
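Point 1 can be illustrated with a small sketch in Python's sqlite3 (table and index names are hypothetical): adding an index on the filtered column changes the query plan from a full table scan to an index lookup. The exact plan text varies by SQLite version, so the comments below are only indicative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?)",
                 [(i, "open" if i % 2 else "closed") for i in range(1000)])

def plan(sql):
    # Return the detail column of the first query-plan step.
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[-1]

before = plan("SELECT * FROM records WHERE status = 'open'")  # e.g. "SCAN records"
conn.execute("CREATE INDEX idx_status ON records(status)")
after = plan("SELECT * FROM records WHERE status = 'open'")   # e.g. "... USING ... INDEX idx_status ..."

print(before)
print(after)
```

The same idea applies to SQL Server: an index on the column(s) in the WHERE clause lets the engine seek directly to the matching rows instead of scanning the table.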


The only exception is if you are going to run this query over and over again. In that case, it might make sense to cache the data locally and then filter it ... but even then, you should only do so if performance tests indicate a significant saving.


I don't think there is a catch-all answer here; you should test and measure the performance for your particular circumstances.

When doing multiple complicated queries, with joins across many tables etc., we have found that it can be significantly faster to execute one large query into a dataset and then filter it as needed in memory. A big part of the performance gain for us comes from avoiding multiple round trips to the database, which can become a bottleneck.

There are some SQL constructs, e.g. WHERE LocationId IN (1, 2, 3, 4, 5, 6), that can perform far worse in SQL than the equivalent dataset.LocationIds.Where(id => myValues.Contains(id)) in C# against large datasets, so in that situation, if you measure it, you may find that in-memory performs best.
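Whether the in-memory route actually wins is something only measurement can settle; this sketch (Python/sqlite3, with made-up table and column names) just shows the two equivalent formulations side by side:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stores (id INTEGER PRIMARY KEY, location_id INTEGER)")
conn.executemany("INSERT INTO stores VALUES (?, ?)",
                 [(i, i % 50) for i in range(10_000)])

wanted = {1, 2, 3, 4, 5, 6}

# SQL-side: a parameterized IN (...) list.
placeholders = ", ".join("?" * len(wanted))
sql_rows = conn.execute(
    f"SELECT id FROM stores WHERE location_id IN ({placeholders})",
    sorted(wanted)).fetchall()

# In-memory: fetch once, then filter with set containment
# (the Where(id => myValues.Contains(id)) analogue).
all_rows = conn.execute("SELECT id, location_id FROM stores").fetchall()
mem_rows = [(sid,) for sid, loc in all_rows if loc in wanted]

assert sorted(sql_rows) == sorted(mem_rows)
```

Both give the same rows; which one is faster depends on the size of the IN list, the indexes available, and how much data the in-memory route has to pull down first.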

A SELECT with simple WHERE conditions would almost certainly be faster in SQL, though, which is why you should evaluate each scenario and test and measure where needed.


If your filter can be written as a stored procedure (you can use parameters for the values in the SELECT), it can be faster because the first time you call the stored procedure, its execution plan is compiled and cached; subsequent calls reuse the cached plan with no wait for plan compilation. (In your example, however, the plan-compilation time is probably minuscule, and note that SQL Server also caches plans for parameterized ad-hoc queries.)


I get that this question is old, but the answers are all missing an important point: a database can use the filter criteria, along with table indexes, to read less data from disk. For example, if you were to do SELECT * FROM X WHERE ID = 7 (assuming ID is a primary key here), the database will only load a single row from disk. If you instead filtered the data in your application, the database would need to read the entire table off disk.

Assuming you are I/O-bound, this is going to give you significantly better performance. Of course, not all filter criteria can be pushed down this way; it depends on the database server, the indexes, etc.
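The keyed single-row read is visible in the query plan. A sketch with Python's sqlite3 (table name and contents are made up; the exact plan wording varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO x VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(1000)])

# The plan shows a direct primary-key seek, not a scan of all 1000 rows.
detail = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM x WHERE id = 7").fetchone()[-1]
print(detail)  # e.g. "SEARCH x USING INTEGER PRIMARY KEY (rowid=?)"

row = conn.execute("SELECT * FROM x WHERE id = 7").fetchone()
assert row == (7, "row 7")
```

Only one row is ever touched; filtering the same table in application code would first require reading all of it.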
