SQL Server and SqlDataReader - Trillion Records - Memory
I've never tried this - so I don't know if I'd run into memory issues.
But can a SqlDataReader read a trillion records? It's all streamed, correct? I'm a little green on what the SQL/TDS protocol is doing under the covers.
UPDATE: Take "trillion" to mean any very large number. I probably should have said something like 1 billion or 100 million.
There are a few details.
SqlDataReader will normally read an entire row into memory and cache it. This includes any BLOB fields, so you can end up caching several 2 GB fields in memory (XML, VARBINARY(MAX), VARCHAR(MAX), NVARCHAR(MAX)). If such fields are a concern, you must pass CommandBehavior.SequentialAccess to ExecuteReader and use the streaming capabilities of the SqlClient-specific types, like SqlBytes.Stream.
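A minimal sketch of that streaming pattern (the table and column names here are hypothetical, just to illustrate the shape of the code):

```csharp
using System.Data;
using System.Data.SqlClient;
using System.IO;

static void StreamBlobs(string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var cmd = new SqlCommand("SELECT Id, Payload FROM Documents", conn))
        // SequentialAccess tells SqlClient not to buffer whole rows;
        // columns must then be read in order, each at most once.
        using (var reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
        {
            while (reader.Read())
            {
                int id = reader.GetInt32(0);
                // Stream the BLOB instead of materializing a byte[] in memory.
                using (Stream blob = reader.GetSqlBytes(1).Stream)
                {
                    // process the stream incrementally...
                }
            }
        }
    }
}
```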
A connection is busy until the SqlDataReader completes. This creates transactional problems, because you won't be able to do any processing in the database in the same transaction while the connection is busy. Trying to open a different connection and enroll it in the same transaction will fail, as loop-back distributed transactions are prohibited. The solution is to use MARS. You do so by setting
MultipleActiveResultSets=True
on the connection. This allows you to issue commands on the same connection while a data reader is still active (the typical fetch-process-fetch loop). Read the link to Christian Kleinerman's post with great care; make sure you understand the issues and restrictions around MARS and transactions, as they're quite subtle and counterintuitive.

Lengthy processing in the client will block the server. Your query will still be executing all this time, and the server will have to suspend it when the communication pipe fills up. A query consumes a worker (or more if it has parallel plans), and workers are a very scarce commodity in a server (they equate roughly to threads). You won't be able to afford many clients processing huge result sets at their own leisure.
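A sketch of the fetch-process-fetch loop that MARS enables (connection string values and table names here are made up for illustration):

```csharp
using System.Data.SqlClient;

// With MARS enabled, a second command can run on the same
// connection while the reader is still open.
var cs = "Data Source=.;Initial Catalog=MyDb;Integrated Security=true;" +
         "MultipleActiveResultSets=True";
using (var conn = new SqlConnection(cs))
{
    conn.Open();
    using (var cmd = new SqlCommand("SELECT Id FROM Work", conn))
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            int id = reader.GetInt32(0);
            // This UPDATE shares the connection with the active
            // reader -- only legal because MARS is enabled.
            using (var update = new SqlCommand(
                "UPDATE Work SET Done = 1 WHERE Id = @id", conn))
            {
                update.Parameters.AddWithValue("@id", id);
                update.ExecuteNonQuery();
            }
        }
    }
}
```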
Transaction size. Processing a trillion records in one transaction is never going to work. The log will have to grow to accommodate the entire transaction and won't truncate and reuse the VLFs, resulting in huge log growth.
Recovery time. If processing fails at the 999 billionth record, it will have to roll back all the work done, so it will take another 12 days just to roll back.
Yes, that will stream... but I don't think you should actually try to do it.
If you could read a million records per second (which sounds unlikely to me), you'd still need about 12 days to read a trillion records (10^12 rows ÷ 10^6 rows/sec = 10^6 seconds ≈ 11.6 days)... that's a lot of work to risk losing halfway through.
Now I realise you probably don't really want to read a trillion records, literally, but my point is that if you can separate your "large amount" of work into logical batches anyway, that's probably a good idea.
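One common way to batch this is keyset pagination, so each chunk runs in its own short transaction and a failure only loses one batch. A sketch, with hypothetical table/column names:

```csharp
using System.Data.SqlClient;

// Process rows in chunks keyed on Id; commit after each chunk so the
// log can truncate and a failure only rolls back one batch.
long lastId = 0;
bool more = true;
while (more)
{
    more = false;
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var tx = conn.BeginTransaction())
        {
            var cmd = new SqlCommand(
                "SELECT TOP (@n) Id, Payload FROM BigTable " +
                "WHERE Id > @last ORDER BY Id", conn, tx);
            cmd.Parameters.AddWithValue("@n", 10000);
            cmd.Parameters.AddWithValue("@last", lastId);
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    lastId = reader.GetInt64(0);
                    more = true;
                    // process the row...
                }
            }
            tx.Commit(); // short transaction per batch
        }
    }
}
```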
Yes - it might take a while (as long as your SQL isn't doing anything silly like trying to take a snapshot), but if your server can stream it out, the SqlDataReader shouldn't have a memory usage problem.