How do I get the difference between two large sets in .NET?
I need to get the set of GUIDs in a remote database which do not exist in an IEnumerable (for context, this is coming from a Lucene index). There are potentially many millions of these GUIDs.
I currently think that inserting the IEnumerable into the database and doing the difference there will be too expensive (the inserts will hammer the database), but I am prepared to be proven wrong!
Reading both sets into memory is also infeasible due to the amount of data - our existing solution does this and fails with very large sets.
I would like a solution which can operate on a small subset of the data at a time so that we have a constant memory footprint. We have an idea of how to roll our own implementation of this, but it is non-trivial, so we would obviously rather use an existing one if it exists.
If anybody has any recommendations for an existing solution, I'd be grateful to hear them!
You could use SqlBulkCopy to load the GUIDs into the database very quickly (if it is SQL Server), and then compute the difference there.
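A rough sketch of how that might look, streaming the IEnumerable to the server in fixed-size batches so that only one batch is in memory at a time. The names `GuidBulkLoader`, `dbo.LuceneGuids`, `dbo.Documents` and the `Id` column are made up for illustration, and the batch size of 50,000 is just a starting guess; I'm assuming a staging table with a single `uniqueidentifier` column already exists.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient; // Microsoft.Data.SqlClient exposes the same types on newer projects

public static class GuidBulkLoader
{
    // Hypothetical query: ids present in the database but missing from the staged set.
    public const string DifferenceSql =
        "SELECT d.Id FROM dbo.Documents AS d " +
        "WHERE NOT EXISTS (SELECT 1 FROM dbo.LuceneGuids AS s WHERE s.Id = d.Id);";

    // Lazily splits the stream into DataTables of at most batchSize rows,
    // so the full set is never materialised in memory.
    public static IEnumerable<DataTable> ToBatches(IEnumerable<Guid> guids, int batchSize)
    {
        DataTable table = NewTable();
        foreach (Guid guid in guids)
        {
            table.Rows.Add(guid);
            if (table.Rows.Count == batchSize)
            {
                yield return table;
                table = NewTable();
            }
        }
        if (table.Rows.Count > 0)
        {
            yield return table; // final, partially filled batch
        }
    }

    // Bulk-inserts every batch into the (assumed, pre-created) staging table.
    public static void Load(IEnumerable<Guid> guids, string connectionString)
    {
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.LuceneGuids"; // hypothetical staging table
            bulk.ColumnMappings.Add("Id", "Id");
            bulk.BulkCopyTimeout = 0; // 0 = no timeout, since the load may run for a while

            foreach (DataTable batch in ToBatches(guids, 50000))
            {
                bulk.WriteToServer(batch);
                batch.Dispose();
            }
        }
    }

    private static DataTable NewTable()
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(Guid));
        return table;
    }
}
```

After `GuidBulkLoader.Load(luceneGuids, connectionString)` has finished, running `DifferenceSql` through a `SqlDataReader` lets you read the missing GUIDs one row at a time, so the client side stays at roughly one batch of memory throughout. Whether the bulk load is gentle enough on your database is something you would have to measure.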