Using Parallel Extensions or Parallel LINQ with LINQ Take
I have a database with about 5 million rows in it. I am trying to generate XML strings for the rows and push them to a service. The service accepts up to 1000 records per request, so I send them in batches rather than one at a time. Even so, this is quite slow at the moment, taking upwards of 10 seconds per 1000 records (including writing back to the database and uploading to the service).
I tried to get the following code working, but have failed... I get a crash when I try it. Any ideas?
var data = <insert LINQ query here>;
int take = 1000;
int left = data.Count();
Parallel.For(0, left / 1000, i =>
{
    data.Skip(i * 1000).Take(1000)...
    // Generate XML here.
    // Write to service here...
    // Mark items in database as generated.
});
// Get companies which are still marked as not generated.
// Create XML.
// Write to service.
I get a crash telling me that the index is out of bounds. If left is 5 million, the number in the loop should be no more than 5000. If I multiply that again by 1000, I should not get more than 5 million. I wouldn't mind if it worked for a bit and then failed, but it just fails right after the SQL query!
I think it doesn't like your last index value. Try left / 1000 - 1 instead of left / 1000:
Parallel.For(0, left / 1000 - 1, i =>
{
    data.Skip(i * 1000).Take(1000)...
    // Generate XML here.
    // Write to service here...
    // Mark items in database as generated.
});
I suspect the index out of bounds error is caused by code other than what is shown here.
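A quick in-memory check is consistent with that suspicion. The upper bound of Parallel.For is exclusive, so with 5 million rows the loop index only runs from 0 to 4999, and Skip/Take on an in-memory sequence returns an empty result rather than throwing when the offset is past the end. The sketch below is only an illustration: BatchIndexCheck and MaxSkipOffset are hypothetical names, and it does not touch a real database or LINQ provider.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class BatchIndexCheck
{
    // Hypothetical helper: records every Skip offset that
    // Parallel.For(0, total / batchSize, ...) would produce and returns the largest.
    public static int MaxSkipOffset(int total, int batchSize)
    {
        var offsets = new ConcurrentBag<int>();
        // The second argument is exclusive, so i never reaches total / batchSize.
        Parallel.For(0, total / batchSize, i => offsets.Add(i * batchSize));
        return offsets.Max();
    }

    public static void Main()
    {
        // Largest offset for 5 million rows in batches of 1000.
        Console.WriteLine(MaxSkipOffset(5000000, 1000)); // prints 4999000

        // In LINQ to Objects, skipping past the end yields an empty sequence.
        Console.WriteLine(Enumerable.Range(0, 10).Skip(20).Take(5).Count()); // prints 0
    }
}
```

Note that left / 1000 uses integer division, so a trailing partial batch is not visited by the loop. The original code handles that with the clean-up pass after the loop; using (left + 999) / 1000 as the upper bound would be another option.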
That being said, this could be handled in a much cleaner manner. Instead of this approach, consider switching to a custom partitioner. It should be dramatically more efficient, because each call to Skip/Take forces your sequence to be re-evaluated.
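As a rough sketch of the partitioning idea, the example below uses the built-in range partitioner (Partitioner.Create with a range size) rather than a hand-written custom partitioner. It assumes the rows have already been materialized into an in-memory list, which may not be practical for 5 million rows. RangeBatching, ProcessInBatches, and the handleBatch callback are hypothetical names standing in for the XML generation and upload steps.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class RangeBatching
{
    // Splits [0, items.Count) into contiguous ranges of at most batchSize items,
    // hands each range to handleBatch in parallel, and returns the item count handled.
    public static int ProcessInBatches(
        IList<string> items, int batchSize, Action<IList<string>> handleBatch)
    {
        // Partitioner.Create rejects an empty range, so return early.
        if (items.Count == 0) return 0;

        int processed = 0;
        var ranges = Partitioner.Create(0, items.Count, batchSize);
        Parallel.ForEach(ranges, range =>
        {
            // range.Item1 is inclusive, range.Item2 is exclusive.
            var batch = new List<string>(range.Item2 - range.Item1);
            for (int i = range.Item1; i < range.Item2; i++)
            {
                batch.Add(items[i]);
            }
            handleBatch(batch); // e.g. generate XML and upload (assumed step)
            Interlocked.Add(ref processed, batch.Count);
        });
        return processed;
    }

    public static void Main()
    {
        var rows = Enumerable.Range(0, 2500).Select(n => "row" + n).ToList();
        int batches = 0;
        int handled = ProcessInBatches(rows, 1000, b => Interlocked.Increment(ref batches));
        Console.WriteLine(batches + " batches, " + handled + " rows"); // prints 3 batches, 2500 rows
    }
}
```

Each range is read by index exactly once, so the source is not re-enumerated per batch the way repeated Skip/Take calls would re-enumerate it, and the final partial batch is included automatically.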