
Parallel.ForEach using very little processing power as time elapses

I have the following code running, and as time passes (an hour or two) I notice that it takes longer and longer to iterate through the items. Is there something I'm doing that causes this? If so, how can I fix it?

        int totalProcessed = 0;
        int totalRecords = MyList.Count();

        Parallel.ForEach(Partitioner.Create(0, totalRecords), (range, loopState) =>
        {
            for (int index = range.Item1; index < range.Item2; index++)
            {
                DoStuff(MyList.ElementAt(index));

                // Use the value returned by Interlocked.Increment so the check
                // and the log message both see the same count.
                int processed = Interlocked.Increment(ref totalProcessed);
                if (processed % 1000 == 0)
                    Log(String.Format("Processed {0} of {1} records", processed, totalRecords));
            }
        });

        public void DoStuff(IEntity entity)
        {
            foreach (var client in Clients)
            {
                // Add entity to a db using EF
                client.Add(entity);
            }
        }

Thanks for any help


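ElementAt can be a very slow extension method. When the source is not an IList<T> (for example, a deferred LINQ query), Enumerable.ElementAt falls back to something like this simplified implementation:

public static T ElementAt<T>(this IEnumerable<T> collection, int index)
{
    // Simplified: the real Enumerable.ElementAt first checks for IList<T>
    // and uses its indexer; this is the fallback path for other sequences.
    int i = 0;
    foreach (T e in collection)
    {
        if (i == index)
        {
            return e;
        }
        i++;
    }
    throw new ArgumentOutOfRangeException(nameof(index));
}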

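Each call enumerates index + 1 elements from the start, so it takes longer the larger the index is. Use the indexer MyList[index] instead of ElementAt (if MyList is not a list or array, materialize it once with ToList() before the loop).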
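To make the cost concrete, here is a small sketch, assuming MyList is a deferred sequence rather than a List<T>. LazySource, the Yielded counter and the ElementAtCostDemo class are hypothetical stand-ins, not part of the original code; they only count how many elements the source has to produce.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ElementAtCostDemo
{
    // Number of elements the lazy source has produced so far.
    public static int Yielded;

    // A deferred sequence that is NOT an IList<T> (hypothetical stand-in for
    // a MyList that is, for example, a LINQ query rather than a List<T>).
    public static IEnumerable<int> LazySource(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Yielded++;
            yield return i;
        }
    }

    // Visits every element by index using ElementAt on the lazy sequence.
    public static int CountWithElementAt(int n)
    {
        Yielded = 0;
        IEnumerable<int> source = LazySource(n);
        for (int i = 0; i < n; i++)
        {
            source.ElementAt(i); // restarts the enumeration on every call
        }
        return Yielded;
    }

    // Materializes the sequence once, then visits every element with the indexer.
    public static int CountWithToListAndIndexer(int n)
    {
        Yielded = 0;
        List<int> list = LazySource(n).ToList(); // single pass over the source
        for (int i = 0; i < list.Count; i++)
        {
            _ = list[i]; // O(1) lookup, no re-enumeration
        }
        return Yielded;
    }

    public static void Main()
    {
        Console.WriteLine($"ElementAt loop produced {CountWithElementAt(1000)} elements");
        Console.WriteLine($"ToList + indexer produced {CountWithToListAndIndexer(1000)} elements");
    }
}
```

For 1000 items the ElementAt loop makes the source produce 500500 elements (1 + 2 + ... + 1000), while the ToList version produces 1000.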


As @mace has pointed out, using ElementAt has performance issues. When MyList is not an IList<T>, every call starts an iterator at the beginning of MyList and steps over n elements until it reaches the desired index. This gets cumulatively worse as the index grows.
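As a rough worked estimate, assuming MyList is a lazy sequence rather than an IList<T>: looking up index i enumerates i + 1 elements, so visiting all n records with ElementAt costs

$$\sum_{i=0}^{n-1} (i+1) = \frac{n(n+1)}{2}$$

For n = 100,000 records that is 5,000,050,000 element visits, compared with 100,000 for a single pass, and each lookup costs more than the one before it, which matches a loop that slows down over time.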

If you still need streaming access to MyList, you can mitigate the performance issue by using Skip and Take. Seeking to a position in MyList still has a cost, but Take ensures that once you get there you read a whole batch of elements, rather than seeking again for every element.
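A rough worked estimate of the gain, assuming MyList is a lazy sequence rather than an IList<T>, n is a multiple of the batch size b, and m = n / b: batch k (counting from 0) skips kb elements and then takes b, so the total is

$$\sum_{k=0}^{m-1} (kb + b) = b\,\frac{m(m+1)}{2}$$

For n = 100,000 and b = 250 (so m = 400) that is 250 · 400 · 401 / 2 = 20,050,000 element visits. The cost is still quadratic in n, but roughly b times smaller than calling ElementAt for every element (n(n+1)/2 = 5,000,050,000 visits).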

I also notice that you are using the partition-style ForEach, but you still look up every element in each range individually. I have implemented the partition style with batching in the example below.

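int totalRecords = MyList.Count();
int batchSize = 250;

// Each range covers at most batchSize indexes: [range.Item1, range.Item2).
Parallel.ForEach(Partitioner.Create(0, totalRecords, batchSize), range =>
{
    // Seek to the start of the range once, then read the whole batch.
    foreach (var thing in MyList.Skip(range.Item1).Take(batchSize))
    {
        DoStuff(thing);

        // logging and stuff...
    }
});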
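To see which index ranges Partitioner.Create(0, totalRecords, batchSize) hands to the loop, the partitioner can be read on a single thread. This is a sketch for inspection only; RangeDemo and Ranges are hypothetical helper names, and the record count of 1001 is arbitrary.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class RangeDemo
{
    // Lists the [fromInclusive, toExclusive) ranges produced by
    // Partitioner.Create(0, total, batchSize), read sequentially.
    public static List<(int From, int To)> Ranges(int total, int batchSize)
    {
        var partitioner = Partitioner.Create(0, total, batchSize);
        return partitioner.GetDynamicPartitions()
                          .Select(r => (From: r.Item1, To: r.Item2))
                          .ToList();
    }

    public static void Main()
    {
        foreach (var (from, to) in Ranges(1001, 250))
        {
            Console.WriteLine($"[{from}, {to})");
        }
    }
}
```

For 1001 records and a batch size of 250 this gives [0, 250), [250, 500), [500, 750), [750, 1000) and a final one-element range [1000, 1001); Take simply returns fewer items for that last batch.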

Update

Having read the question again, you may also have a problem with too many threads being used for what is probably an IO-bound workload, i.e. network and then DB/disk. I say this because you mention low CPU utilization, which suggests that you are blocked on IO and that this is getting progressively worse.

If the slowdown were purely down to ElementAt, you would still see high CPU utilization, because re-enumerating MyList is CPU work.

Set ParallelOptions.MaxDegreeOfParallelism to cap how many loop bodies (and therefore threads) run at the same time:

const int BatchSize = 250;

int totalRecords = MyList.Count();
var partitioner = Partitioner.Create(0, totalRecords, BatchSize);

// Cap concurrent loop bodies; 2 is a starting point to tune for your IO workload.
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

Parallel.ForEach(partitioner, options, range =>
{
    foreach (var thing in MyList.Skip(range.Item1).Take(BatchSize))
    {
        DoStuff(thing);

        // logging and stuff...
    }
});
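One way to check that the cap behaves as expected is to track how many range bodies are running at the same moment. In this sketch, Thread.Sleep is a hypothetical stand-in for IO-bound DoStuff work, DegreeOfParallelismDemo is an illustrative name, and the numbers (2000 records, batches of 250, a cap of 2) are arbitrary.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class DegreeOfParallelismDemo
{
    // Runs a batched Parallel.ForEach with a cap and returns the highest
    // number of range bodies that were executing at the same time.
    public static int PeakConcurrency(int totalRecords, int batchSize, int maxDegree)
    {
        int current = 0;
        int peak = 0;
        var partitioner = Partitioner.Create(0, totalRecords, batchSize);
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };

        Parallel.ForEach(partitioner, options, range =>
        {
            int now = Interlocked.Increment(ref current);

            // Raise the recorded peak with a compare-and-swap loop.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)) &&
                   Interlocked.CompareExchange(ref peak, now, seen) != seen)
            {
            }

            Thread.Sleep(10); // hypothetical stand-in for IO-bound DoStuff work

            Interlocked.Decrement(ref current);
        });

        return peak;
    }

    public static void Main()
    {
        Console.WriteLine($"Peak concurrency with a cap of 2: {PeakConcurrency(2000, 250, 2)}");
    }
}
```

The peak should never exceed MaxDegreeOfParallelism. For a network and database workload, the right cap is something to measure under real load rather than guess.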