
How can I refactor this ForEach(..) code to use Parallel.ForEach(..)?

I've got a list of objects which I wish to copy from one source to another. It was suggested that I could speed things up by using Parallel.ForEach.

How can I refactor the following pseudocode to leverage Parallel.ForEach(..)?

var foos = GetFoos().ToList();
foreach(var foo in foos)
{
    CopyObjectFromOldBucketToNewBucket(foo, oldBucket, newBucket, 
        accessKeyId, secretAccessKey);
}

CopyObjectFromOldBucketToNewBucket uses the Amazon REST APIs to move items from one bucket to another.

Cheers :)


Parallel is actually not the best option here. It will run your code in parallel, but each request to AWS still ties up a thread pool thread while it waits. It would be a far better use of resources to use the BeginCopyObject method instead. That does not hold a thread pool thread while waiting on a response; a thread is used only when the response has been received and needs to be processed.

Here's a simplified example of how to use Begin/End methods. These are not specific to AWS; the pattern is found throughout the .NET BCL.

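using System;
using System.Collections.Generic;
using System.Linq;
using Amazon.S3;
using Amazon.S3.Model;

public static void CopyFoos()
{
    var client = new AmazonS3Client(...);
    var foos = GetFoos().ToList();
    var asyncs = new List<IAsyncResult>();
    foreach (var foo in foos)
    {
        var request = new CopyObjectRequest { ... };

        // Start the copy without blocking; EndCopy runs once the response arrives.
        asyncs.Add(client.BeginCopyObject(request, EndCopy, client));
    }

    // Block until every outstanding request has completed.
    foreach (IAsyncResult ar in asyncs)
    {
        if (!ar.IsCompleted)
        {
            ar.AsyncWaitHandle.WaitOne();
        }
    }
}

private static void EndCopy(IAsyncResult ar)
{
    // The client was passed in as the async state; calling End completes the request.
    ((AmazonS3Client)ar.AsyncState).EndCopyObject(ar);
}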

For production code you may want to keep track of how many requests you've dispatched and only send out a limited number at any one time. Testing or AWS docs may tell you how many concurrent requests are optimal.
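One way to cap the number of requests in flight is a SemaphoreSlim used as a gate. The following is only a minimal sketch, not part of the AWS SDK: ThrottledCopy, RunAsync and the copyAsync delegate are hypothetical names, with copyAsync standing in for whatever starts a single asynchronous copy and returns a Task.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledCopy
{
    // copyAsync is a hypothetical stand-in for one asynchronous copy request.
    public static async Task RunAsync<T>(
        IEnumerable<T> items, Func<T, Task> copyAsync, int maxConcurrent)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            var tasks = new List<Task>();
            foreach (var item in items)
            {
                // Wait for a free slot before dispatching the next request.
                await gate.WaitAsync();
                tasks.Add(RunOneAsync(item, copyAsync, gate));
            }

            // Wait for the requests that are still in flight.
            await Task.WhenAll(tasks);
        }
    }

    private static async Task RunOneAsync<T>(
        T item, Func<T, Task> copyAsync, SemaphoreSlim gate)
    {
        try
        {
            await copyAsync(item);
        }
        finally
        {
            // Free the slot even if the copy failed.
            gate.Release();
        }
    }
}
```

The right value for maxConcurrent is something to measure rather than guess; the sketch just makes it a parameter.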

In this case we don't actually need to do anything when the requests are completed, so you may be tempted to skip the EndCopy calls, but that would cause a resource leak. Whenever you call BeginXxx you must call the corresponding EndXxx method.
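If you would rather not pair the calls by hand, Task.Factory.FromAsync can wrap a Begin/End pair in a Task and invoke the End method for you when the operation completes. The sketch below uses Stream.BeginRead/EndRead as a stand-in so it runs without the AWS SDK; BeginEndPairing and ReadAsync are hypothetical names, and whether BeginCopyObject/EndCopyObject fit the same overload is an assumption you would need to check against the SDK version you use.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class BeginEndPairing
{
    // Wraps Stream.BeginRead/EndRead (a stand-in for any BeginXxx/EndXxx pair).
    public static Task<int> ReadAsync(Stream stream, byte[] buffer)
    {
        // FromAsync calls BeginRead immediately and calls EndRead once
        // when the operation completes, so the End call cannot be forgotten.
        return Task<int>.Factory.FromAsync(
            stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null);
    }
}
```

The returned Task also surfaces any exception thrown by the End method, which is easy to lose when the End call lives in a bare callback.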


Since your code doesn't have any dependencies other than on foos, you can simply do:

Parallel.ForEach(foos, ( foo => 
{
    CopyObjectFromOldBucketToNewBucket(foo, oldBucket, newBucket, 
                                       accessKeyId, secretAccessKey);
}));

Keep in mind, though, that I/O can only be parallelized to a certain degree; beyond that, performance might actually degrade.
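To keep the loop from starting more concurrent copies than the I/O can sustain, you can pass a ParallelOptions with MaxDegreeOfParallelism set. This is a minimal sketch: BoundedParallelCopy and Run are hypothetical names, the copy delegate stands in for CopyObjectFromOldBucketToNewBucket, and the limit itself is something to find by testing.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BoundedParallelCopy
{
    // copy is a hypothetical stand-in for CopyObjectFromOldBucketToNewBucket.
    public static void Run<T>(IEnumerable<T> foos, Action<T> copy, int maxParallel)
    {
        // Caps how many iterations Parallel.ForEach runs at the same time.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Parallel.ForEach(foos, options, copy);
    }
}
```

A call might look like BoundedParallelCopy.Run(foos, foo => CopyObjectFromOldBucketToNewBucket(foo, oldBucket, newBucket, accessKeyId, secretAccessKey), 8), where 8 is only a placeholder.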
