开发者

Parallel Shuffle in C# 4?

As noticed in this question: Randomize a List<T> you can implement a shuffle method on a list; like one of the answers mentions:

using System.Security.Cryptography;
...
public static void Shuffle<T>(this IList<T> list)
{
    RNGCryptoServiceProvider provider = new RNGCryptoServiceProvider();
    int n = list.Count;
    while (n > 1)
    {
        b开发者_运维技巧yte[] box = new byte[1];
        do provider.GetBytes(box);
        while (!(box[0] < n * (Byte.MaxValue / n)));
        int k = (box[0] % n);
        n--;
        T value = list[k];
        list[k] = list[n];
        list[n] = value;
    }
}

Does anyone know if it's possible to "Parallel-ize" this using some of the new features in C# 4?

Just a curiosity.

Thanks all,

-R.


Your question is ambiguous, but you can use the Parallel Framework to help doing some operations in parallel, but it depends on whether you want to get the bytes, then shuffle them, so the getting of the bytes is in parallel, or, if you want to shuffle multiple lists at one time.

If it is the former, I would suggest that you first break your code into smaller functions, so you can do some analysis to see where the bottlenecks are, as, if the getting of the bytes is the bottleneck, then doing it in parallel may make a difference. By having tests in place you can test new functions and decide if it is worth the added complexity.

static RNGCryptoServiceProvider provider = new RNGCryptoServiceProvider();

private static byte[] GetBytes() {
        byte[] box = new byte[1];
        do provider.GetBytes(box);
        while (!(box[0] < n * (Byte.MaxValue / n)));
        return box;
}
private static IList<T> InnerLoop(int n, IList<T> list) {
        var box = GetBytes(n);
        int k = (box[0] % n);
        T value = list[k];
        list[k] = list[n];
        list[n] = value;
        return list;
}

public static void Shuffle<T>(this IList<T> list)
{
    int n = list.Count;
    while (n > 1)
    {
        list = InnerLoop(n, list);
        n--;
    }
}

This is a rough idea as to how to split your function, so you can replace the GetBytes function, but you may need to make some other changes to test it.

Getting some numbers is crucial to make certain that you are getting enough of a benefit to warrant adding complexity though.

You may want to move the lines in InnerLoop that deals with list into a separate function, so you can see if that is slow and perhaps swap out some algorithms to improve it, but, you need to have an idea how fast you need the entire shuffle operation to go.

But if you want to just do multiple lists then it will be easy, you may want to perhaps look at PLINQ for that.

UPDATE

The code above is meant to just show an example of how it can be broken into smaller functions, not to be a working solution. If it is necessary to move the Provider class that I put into a static variable into a function and then pass it as a parameter then that may need to be done. I didn't test the code, but my suggestion is based on getting profiling then look at how to improve it's performance, especially since I am not certain which way it was meant to be done in parallel. It may be necessary to just build up an array in order, in parallel, then shuffle them, but first see what the time needed for each operation is, then see if doing it in parallel will be warranted.

There may be a need to use concurrent data structures also, if using multiple threads, in order to not pay a penalty by having to synchronize yourself, but, again, that may not be needed, depending on where the bottleneck is.

UPDATE:

Based on the answer to my comment, you may want to look at the various functions in the parallel library, but this page may help. http://msdn.microsoft.com/en-us/library/dd537609.aspx.

You can create a Func version of your function, and pass that in as a parameter. There are multiple ways to work use this library to make this function in parallel, as you already don't have any global variables, just lose the static operator.

You will want to get numbers as you add more threads though, to see where you start to see a decrease in performance, as you won't see a 1:1 improvement, so, if you add 2 threads it won't go twice as fast, so just test and see where it becomes a problem having more threads. Since your function is CPU bound you may want to have only one thread per core, as a rough starting point.


Any in-place shuffle is not very well suited for parallelization. Especially since a shuffle requires a random component (over the range) so there is no way to localize parts of the problem.


Well, you could fairly easily parallelize the code which is generating the random numbers - which could be the bottleneck when using a cryptographically secure random number generator. You could then use that sequence of random numbers (which would need to have its order preserved) within a single thread performing the swaps.

One problem which has just occurred to me though, is that RNGCryptoServiceProvider isn't thread-safe (and neither is System.Random). You'd need as many random number generators as threads to make this work. Basically it becomes a bit ugly :(

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜