How does a sorting network beat generic sorting algorithms?

2023-01-19 05:53 问答作者：

In reference to fastest sort of fixed length 6 int array, I do not fully understand how this 开发者_C百科sorting network beats an algorithm like insertion sort.

Form that question, here is a comparison of the number of CPU cycles taken to complete the sort :

Linux 32 bits, gcc 4.4.1, Intel Core 2 Quad Q8300, -O2

Insertion Sort (Daniel Stutzbach) : 1425

Sorting Networks (Daniel Stutzbach) : 1080

The code used is as follows :

Insertion Sort (Daniel Stutzbach)

static inline void sort6_insertion_sort_v2(int *d){
    int i, j;
    for (i = 1; i < 6; i++) {
            int tmp = d[i];
            for (j = i; j >= 1 && tmp < d[j-1]; j--)
                    d[j] = d[j-1];
            d[j] = tmp;
    }
}

Sorting Networks (Daniel Stutzbach)

static inline void sort6_sorting_network_v1(int * d){
#define SWAP(x,y) if (d[y] < d[x]) { int tmp = d[x]; d[x] = d[y]; d[y] = tmp; }
    SWAP(1, 2);
    SWAP(0, 2);
    SWAP(0, 1);
    SWAP(4, 5);
    SWAP(3, 5);
    SWAP(3, 4);
    SWAP(0, 3);
    SWAP(1, 4);
    SWAP(2, 5);
    SWAP(2, 4);
    SWAP(1, 3);
    SWAP(2, 3);
#undef SWAP
}

I understand that sorting networks are really good for sorting in parallel, because some of the steps are independent of the other steps. But here we are not using the parallelization.

I expect it to be faster, as it has the advantage of knowing the exact number of elements beforehand. Where and why exactly does insertion sort make unnecessary comparisons?

EDIT1:

This is the input set these codes are compared against:

int d[6][6] = {\
    {1, 2, 3, 4, 5, 6},\
    {6, 5, 4, 3, 2, 1},\
    {100, 2, 300, 4, 500, 6},\
    {100, 2, 3, 4, 500, 6},\
    {1, 200, 3, 4, 5, 600},\
    {1, 1, 2, 1, 2, 1}\
};\

But here we are not using the parallelization.

Modern CPUs can figure out when instructions are independent and will execute them in parallel. Hence, even though there's only one thread, the sorting network's parallelism can be exploited.

Where exactly does insertion sort make unnecessary comparisons?

The easiest way to see the extra comparisons is to do an example by hand.

Insertion sort:
6 5 4 3 2 1
5 6 4 3 2 1
5 4 6 3 2 1
4 5 6 3 2 1
4 5 3 6 2 1
4 3 5 6 2 1
3 4 5 6 2 1
3 4 5 2 6 1
3 4 2 5 6 1
3 2 4 5 6 1
2 3 4 5 6 1
2 3 4 5 1 6
2 3 4 1 5 6
2 3 1 4 5 6
2 1 3 4 5 6
1 2 3 4 5 6

Sorting network:
6 5 4 3 2 1
6 4 5 3 2 1
5 4 6 3 2 1
4 5 6 3 2 1 # These three can execute in parallel with the first three
4 5 6 3 1 2 #
4 5 6 2 1 3 #
4 5 6 1 2 3
1 5 6 4 2 3
1 2 6 4 5 3
1 2 3 4 5 6
1 2 3 4 5 6

The better question is why the sorting network only outperforms insertion sort (generally a very slow sort) by ~50%. The answer is that big-O is not so important when n is tiny. As for OP's question, Daniel has the best answer.

I think that loop unwinding is what causing the faster results on the sort network algorithm

I believe the amount of 'work' done in a parallel algorithm and a serial algorithm is always almost same. Only that since work gets distributed you would get outputs faster. I think you would get output convincingly faster in case when the size of input is sufficient enough to justify using parallel algorithm.

In case of insertion sort division of array amongst processors is such that it forms a pipeline, and it would take some time to fill the pipeline and then it would produce benefits of parallel algorithm.

Theoretically the code could be about the same if the compiler could fully unroll the loops in the Insertion Sort. The first loop can be easily unrolled, while the second can't be unrolled that easy.

It may also be the case that, because the code is not that simple as the network sorting code, the compiler can make less optimizations. I think there are more dependencies in the insertion sort than in the network sort, which may make a big difference when the compiler tries to optimize the code (correct me if I'm wrong).

I think all of you questions are answered in Daniel Stutzbach answer to the original post:

The algorithm you posted is similar to an insertion sort, but it looks like you've minimized the number of swaps at the cost of more comparisons. Comparisons are far more expensive than swaps, though, because branches can cause the instruction pipeline to stall.

继续阅读：algorithm c comparison sorting sorting-network

How does a sorting network beat generic sorting algorithms?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？