Best sorting algorithm for sorting structs with byte comparisons?
I have an array of 64 structs that hold a decent amount of data (each struct is around 128 bytes, so that's 8192 bytes that need to be rearranged). The array needs to be sorted based on a single unsigned byte in each struct. An interesting property of my data is that there are likely to be many duplicates of the sort value - if you removed all duplicates, the array might contain only 10 unique elements, but this is not a given.
Once sorted, I need to create a stack that stores, for each run of identical bytes, the value and the run length: so if I ended up with sorted values 4,4,4,9,9,9,9,9,14,14 the stack would be: (4,3), (9,5), (14,2)
I figured that there would be some nice optimizations I could perform under these conditions. If I do heapsort, I can create the stack while I'm sorting, but would this be faster than a qsort and then building the stack afterwards? Would any sorting algorithm run slower because of the large structs I'm using? Are there any optimizations I can make because I'm only comparing bytes?
BTW: language is c++
Thanks.
I would imagine that the STL will do what you want nicely. Rewriting your own sort routines and containers is likely to be error-prone and slow, so only worry about it if you find it's a bottleneck.
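For instance, sorting the structs directly by the byte key is a one-liner with a comparator. This is a minimal sketch; the `Record` struct here is a hypothetical stand-in for the one in the question (a one-byte key plus padding to roughly 128 bytes):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the struct in the question: a one-byte
// sort key plus padding to bring it to roughly 128 bytes.
struct Record {
    std::uint8_t key;
    char payload[127];
};

// Sort in place by the single byte key. std::stable_sort keeps
// records with equal keys in their original order, so the duplicate
// runs afterwards are deterministic.
void sort_records(std::vector<Record>& records) {
    std::stable_sort(records.begin(), records.end(),
                     [](const Record& a, const Record& b) {
                         return a.key < b.key;
                     });
}
```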
In general, with large objects it can be faster to sort an array of pointers/indices to the objects rather than the objects themselves. Or sort an array of nodes, where each node contains a pointer/index of the object and the object's sort key (in this case the key is one byte). To do this in C++ you can just supply a suitable comparator to std::sort or std::stable_sort. Then, if you need the original objects in order (as opposed to just needing to know the correct order), copy the objects into a new array at the end.
Copying 128 bytes is almost certainly much slower than performing a byte comparison, even with an extra indirection. So for optimal performance it's the moves you need to look at, not the comparisons, and dealing in pointers is one way to avoid most of the moving.
You could build your run-length encoding as you perform the copy at the end.
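Putting those two ideas together might look like the sketch below: sort a vector of indices instead of the 128-byte records, then copy each record exactly once into its final position, building the (value, run-length) stack during that same pass. The `Record` struct and function names are assumptions, not the poster's actual types:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

struct Record {                 // hypothetical ~128-byte struct
    std::uint8_t key;
    char payload[127];
};

// Sort indices instead of the 128-byte records, then copy the records
// into order once, building the (value, run-length) stack during the copy.
std::vector<std::pair<std::uint8_t, int>>
sort_and_encode(const std::vector<Record>& in, std::vector<Record>& out) {
    std::vector<int> idx(in.size());
    std::iota(idx.begin(), idx.end(), 0);     // 0, 1, 2, ...

    // Each comparison touches one byte per record, not 128; breaking
    // ties by index keeps the sort stable.
    std::sort(idx.begin(), idx.end(), [&](int a, int b) {
        return in[a].key != in[b].key ? in[a].key < in[b].key : a < b;
    });

    std::vector<std::pair<std::uint8_t, int>> runs;
    out.clear();
    out.reserve(in.size());
    for (int i : idx) {
        const Record& r = in[i];
        if (runs.empty() || runs.back().first != r.key)
            runs.push_back({r.key, 0});       // start a new run
        ++runs.back().second;
        out.push_back(r);                     // one 128-byte move per record
    }
    return runs;
}
```

With the example values from the question (4,4,4,9,9,9,9,9,14,14 after sorting), `runs` would come back as (4,3), (9,5), (14,2).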
Of course it might be possible to go even faster with some custom sorting algorithm which makes special use of the numbers in your case (64, "around 128", and 1). But even simple questions like "which is fastest - introsort, heap sort or merge sort" are generally impossible to answer without writing and running the code.
The sort would not be slower, because you would be sorting pointers or references to the structs rather than the actual structs in memory.
Given that your keys are integers, and that there really aren't many of them, a bucket sort with a bucket size of 1 (i.e. a counting sort) would be very applicable.
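A sketch of that idea, again assuming a hypothetical `Record` with a one-byte key: count occurrences of each of the 256 possible key values, convert the counts into start offsets, and place each record directly at its final position. Two passes, no comparisons at all, and the counts array already contains the run lengths for the stack:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Record {                 // hypothetical ~128-byte struct
    std::uint8_t key;
    char payload[127];
};

// Counting sort on the one-byte key: O(n + 256), stable, and each
// record is copied exactly once.
void counting_sort(const std::vector<Record>& in, std::vector<Record>& out) {
    std::size_t count[256] = {};
    for (const Record& r : in) ++count[r.key];

    // Prefix-sum the counts into the starting offset of each key's run.
    std::size_t offset[256];
    std::size_t pos = 0;
    for (int k = 0; k < 256; ++k) { offset[k] = pos; pos += count[k]; }

    out.resize(in.size());
    for (const Record& r : in) out[offset[r.key]++] = r;
    // Note: count[k] is also the run length for key k, so the
    // (value, length) stack can be read straight from it.
}
```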