Array-like datastructure performance
I'm trying to optimize a data structure which is heavily used. Right now this data structure is implemented as a dynamic array of composite nodes. A typical usage is sequential accesss from the first element to the last element with reading and modification of constituent values (integers and doubles). Typical array size is from 10 to 200 elements. Another type of operation is insertions/removals of elements of this array. After some profiling I found out that insertions are very expensive in terms of overall algorithm performance, so I decided o change this array to one of the two datastructures:
- Array of pointers to elements
- Array of indices to another array containing actual elements
This way I will only do insertions and removals in the index/pointer array which is much cheaper.
The second datastructure is much more complicated than the first one, it will require additional operations to avoid reallocations in element array, but it will keep all the elements in the same memory region which I think will be better开发者_JAVA技巧 for processor cache. My question is which way is better? And which is the best way to implement 2 variant to keep all the array elements in the same memory region?
There is a very elegant data structure called a tiered vector which supports O(1) worst-case lookup, O(1) amortized append, and O(sqrt n) amortized insertion at any point in the structure. Moreover, it's very memory-efficient, wasting only O(sqrt n) overhead at any point (compared to the O(n) overhead from a dynamic array). I don't know how useful this might be in your particular instance, but a description can be found here:
http://www.cs.brown.edu/cgc/jdsl/papers/tiered-vector.pdf
Have you thought about using a skip list, specifically an "indexable skip list"? An indexable skip list gives you O(logn) insertion, removal and lookup operations. Furthermore, it is similar to the current array-based implementation so it won't fundamentally change how the data structure is thought of.
How the insertion/deletions are done is sort of unclear, but by your current description you should be using a simple linked list.
Specifically, you say the use is iterating through the list, inserting and deleting. If the insertions are done at the beginning, end, or at the iterator then a linked list can do them all in constant time. Your requirements do not seem to include random access or an ordering.
If you are really worried about processor cache (which, at this point, you probably shouldn't be) you can allocate the list nodes from an array-based pool.
精彩评论