Data structure which supports < O(n) sum queries of elements 0 up to n
As an example, imagine you had the following numbers in a list in this given order:
list = [4, 10, 3, 5, 1]
so list[0] == 4, and list[4] == 1.
Now imagine you need a sum query that will tell you the sum of all values from the start up to and including a given position.
list.sum(0) == 4
list.sum(1) == 14
list.sum(2) == 17
list.sum(3) == 22
list.sum(4) == 23
In addition I would like the following operations, while still keeping the sum queries intact:
list.swap(0, 1) // swaps the values at indices 0 and 1
list == [10, 4, 3, 5, 1]
list.slideBefore(0, 3) // slides the value at index 0 to just before the element at index 3
list == [4, 3, 10, 5, 1]
list.slideAfter(2, 3) // slides the value at index 2 to just after the element at index 3
list == [4, 3, 5, 10, 1]
list.replace(3, 9) // replaces the value at index 3 with the literal value 9
list == [4, 3, 5, 9, 1]
list.append(17) // adds value to end
list == [4, 3, 5, 9, 1, 17]
This could be trivially handled by an array. But the sum query would always be O(n). I was hoping to find a data structure that would keep the sum query at O(1) or O(lg n), while also keeping the above operations at O(1) or O(lg n).
I believe I might be able to manipulate the fast array data structure to accomplish what I want, but I haven't worked it out fully.
Another data structure I looked at was the Fenwick tree, but it wasn't clear to me that it would work.
Any suggestions, thoughts, tricks or tips?
Consider a simple array where each position stores the sum of all elements up to and including that position, instead of the element itself. That way:
int sum(int n){
    return array[n]; // O(1) !
}

int elem(int n){
    if (n)
        return array[n] - array[n-1];
    return array[0];
}
It would have O(1) time for sum and append, but replace would take O(n), since every stored sum from the changed index onward must be updated (swap and the slide operations have the same problem, since they also change element values mid-array).
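For completeness, append and replace under this representation might look like this (a sketch of mine, not part of the original answer; it assumes array has spare capacity and that a counter n tracks how many elements are stored):

void append(int v){           // O(1): just extend the running sum
    array[n] = (n ? array[n-1] : 0) + v;
    n++;
}

void replace(int i, int v){   // O(n): all later prefix sums shift by the same delta
    int delta = v - elem(i);
    for (int j = i; j < n; j++)
        array[j] += delta;
}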
You could also consider a binary tree that holds values only in the leaves and keeps the sum of its children in every internal node.
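A minimal sketch of that idea (my illustration, not the answerer's code; the SumTree name and layout are assumptions): an array-backed segment tree keeps the values in its leaves and a subtree sum in every internal node, giving O(lg n) sum and replace, with swap falling out as two replace calls. append and the slide operations would need a balanced tree keyed by position (a treap, say) to stay logarithmic.

#include <vector>

struct SumTree {
    int n;                      // number of elements (leaves)
    std::vector<long long> t;   // t[1] is the root; leaves occupy t[n..2n-1]

    explicit SumTree(const std::vector<long long>& a)
        : n((int)a.size()), t(2 * a.size()) {
        for (int i = 0; i < n; i++) t[n + i] = a[i];
        for (int i = n - 1; i >= 1; i--) t[i] = t[2*i] + t[2*i + 1];
    }

    void replace(int i, long long v) {   // O(lg n): fix sums on the leaf-to-root path
        for (t[i += n] = v; i > 1; i >>= 1)
            t[i >> 1] = t[i] + t[i ^ 1];
    }

    long long sum(int i) const {         // sum of elements 0..i in O(lg n)
        long long s = 0;
        for (int l = n, r = n + i + 1; l < r; l >>= 1, r >>= 1) {
            if (l & 1) s += t[l++];
            if (r & 1) s += t[--r];
        }
        return s;
    }
};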
The data structure you want to use will depend a lot on your access pattern. If queries are very frequent and modification operations are infrequent, then you could just maintain a "dirty" flag and re-calculate the sums on query if the "dirty" flag is set.
You could then refine that by setting a "dirty index," which holds the index of the lowest item that's been changed. On query, you have to re-calculate sums for that item and all after. Or, perhaps, only up to the item that you need the sum for, at which point you can update the "dirty index."
That kind of lazy evaluation can be very effective if queries are frequent and modifications are infrequent, or if the pattern is lots of modifications followed by lots of queries.
'swap', 'append', and 'replace' can all be done in O(1) time. 'append' doesn't "dirty" the sums at all; 'swap' and 'replace' would of course cause the dirty index to be set to the lowest index they touch (provided it wasn't already at a lower index).
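A minimal sketch of that scheme (the struct and all names are mine, not from the answer):

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct LazyPrefixSums {
    std::vector<long long> vals;    // the raw elements
    std::vector<long long> prefix;  // prefix[i] == vals[0] + ... + vals[i], valid below `dirty`
    std::size_t dirty = 0;          // lowest index whose stored sum is stale

    void append(long long v) {                   // O(1); clean sums stay clean
        vals.push_back(v);
        prefix.push_back(0);                     // placeholder, filled in lazily
    }

    void replace(std::size_t i, long long v) {   // O(1); just record the damage
        vals[i] = v;
        dirty = std::min(dirty, i);
    }

    void swap(std::size_t i, std::size_t j) {    // O(1); dirties from the smaller index
        std::swap(vals[i], vals[j]);
        dirty = std::min(dirty, std::min(i, j));
    }

    long long sum(std::size_t i) {               // O(1) when clean; rebuilds only up to i
        for (; dirty <= i; ++dirty)
            prefix[dirty] = (dirty ? prefix[dirty - 1] : 0) + vals[dirty];
        return prefix[i];
    }
};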
'slideBefore' and 'slideAfter' are inherently O(N) if your data structure is an array, because you have to move the data in the array. In your example, you have:
list == [10, 4, 3, 5, 1]
list.slideBefore(0, 3) // slides the value at index 0 to just before the element at index 3
list == [4, 3, 10, 5, 1]
So items 1 and 2 in the array had to be shifted left one position to make room for item 0 to be repositioned. If you had slideBefore(0, 1000), then 1,000 items in the array would have to move over by one position. If those operations are frequent and your list is large, you'd probably want a different underlying representation.
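To make that cost concrete, an array-based slideBefore amounts to a rotation of the affected range (a sketch of mine using std::rotate; the vector-based signature is an assumption):

#include <algorithm>
#include <cstddef>
#include <vector>

// Moves the element at index `from` to just before the element currently at
// index `to`. std::rotate shifts every element in between by one slot, which
// is why the operation is O(N) in the worst case.
void slideBefore(std::vector<int>& a, std::size_t from, std::size_t to) {
    if (from < to)
        std::rotate(a.begin() + from, a.begin() + from + 1, a.begin() + to);
    else
        std::rotate(a.begin() + to, a.begin() + from, a.begin() + from + 1);
}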
Another possibility is a "list of lists" implementation. Imagine a list of 20 items that's split into 4 sublists of 5 items each. Each sublist maintains a count of the items and a sum of the items in it. Each node in a sublist maintains the running sum of all items before it in the list. When you update an item, you only have to update the sums for that item's sublist. Again, if you use lazy evaluation, you'd only re-compute the sums for the following sublists if somebody queried for them.
To handle insertions and deletions, allow sublists to grow to some maximum value before they're split. Say your "ideal" is five items per sublist. But you allow it to grow to 10 before splitting it into two sublists. For deletion, you can either allow a sublist to go to 0, or perhaps you combine it with the previous or next sublist if there are fewer than 3 items in the sublist.
The ideal size of sublists will depend on the total number of items you expect to be in the list and, again, the mix of operations you expect to encounter. Operations that are inherently O(N) (like remove and slide) will favor smaller sublists, but then recalculation becomes more expensive because you have more sublists.
This doesn't really change the runtime complexity of the algorithm (that is, O(n/5) is still considered O(N)), but it does change the actual runtime by quite a bit. For moderately sized lists it could be a real win.
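Here's a simplified sketch of that layout (my code, not the answerer's; the block size is arbitrary, and it scans inside the final sublist instead of caching per-node running sums):

#include <cstddef>
#include <vector>

struct BlockedList {
    struct Block {
        std::vector<long long> items;
        long long total = 0;                     // cached sum of this sublist
    };
    std::vector<Block> blocks;
    static constexpr std::size_t kSplitAt = 10;  // grow limit before starting a new block

    void append(long long v) {
        if (blocks.empty() || blocks.back().items.size() >= kSplitAt)
            blocks.push_back({});
        blocks.back().items.push_back(v);
        blocks.back().total += v;
    }

    // Sum of elements 0..i: whole sublists are consumed via their cached
    // totals; only the sublist containing i is scanned element by element.
    long long sum(std::size_t i) const {
        long long s = 0;
        for (const Block& b : blocks) {
            if (i >= b.items.size()) {
                s += b.total;
                i -= b.items.size();
            } else {
                for (std::size_t j = 0; j <= i; ++j)
                    s += b.items[j];
                break;
            }
        }
        return s;
    }
};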