How to prove a lower bound of log n for a data structure?
I have a homework question as follows (note that I am not looking for exact answers, just looking for simple suggestions to move on).
S is a data structure which supports Insert(x, S), Delete(x, S) and Find_Smallest_Item(S), each in time <= T(n). Prove a lower bound for T(n), such as Ω(log n).
Here are my thoughts so far:
I guess I need to find a reduction: reduce this problem to another one and prove it cannot be faster than log n. I have read many tutorials about lower bounds, and most of them reduce a problem to sorting, use sorting as a black box, and prove that the algorithm cannot be faster than n log n.
But here we are dealing with log n, and I don't know of such an algorithm to reduce to. Maybe it has something to do with the depth of a tree structure, log n, but I couldn't figure out where to start.
Can you give me some hints?
Edit: Actually something came to mind, but I don't know if I am supposed to prove a lower bound with a trick like this. Assume I have insert, delete and find_smallest operations, each with log n time complexity.
For example, I can construct a sorted list using the delete and find_smallest functions: I run find_smallest once and, after finding the smallest element in the list, I delete that element. Running it again finds the 2nd smallest element, and so on.
Therefore, I can implement sorting using only the deletion and find_smallest functions. If I repeat this n times, each round takes log n (for deletion) + log n (for finding the smallest), so overall the sort takes n log n.
I don't know how to tweak this for insertion, though.
Edit 2: To work insertion into the proof: after finding the ith smallest element in the list, what if I insert it at the ith position? E.g. after finding the 3rd smallest element with the procedure above, I can insert it at index 3 of the data structure. At the end I will obtain a sorted data structure.
Reducing your problem to another one will give you an upper bound on your O(), not a lower one.
On the other hand, if you can use any solution to your problem to implement some other algorithm with a well-known lower bound (effectively reducing that problem to yours), that may give you the lower bound you're looking for.
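Spelled out, the pattern looks like this (a sketch, assuming the reduction makes at most c*n calls to your structure for some constant c):

    total time of the reduction <= c * n * T(n)
    total time of the reduction  = Ω(f(n))     // the known lower bound of the other problem
    =>  T(n) = Ω(f(n) / n)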
Answer:
As others have suggested, you can use the data structure S to implement sorting:

for i in range(len(input)):
    Insert(input[i], S)

for i in range(len(input)):
    x = Find_Smallest_Item(S)
    output[i] = x
    Delete(x, S)
For input of size N this algorithm makes N calls to each of your three operations. However, we know that any general-purpose comparison sort must take Ω(N log N) time in the worst case.
This means there are inputs for which the average cost of the data structure calls in the above sort is Ω(log N) per call. Since that is incompatible with any T() asymptotically better than log N, you have your lower bound.
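Written out, with 3N total calls (N inserts, N finds, N deletes, each costing at most T(N)):

    3N * T(N) >= worst-case time to sort N items = Ω(N log N)
    =>  T(N) = Ω(log N)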
Some notes:
The kind of data structure described in your problem is called a priority queue.
Since you're trying to prove a lower bound for any possible priority queue, you can't make assumptions about the implementation. Just because a particular data structure gives you a particular O() performance doesn't mean some completely different data structure couldn't do better.
There are many priority queue data structures that achieve O(log N) for all calls, so this is actually a tight lower bound.
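For illustration, here is a minimal Python sketch of the sort reduction above, using the standard heapq module (a binary heap with O(log N) push and pop) as the structure S. Note that a plain binary heap only deletes the minimum cheaply, but that is all the reduction needs:

import heapq

def sort_with_priority_queue(items):
    s = []                        # the structure S, modeled by a binary heap
    for x in items:
        heapq.heappush(s, x)      # Insert(x, S): O(log N)
    output = []
    while s:
        smallest = s[0]           # Find_Smallest_Item(S): O(1) on a heap
        output.append(smallest)
        heapq.heappop(s)          # Delete(smallest, S): O(log N) for the minimum
    return output

print(sort_with_priority_queue([1, 25, 4, 3, 7]))   # [1, 3, 4, 7, 25]

Every call here costs O(log N) or better, which is exactly why the Ω(log N) bound is tight.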
Your first edit isn't correct.
I can run find_smallest for the first time, and after finding the smallest element in the list,
Running Find_Smallest_Item(S) will find the smallest element in S, not the smallest element in the list. You first need to Insert(elements from the list, S) before there is anything to find!
Maybe you should try to write down some code, like this:
List l = [1, 25, 4, 3, 7]
S s          // empty
List sorted  // empty

// now we can do:
insert(l[0], s)
// or
delete(25, s)

// magic goes here
...

// and l is sorted!
The trick is to write down code that sorts l (or builds another, sorted list). To prove the lower bound, count the steps (inserts, deletes and find-mins) and use the fact that no matter what code you write, it can't have a worst case faster than Ω(n log n) (for comparison sorts). This gives a lower bound: it has to be at least this slow.
Comparison-based sorting is Ω(n log n) because the problem can be modeled as a decision tree in which each inner node compares 2 elements of the list. The algorithm must be able to reach every permutation of the input, so the decision tree has at least n! leaves. A binary tree of height h has at most 2^h leaves, so n! <= 2^h. From this, h >= log_2(n!), which is Ω(n log n).
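The last step uses a standard estimate for log_2(n!), included here for completeness: at least n/2 of the factors of n! are each >= n/2, so

    log_2(n!) >= log_2((n/2)^(n/2)) = (n/2) * log_2(n/2) = Ω(n log n)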
A single one of the operations you specified doesn't have to distinguish all n! permutations of the input, only O(n) outcomes, as SpeedBirdNine mentioned. Use this to bound the height of the decision tree leading to a leaf.
I think you've got the answer pretty close. If you want to prove a lower bound on the operations of S using sorting, you just implement the sorting algorithm:
sort_with_S(items)
    S s
    add all items to s ................ <= n*T_insert
    while s nonempty
        output find_smallest .......... <= n*T_find_smallest
        remove the smallest element ... <= n*T_remove
Now you sum up the times for the operations. If your algorithm (and thus your structure) can handle any item type that can be compared, we know the sorting algorithm has worst-case complexity Omega(n log n). You assume (in the statement of the problem) that all three operations have complexity <= T(n), which means your S can sort with complexity O(n*T(n)). You just compare the bounds...
Ω(log n) is an example of a lower bound; you're being asked to prove that T(n) has a lower bound, not that its lower bound is Ω(log n).
Well, it is a pretty broad question. The lower bound for Insert(x, S), Delete(x, S) and Find_Smallest_Item(S) depends on what data structure is used to store the items. For an array, you can insert in O(1); delete requires finding the item first (O(n) with linear search, O(log n) with binary search on a sorted array) and then removing it. Find_Smallest_Item on an unsorted array would normally mean scanning, O(n) for random data, but you can use a trick: keep a variable holding the smallest item, compare each inserted item against it, and update the variable when a smaller item arrives. Then you can return the smallest in O(1), and if you need a pointer to where that smallest item actually sits, you can find it in O(log n) in a sorted array.
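A sketch of that "variable to save the smallest" trick (the class and method names here are made up for illustration, not a real library): insert and find-smallest become O(1), but deleting the cached minimum forces an O(n) rescan, so the trick alone doesn't beat the Ω(log n) bound across all three operations:

class ArrayWithCachedMin:
    # unsorted array that caches its minimum; illustrative sketch only
    def __init__(self):
        self.items = []
        self.smallest = None

    def insert(self, x):
        # O(1): append, and update the cached minimum if x is smaller
        self.items.append(x)
        if self.smallest is None or x < self.smallest:
            self.smallest = x

    def find_smallest(self):
        # O(1): return the cached minimum
        return self.smallest

    def delete(self, x):
        # O(n): linear search to remove, plus an O(n) rescan
        # whenever the cached minimum itself is deleted
        self.items.remove(x)
        if x == self.smallest:
            self.smallest = min(self.items) if self.items else None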
With a tree it is a whole different story.
So which data structure S is matters a lot. But here you need a lower bound, i.e. how low the bound can possibly go. For that, you'll need to show that no matter what is done, Insert cannot beat this lower bound, Delete cannot beat it, and Find_Smallest_Item cannot beat it. This means you need to prove a lower bound that holds for every method, for every possible implementation.
It is an interesting problem, and do post your findings here. Just for interest, Bead sort can theoretically sort in O(1).