Exhaustive searches vs sorting followed by binary search

This is a direct quote from the textbook Invitation to Computer Science by G. Michael Schneider and Judith L. Gersting.

At the end of Section 3.4.2, we talked about the tradeoff between using sequential search on an unsorted list as opposed to sorting the list and then using binary search. If the list size is n=100,000 about how many worst-case searches must be done before the second alternative is better in terms of number of comparisons?

I don't really get what the question is asking for.

Sequential search is of order O(n) and binary search is of order O(lg n), and lg n will always be less than n. And in this case n is already given, so what am I supposed to find?

This is one of my homework assignments, but I don't really know what to do. Could anyone explain the question in plain English for me?


and binary is of order (lgn) which in any case lgn will always be less than n
This is where you're wrong. In the assignment, you're asked to consider the cost of sorting the array, too.

Obviously, if you need only one search, the first approach is better than sorting the array and then doing a binary search: n < n*logn + logn. And you're being asked how many searches it takes for the second approach to become more effective.

End of hint.
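To make the hint concrete, here is a quick sketch in Python (my own illustration, assuming every comparison costs the same):

```python
import math

n = 100_000
one_sequential = n                                      # one worst-case sequential search
sort_plus_one_binary = n * math.log2(n) + math.log2(n)  # n*log n sort, then one binary search

# For a single search, sequential wins by a wide margin.
print(one_sequential < sort_plus_one_binary)  # → True
```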


The question is how to decide which approach to choose: just use linear search, or sort first and then use binary search.

If you only search a couple of times, linear search is better: each search is O(n), while sorting is already O(n*logn). If you search the same collection very often, sorting is better: many linear searches can add up to O(n*n), while sorting once and then doing binary searches is O(n*logn) + NumberOfSearches*O(logn), which can be less or more than repeated linear search depending on how NumberOfSearches and n relate.

The task is to determine the value of NumberOfSearches (not an exact number, but a function of n) at which one of the options becomes preferable:

 NumberOfSearches * O(n) <> O(n*logn) + NumberOfSearches * O(logn)

Don't forget that each O() can hide a different constant factor.
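As a sketch of how that inequality can be solved numerically (the function name and the unit-cost-comparison assumption are mine, not from the assignment):

```python
import math

def break_even_searches(n, sort_comparisons):
    """Smallest integer p for which p worst-case sequential searches cost
    more than sorting once plus p worst-case binary searches."""
    sequential = n                         # worst case: n comparisons per search
    binary = math.floor(math.log2(n)) + 1  # worst case: floor(log2 n) + 1 comparisons
    # Solve p*sequential > sort_comparisons + p*binary for the smallest integer p.
    return sort_comparisons // (sequential - binary) + 1

n = 100_000
# With a selection sort costing n*(n-1)/2 comparisons:
print(break_even_searches(n, n * (n - 1) // 2))  # → 50009
```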


The order of the methods alone won't settle this. It tells you how well an algorithm scales as the problem gets bigger and bigger, but you can't do any exact calculations if you only know O(n), i.e. that the complexity grows linearly in the size of the problem. It won't give you any numbers.

This can well mean that, for some n, an algorithm with O(n) complexity is faster than an O(logn) algorithm. Because O(logn) scales better as n gets larger, we know for sure there is an n (a problem size) beyond which the O(logn) algorithm is faster. We just don't know where (for what n).

In plain English:

If you want to know 'how many searches', you need exact equations to solve; you need exact numbers. How many comparisons does a sequential search take? (Remember, n is given, so you can give a number.) How many comparisons (in the worst case!) does a binary search take? Before you can do a binary search, you have to sort, so add the number of comparisons needed for sorting to the cost of the binary searches. Now compare the two totals: which one is less?

Binary search is fast, but sorting is slow. Sequential search is slower than binary search but faster than sorting. However, the sort needs to be done only once, no matter how many times you search. So, when does one heavy sort outweigh having to do a slow (sequential) search every time?

Good luck!


For sequential search, the worst case is n = 100,000 comparisons, so p searches require p × 100,000 comparisons.

Using a Θ(n²) sorting algorithm would require roughly 100,000 × 100,000 comparisons.

Binary search would require 1 + ⌊log₂ 100,000⌋ = 17 comparisons for each search,

so together there would be 100,000 × 100,000 + 17p comparisons.

Sorting pays off once the first expression exceeds the second, i.e. 100,000p > 100,000² + 17p,

which holds for p > 100,017.
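A quick numeric check of the figures above (a sketch; it just re-evaluates the inequality with the same estimates):

```python
n = 100_000
sort_cost = n * n   # the rough Θ(n²) sorting estimate used above
binary = 17         # 1 + floor(log2(100,000)) comparisons per binary search

# Smallest integer p with n*p > sort_cost + binary*p:
p = sort_cost // (n - binary) + 1
print(p)  # → 100018, i.e. p > 100,017
```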


The question is about estimating the number of searches NUM_SEARCHES needed to compensate for the cost of sorting. So we'll have:

 time( NUM_SEARCHES * O(n) ) > time( NUM_SEARCHES * O(log(n)) + O(n* log(n)) )


Thank you, guys. I think I get the point now. Could you take a look at my answer and see whether I'm on the right track?

For worst-case searches: the number of comparisons for a sequential search is n = 100,000. The number of comparisons for a binary search is ⌊lg(n)⌋ + 1 = 17. The number of comparisons for sorting is (n-1)/2 * n = (99,999)(50,000). (I'm following my textbook and used the selection sort algorithm covered in my class.)

So let p be the number of worst-case searches; then 100,000p > (99,999)(50,000) + 17p,
which gives p > 50,008.

In conclusion, it takes more than 50,008 worst-case searches (i.e. at least 50,009) before sorting and then using binary search beats sequential search on a list of n = 100,000.
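A quick arithmetic check of the answer above (a sketch, using the selection-sort count n(n-1)/2 and 17 comparisons per binary search):

```python
n = 100_000
sort_cost = (n - 1) * n // 2   # selection sort: n(n-1)/2 = 4,999,950,000 comparisons
binary = 17                    # worst-case binary search on 100,000 items

# At p = 50,008 the sequential approach is still (just) cheaper...
assert 50_008 * n < sort_cost + 50_008 * binary
# ...and at p = 50,009 sorting plus binary search finally wins.
assert 50_009 * n > sort_cost + 50_009 * binary
print("break-even confirmed")
```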
