开发者

Why use binary search if there's ternary search?

I recently heard about ternary search in which we divide an array into 3 parts and compare. Here there will be two comparisons but it reduces the array to n/3. Why don't peopl开发者_如何学编程e use this much?


Actually, people do use k-ary trees for arbitrary k.

This is, however, a tradeoff.

To find an element in a k-ary tree, you need around k*ln(N)/ln(k) operations (remember the change-of-base formula). The larger your k is, the more overall operations you need.

The logical extension of what you are saying is "why don't people use an N-ary tree for N data elements?". Which, of course, would be an array.


A ternary search will still give you the same asymptotic complexity O(log N) search time, and adds complexity to the implementation.

The same argument can be said for why you would not want a quad search or any other higher order.


Searching 1 billion (a US billion - 1,000,000,000) sorted items would take an average of about 15 compares with binary search and about 9 compares with a ternary search - not a huge advantage. And note that each 'ternary compare' might involve 2 actual comparisons.


Wow. The top voted answers miss the boat on this one, I think.

Your CPU doesn't support ternary logic as a single operation; it breaks ternary logic into several steps of binary logic. The most optimal code for the CPU is binary logic. If chips were common that supported ternary logic as a single operation, you'd be right.

B-Trees can have multiple branches at each node; a order-3 B-tree is ternary logic. Each step down the tree will take two comparisons instead of one, and this will probably cause it to be slower in CPU time.

B-Trees, however, are pretty common. If you assume that every node in the tree will be stored somewhere separately on disk, you're going to spend most of your time reading from disk... and the CPU won't be a bottleneck, but the disk will be. So you take a B-tree with 100,000 children per node, or whatever else will barely fit into one block of memory. B-trees with that kind of branching factor would rarely be more than three nodes high, and you'd only have three disk reads - three stops at a bottleneck - to search an enormous, enormous dataset.

Reviewing:

  • Ternary trees aren't supported by hardware, so they run less quickly.
  • B-tress with orders much, much, much higher than 3 are common for disk-optimization of large datasets; once you've gone past 2, go higher than 3.


The only way a ternary search can be faster than a binary search is if a 3-way partition determination can be done for less than about 1.55 times the cost of a 2-way comparison. If the items are stored in a sorted array, the 3-way determination will on average be 1.66 times as expensive as a 2-way determination. If information is stored in a tree, however, the cost to fetch information is high relative to the cost of actually comparing, and cache locality means the cost of randomly fetching a pair of related data is not much worse than the cost of fetching a single datum, a ternary or n-way tree may improve efficiency greatly.


What makes you think Ternary search should be faster?

Average number of comparisons:

in ternary search = ((1/3)*1 + (2/3)*2) * ln(n)/ln(3) ~ 1.517*ln(n)
in binary search  =                   1 * ln(n)/ln(2) ~ 1.443*ln(n).

Worst number of comparisons:

in ternary search = 2 * ln(n)/ln(3) ~ 1.820*ln(n)
in binary search  = 1 * ln(n)/ln(2) ~ 1.443*ln(n).

So it looks like ternary search is worse.


Also, note that this sequence generalizes to linear search if we go on

Binary search
Ternary search
...
...
n-ary search ≡ linear search

So, in an n-ary search, we will have "one only COMPARE" which might take upto n actual comparisons.


"Terinary" (ternary?) search is more efficient in the best case, which would involve searching for the first element (or perhaps the last, depending on which comparison you do first). For elements farther from the end you're checking first, while two comparisons would narrow the array by 2/3 each time, the same two comparisons with binary search would narrow the search space by 3/4.

Add to that, binary search is simpler. You just compare and get one half or the other, rather than compare, if less than get the first third, else compare, if less than get the second third, else get the last third.


Ternary search can be effectively used on parallel architectures - FPGAs and ASICs. For example if internal FPGA memory required for search is less than half of the FPGA resource, you can make a duplicate memory block. This would allow to simultaneously access two different memory addresses and do all comparisons in a single clock cycle. This is one of the reasons why 100MHz FPGA can sometimes outperform the 4GHz CPU :)


Here's some random experimental evidence that I haven't vetted at all showing that it's slower than binary search.


Almost all textbooks and websites on binary search trees do not really talk about binary trees! They show you ternary search trees! True binary trees store data in their leaves not internal nodes (except for keys to navigate). Some call these leaf trees and make the distinction between node trees shown in textbooks:

J. Nievergelt, C.-K. Wong: Upper Bounds for the Total Path Length of Binary Trees, Journal ACM 20 (1973) 1–6.

The following about this is from Peter Brass's book on data structures.

2.1 Two Models of Search Trees

In the outline just given, we supressed an important point that at first seems trivial, but indeed it leads to two different models of search trees, either of which can be combined with much of the following material, but one of which is strongly preferable.

If we compare in each node the query key with the key contained in the node and follow the left branch if the query key is smaller and the right branch if the query key is larger, then what happens if they are equal? The two models of search trees are as follows:

  1. Take left branch if query key is smaller than node key; otherwise take the right branch, until you reach a leaf of the tree. The keys in the interior node of the tree are only for comparison; all the objects are in the leaves.

  2. Take left branch if query key is smaller than node key; take the right branch if the query key is larger than the node key; and take the object contained in the node if they are equal.

This minor point has a number of consequences:

{ In model 1, the underlying tree is a binary tree, whereas in model 2, each tree node is really a ternary node with a special middle neighbor.

{ In model 1, each interior node has a left and a right subtree (each possibly a leaf node of the tree), whereas in model 2, we have to allow incomplete nodes, where left or right subtree might be missing, and only the comparison object and key are guaranteed to exist.

So the structure of a search tree of model 1 is more regular than that of a tree of model 2; this is, at least for the implementation, a clear advantage.

{ In model 1, traversing an interior node requires only one comparison, whereas in model 2, we need two comparisons to check the three possibilities.

Indeed, trees of the same height in models 1 and 2 contain at most approximately the same number of objects, but one needs twice as many comparisons in model 2 to reach the deepest objects of the tree. Of course, in model 2, there are also some objects that are reached much earlier; the object in the root is found with only two comparisons, but almost all objects are on or near the deepest level.

Theorem. A tree of height h and model 1 contains at most 2^h objects. A tree of height h and model 2 contains at most 2^h+1 − 1 objects.

This is easily seen because the tree of height h has as left and right subtrees a tree of height at most h − 1 each, and in model 2 one additional object between them.

{ In model 1, keys in interior nodes serve only for comparisons and may reappear in the leaves for the identification of the objects. In model 2, each key appears only once, together with its object.

It is even possible in model 1 that there are keys used for comparison that do not belong to any object, for example, if the object has been deleted. By conceptually separating these functions of comparison and identification, this is not surprising, and in later structures we might even need to define artificial tests not corresponding to any object, just to get a good division of the search space. All keys used for comparison are necessarily distinct because in a model 1 tree, each interior node has nonempty left and right subtrees. So each key occurs at most twice, once as comparison key and once as identification key in the leaf.

Model 2 became the preferred textbook version because in most textbooks the distinction between object and its key is not made: the key is the object. Then it becomes unnatural to duplicate the key in the tree structure. But in all real applications, the distinction between key and object is quite important. One almost never wishes to keep track of just a set of numbers; the numbers are normally associated with some further information, which is often much larger than the key itself.


You may have heard ternary search being used in those riddles that involve weighing things on scales. Those scales can return 3 answers: left is lighter, both are the same, or left is heavier. So in a ternary search, it only takes 1 comparison. However, computers use boolean logic, which only has 2 answers. To do the ternary search, you'd actually have to do 2 comparisons instead of 1. I guess there are some cases where this is still faster as earlier posters mentioned, but you can see that ternary search isn't always better, and it's more confusing and less natural to implement on a computer.


Theoretically the minimum of k/ln(k) is achieved at e and since 3 is closer to e than 2 it requires less comparisons. You can check that 3/ln(3) = 2.73.. and 2/ln(2) = 2.88.. The reason why binary search could be faster is that the code for it will have less branches and will run faster on modern CPUs.


I have just posted a blog about the ternary search and I have shown some results. I have also provided some initial level implementations on my git repo I totally agree with every one about the theory part of the ternary search but why not give it a try? As per the implementation that part is easy enough if you have three years of coding experience. I found that if you have huge data set and you need to search it many times ternary search has an advantage. If you think you can do better with a ternary search go for it.


Although you get the same big-O complexity (ln n) in both search trees, the difference is in the constants. You have to do more comparisons for a ternary search tree at each level. So the difference boils down to k/ln(k) for a k-ary search tree. This has a minimum value at e=2.7 and k=2 provides the optimal result.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜