When is a hash table better to use than a search tree?



Depends on what you want to do with the data structure.

Operation         Hash table  Search Tree
Search            O(1)        O(log(N))
Insert            O(1)        O(log(N))
Delete            O(1)        O(log(N))
Traversal         O(N)        O(N)
Min/Max-Key       -hard-      O(log(N))
Find-Next-Key     -hard-      O(1)
  • Insert and Search on a hash table depend on its load factor and its design. A poorly designed hash table can have O(N) search and insert. The same is true for your search tree.

  • Deleting from a hash table can be cumbersome, depending on your collision resolution strategy.

  • Traversing the container in order, finding Min/Max, and Find-Next/Prev style operations are better on a search tree because of its ordering.

  • All search-tree estimates above assume a 'balanced' search tree.


When the average access and insertion time matter more than the worst-case access and insertion time. Practically, I think search trees are usually as good a solution as hash tables, because even though in theory Θ(1) beats Θ(log n), log n is very fast, and as n grows large the practical difference shrinks. Also, Θ(1) says nothing about the size of the constant. Granted, the same holds for the complexity of trees, but the constant factors of trees are far more uniform across implementations, and usually very small, than those of hash tables.

Again, I know theorists will disagree with me here, but it's computers we're dealing with, and for log n to be a significant burden for a computer, n must be unrealistically large. If n is a trillion, then log n is about 40, and a computer today can perform 40 iterations rather quickly. For log n to reach 50, you already need over a quadrillion elements.

The C++ standard as it stands today doesn't provide a hash table among its containers, and I think there's a reason people have been fine with that for over a decade.


My take on things:

Operation                   Hash table(1)   SBB Search Tree(2)
.find(obj) -> obj           O(1)            O(1)*
.insert(obj)                O(1)            O(log(N))
.delete(obj)                O(1)            O(log(N))
.traverse / for x in...     O(N)            O(N)
.largerThan(obj) -> {objs}  unsupported     O(log(N))
                                             \
                                              union right O(1) + parent O(1)
.sorted() -> [obj]          unsupported     no need
                                             \
                                              already sorted, so no need;
                                              .traverse() is O(N)
.findMin() -> obj           unsupported**   O(log(N)), maybe O(1)
                                             \
                                              descend from root, e.g.:
                                              root.left.left.left...left -> O(log(N))
                                              might be able to cache for O(1)
.findNext(obj) -> obj       unsupported     O(log(N))
                                             \
                                              first perform x=.find(obj), which is O(1)
                                              then descend from that node, e.g.:
                                              x.right.left.left...right -> O(log(N))

(1) http://en.wikipedia.org/wiki/Hash_table

(2) http://en.wikipedia.org/wiki/Self-balancing_binary_search_tree , e.g. http://en.wikipedia.org/wiki/Tango_tree or http://en.wikipedia.org/wiki/Splay_tree

(*) You can use a hash table in conjunction with a search tree to obtain this. There is no asymptotic speed or space penalty. Otherwise, it's O(log(N)).

(**) Unless you never delete, in which case just cache smallest and largest elements and it's O(1).

These costs may be amortized.

Conclusion:

You want to use trees when the ordering matters.


Among many other issues, it depends on how expensive the hash function is. In my experience, hash tables are generally about twice as fast as balanced trees given a sensible hash function, but it's certainly possible for them to be slower.
