When is a hash table better to use than a search tree?



Depends on what you want to do with the data structure.

Operation         Hash table  Search Tree
Search            O(1)        O(log(N))
Insert            O(1)        O(log(N))
Delete            O(1)        O(log(N))
Traversal         O(N)        O(N)
Min/Max-Key       -hard-      O(log(N))
Find-Next-Key     -hard-      O(1)
  • Insert and Search on a hash table depend on its load factor and its design. A poorly designed hash table can have O(N) search and insert. The same is true for your search tree.

  • Deleting from a hash table can be cumbersome, depending on your collision resolution strategy.

  • Traversing the container in order, finding Min/Max, and Find-Next/Prev style operations are better on a search tree because of its ordering.

  • All search-tree estimates above assume a 'balanced' search tree.


When the average access and insertion time matter more than the worst-case access and insertion time. Practically, I think search trees are usually as good a solution as hash tables, because even though in theory Θ(1) beats Θ(log n), log n is very fast, and as n grows large the practical difference shrinks. Also, Θ(1) says nothing about the size of the constant. Granted, the same holds for the complexity of trees, but the constant factors of trees are far more uniform across implementations, and usually very small, than those of hash tables.

Again, I know theorists will disagree with me here, but it's computers we're dealing with, and for log n to be a significant burden for a computer, n must be unrealistically large. If n is a trillion, then log n is about 40, and a computer today can perform 40 iterations rather quickly. For log n to reach 50, you already need over a quadrillion elements.

The C++ standard as it stands today doesn't provide a hash table among its containers, and I think there's a reason people have been fine with that for over a decade.


My take on things:

Operation                   Hash table(1)   SBB Search Tree(2)
.find(obj) -> obj           O(1)            O(1)*
.insert(obj)                O(1)            O(log(N))
.delete(obj)                O(1)            O(log(N))
.traverse / for x in...     O(N)            O(N)
.largerThan(obj) -> {objs}  unsupported     O(log(N))
                                             \
                                              union right O(1) + parent O(1)
.sorted() -> [obj]          unsupported     no need
                                             \
                                              already sorted, so no need;
                                              .traverse() is O(N)
.findMin() -> obj           unsupported**   O(log(N)), maybe O(1)
                                             \
                                              descend from root, e.g.:
                                              root.left.left.left...left -> O(log(N))
                                              might be able to cache for O(1)
.findNext(obj) -> obj       unsupported     O(log(N))
                                             \
                                              first perform x=.find(obj), which is O(1)
                                              then descend from that node, e.g.:
                                              x.right.left.left...right -> O(log(N))

(1) http://en.wikipedia.org/wiki/Hash_table

(2) http://en.wikipedia.org/wiki/Self-balancing_binary_search_tree , e.g. http://en.wikipedia.org/wiki/Tango_tree or http://en.wikipedia.org/wiki/Splay_tree

(*) You can use a hash table in conjunction with a search tree to obtain this. There is no asymptotic speed or space penalty. Otherwise, it's O(log(N)).

(**) Unless you never delete, in which case just cache smallest and largest elements and it's O(1).

These costs may be amortized.

Conclusion:

You want to use trees when the ordering matters.


Among many other issues, it depends on how expensive the hash function is. In my experience, hash tables are generally about twice as fast as balanced trees given a sensible hash function, but it's certainly possible for them to be slower.
