Why does a hash table have constant access time on average?

2023-03-02 20:45 Q&A author:

I don't understand this explanation, which says that if n is the number of elements in the hash table and m is the total number of buckets, then hash tables have constant access time on average only if n is proportional to theta(n). Why does it have to be proportional?

Well, actually m should be proportional to n. Otherwise you could, for example, have just 1 bucket, and the table would behave just like an unsorted list.

To be more precise, if m is proportional to n, i.e. m = c * n, then the expected number of items in each bucket is n/m = 1/c, which is a constant (assuming the hash function spreads items evenly). Going to any bucket is an O(1) operation (just compute the hash code and index into the table), and the search through the bucket then takes constant expected time (even a plain linear search only touches about 1/c items).

Thus a lookup is O(1) on average if m = c * n.

To take a converse example, suppose we had a fixed-size table with tableSize buckets. Then the expected number of items in each bucket is n/tableSize, which is a linear function of n. Any kind of search through the bucket is at best O(log(n)), for example with a tree (I'm assuming you don't stick another hash table inside the bucket, or we would have the same argument over that hash table), so access would not be O(1) in this case.
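The contrast between the two cases can be checked with a small simulation. This is only a sketch under stated assumptions: separate chaining with a linear scan inside each bucket, random integer keys, and `key % m` standing in for the hash function. The names `average_comparisons`, `TABLE_SIZE` and `C`, and the values chosen for them, are illustrative and not from the answer above.

```python
import random

def average_comparisons(n, m, seed=0):
    """Store n distinct random integer keys in a chained table with m buckets
    and return the mean number of key comparisons per successful lookup."""
    rng = random.Random(seed)
    keys = rng.sample(range(10**9), n)   # distinct keys, roughly uniform
    buckets = [[] for _ in range(m)]
    for key in keys:
        buckets[key % m].append(key)     # stand-in for hash(key) % m

    total = 0
    for key in keys:
        chain = buckets[key % m]
        total += chain.index(key) + 1    # linear scan within the bucket
    return total / n

TABLE_SIZE = 64   # hypothetical fixed bucket count
C = 2             # hypothetical proportionality constant, m = C * n

for n in (1_000, 4_000, 16_000):
    proportional = average_comparisons(n, C * n)
    fixed = average_comparisons(n, TABLE_SIZE)
    print(f"n={n:6d}  m=c*n: {proportional:.2f}  m={TABLE_SIZE}: {fixed:.2f}")
```

With m = c * n the first column should stay near a small constant as n grows, while the fixed-size column should grow roughly in proportion to n.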

Strictly speaking, the average-case time complexity of hash table access is actually in Ω(n^(1/3)). Information can't travel faster than the speed of light, which is a constant. Since space has three dimensions, storing n bits of data requires that some data be located at a distance on the order of n^(1/3) from the CPU.

More detail in my blog.

If the number of buckets does not keep up with the number of elements, the chance of collisions is higher, and thus the incidence of having to scan through the list of items with the same hash key is also higher.

Access time is constant because access is based on a calculation of a hash value and then a constant lookup to find the appropriate bucket. Assuming the hash function evenly distributes items amongst buckets, then the time it takes to access any individual item will be equal to the time to access other items, regardless of n.
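As a rough illustration of "compute the hash, then jump to the bucket", here is a toy chained table. It is a sketch, not how any particular library implements it: the class name `ChainedHashTable`, the default of 16 buckets, and the lack of resizing are all assumptions made to keep the example short.

```python
class ChainedHashTable:
    """Toy hash table using separate chaining; the bucket count is fixed here."""

    def __init__(self, num_buckets=16):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # One hash computation plus one array index: independent of n.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        # Only the items that share this bucket are compared against key.
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

table = ChainedHashTable()
table.put("apple", 3)
table.put("pear", 5)
table.put("apple", 4)
print(table.get("apple"), table.get("pear"), table.get("plum"))  # 4 5 None
```

Finding the bucket never depends on how many items are stored; only the scan inside the bucket does, which is why the number of items per bucket is what matters.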

Constant doesn't necessarily mean constantly low, though. The average access time depends on how evenly the hash function distributes items and on the number of buckets. If you have thousands of items evenly distributed amongst a small number of buckets, you find the bucket fast but then loop through a lot of items in the bucket. If you have a good proportion of buckets to items but a bad hash function that puts many more items in some buckets than in others, access to the items in the larger buckets will be slower than access to the rest.
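The effect of a bad hash function can be seen by counting bucket sizes. The sketch below is contrived on purpose: the keys are all multiples of 16 and the "bad" hash keeps only the low bits, so every key lands in the same bucket. The key set, the function names, and the use of SHA-256 as a stand-in for a well-mixing hash are assumptions for illustration only.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 16
keys = range(0, 16_000, 16)   # 1,000 hypothetical keys, all multiples of 16

def bad_hash(key):
    # Uses only the low bits, which are identical for every key above.
    return key % NUM_BUCKETS

def mixing_hash(key):
    # Stand-in for a well-mixing hash: SHA-256 of the decimal string.
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS

def bucket_sizes(hash_fn):
    counts = Counter(hash_fn(key) for key in keys)
    return [counts.get(b, 0) for b in range(NUM_BUCKETS)]

for name, fn in (("bad", bad_hash), ("mixing", mixing_hash)):
    sizes = bucket_sizes(fn)
    print(f"{name:7s} largest bucket: {max(sizes):4d}  smallest: {min(sizes):4d}")
```

Both tables have the same ratio of items to buckets, yet with `bad_hash` every lookup scans one bucket holding all 1,000 items, while the mixing hash should keep each bucket close to the average of about 62.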

In a reasonably sized hash table, with enough slots for every element you store and plenty of extra space, the hash function does most of the work of choosing slots, and there are very few collisions where different elements hash to the same slot. A very crowded hash table has lots of collisions and degrades to what is basically a linear search: almost every slot probed holds a wrong item with the same hash, and you have to keep searching for the right one. (A hash table lookup still has to check the key once it picks the first slot, because the key it is looking for might have collided when it was stored.)

What determines the hit-collision ratio is exactly the ratio of number-of-items to size-of-hash, usually called the load factor (i.e., the chance that a randomly chosen slot is filled).
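To see how this ratio of items to slots (often called the load factor) drives lookup cost, the following sketch fills an open-addressing table to several load factors and counts the slots inspected per successful lookup. Linear probing, the table size of 10,007 slots, `key % num_slots` as the hash, and the name `average_probes` are all assumptions chosen for illustration; other collision strategies give different numbers but show the same trend.

```python
import random

def average_probes(load_factor, num_slots=10_007, seed=0):
    """Fill an open-addressing table (linear probing) to the given load factor
    and return the mean number of slots inspected per successful lookup."""
    rng = random.Random(seed)
    slots = [None] * num_slots
    keys = rng.sample(range(10**9), int(load_factor * num_slots))

    for key in keys:
        i = key % num_slots
        while slots[i] is not None:      # collision: step to the next slot
            i = (i + 1) % num_slots
        slots[i] = key

    total = 0
    for key in keys:
        i = key % num_slots
        probes = 1
        while slots[i] != key:           # must compare keys, not just hashes
            i = (i + 1) % num_slots
            probes += 1
        total += probes
    return total / len(keys)

for alpha in (0.25, 0.5, 0.75, 0.95):
    print(f"load factor {alpha:.2f}: {average_probes(alpha):.2f} probes on average")
```

At low load factors most lookups should succeed on the first slot; as the table fills up, the average number of probes climbs steeply.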
