Chained Hash Tables vs. Open-Addressed Hash Tables

2022-12-25 12:25 问答作者：

Can somebody explain the main differences between (advantages / disadvantages) the two implementations?

For a library, what implementation is recom开发者_运维技巧mended?

Wikipedia's article on hash tables gives a distinctly better explanation and overview of different hash table schemes that people have used than I'm able to off the top of my head. In fact you're probably better off reading that article than asking the question here. :)

That said...

A chained hash table indexes into an array of pointers to the heads of linked lists. Each linked list cell has the key for which it was allocated and the value which was inserted for that key. When you want to look up a particular element from its key, the key's hash is used to work out which linked list to follow, and then that particular list is traversed to find the element that you're after. If more than one key in the hash table has the same hash, then you'll have linked lists with more than one element.

The downside of chained hashing is having to follow pointers in order to search linked lists. The upside is that chained hash tables only get linearly slower as the load factor (the ratio of elements in the hash table to the length of the bucket array) increases, even if it rises above 1.

An open-addressing hash table indexes into an array of pointers to pairs of (key, value). You use the key's hash value to work out which slot in the array to look at first. If more than one key in the hash table has the same hash, then you use some scheme to decide on another slot to look in instead. For example, linear probing is where you look at the next slot after the one chosen, and then the next slot after that, and so on until you either find a slot that matches the key you're looking for, or you hit an empty slot (in which case the key must not be there).

Open-addressing is usually faster than chained hashing when the load factor is low because you don't have to follow pointers between list nodes. It gets very, very slow if the load factor approaches 1, because you end up usually having to search through many of the slots in the bucket array before you find either the key that you were looking for or an empty slot. Also, you can never have more elements in the hash table than there are entries in the bucket array.

To deal with the fact that all hash tables at least get slower (and in some cases actually break completely) when their load factor approaches 1, practical hash table implementations make the bucket array larger (by allocating a new bucket array, and copying elements from the old one into the new one, then freeing the old one) when the load factor gets above a certain value (typically about 0.7).

There are lots of variations on all of the above. Again, please see the wikipedia article, it really is quite good.

For a library that is meant to be used by other people, I would strongly recommend experimenting. Since they're generally quite performance-crucial, you're usually best off using somebody else's implementation of a hash table which has already been carefully tuned. There are lots of open-source BSD, LGPL and GPL licensed hash table implementations.

If you're working with GTK, for example, then you'll find that there's a good hash table in GLib.

My understanding (in simple terms) is that both the methods has pros and cons, though most of the libraries use Chaining strategy.

Chaining Method:

Here the hash tables array maps to a linked list of items. This is efficient if the number of collision is fairly small. The worst case scenario is O(n) where n is the number of elements in the table.

Open Addressing with Linear Probe:

Here when the collision occurs, move on to the next index until we find an open spot. So, if the number of collision is low, this is very fast and space efficient. The limitation here is the total number of entries in the table is limited by the size of the array. This is not the case with chaining.

There is another approach which is Chaining with binary search trees. In this approach, when the collision occurs, they are stored in binary search tree instead of linked list. Hence, the worst case scenario here would be O(log n). In practice, this approach is best suited when there is a extremely nonuniform distribution.

Since excellent explanation is given, I'd just add visualizations taken from CLRS for further illustration:

Open Addressing:

Chained Hash Tables vs. Open-Addressed Hash Tables

Chaining:

Chained Hash Tables vs. Open-Addressed Hash Tables

Open addressing vs. separate chaining

Linear probing, double and random hashing are appropriate if the keys are kept as entries in the hashtable itself... doing that is called "open addressing" it is also called "closed hashing"

Another idea: Entries in the hashtable are just pointers to the head of a linked list (“chain”); elements of the linked list contain the keys... this is called "separate chaining" it is also called "open hashing"

Collision resolution becomes easy with separate chaining: just insert a key in its linked list if it is not already there (It is possible to use fancier data structures than linked lists for this; but linked lists work very well in the average case, as we will see) Let’s look at analyzing time costs of these strategies

Source: http://cseweb.ucsd.edu/~kube/cls/100/Lectures/lec16/lec16-25.html

If the number of items that will be inserted in a hash table isn’t known when the table is created, chained hash table is preferable to open addressing.

Increasing the load factor(number of items/table size) causes major performance penalties in open addressed hash tables, but performance degrades only linearly in chained hash tables.

If you are dealing with low memory and want to reduce memory usage, go for open addressing. If you are not worried about memory and want speed, go for chained hash tables.

When in doubt, use chained hash tables. Adding more data than you anticipated won’t cause performance to slow to a crawl.

继续阅读：data-structures hashtable

Chained Hash Tables vs. Open-Addressed Hash Tables

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？