
hash table question

An interviewer asked me what the disadvantages of a hash table were. He hinted that a hash table takes a lot of space from initialization: that is, we need to pre-allocate memory for the hash table's buckets even if we don't actually need that much memory, because we don't have that many entries.

Is this reasonable?

I ask because I checked Wikipedia, and this disadvantage isn't discussed in the article.

Thanks!


It depends on the implementation. One way to implement a hash table is to make the initial table fairly small and, if the load factor (the ratio of used elements to available slots) rises above a threshold, increase the table size (there are a few ways of doing this, all detailed in the Wikipedia article you mentioned).
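As a rough illustration of that scheme, here is a minimal sketch in C of a separate-chaining table that starts small and doubles when the load factor would cross 0.75. All the names (ht_insert, ht_resize, etc.) are made up for this example, and strdup is assumed available (POSIX, standard in C23):

    #include <stdlib.h>
    #include <string.h>

    /* A node in one bucket's chain (separate chaining). */
    typedef struct node {
        char *key;
        int value;
        struct node *next;
    } node;

    typedef struct {
        node **buckets;
        size_t nbuckets;  /* slots in the bucket array */
        size_t nentries;  /* key/value pairs actually stored */
    } hashtable;

    static size_t hash_str(const char *s) {
        size_t h = 5381;                  /* djb2 string hash */
        while (*s) h = h * 33 + (unsigned char)*s++;
        return h;
    }

    hashtable *ht_new(void) {
        hashtable *ht = malloc(sizeof *ht);
        ht->nbuckets = 8;                 /* deliberately small; grows on demand */
        ht->nentries = 0;
        ht->buckets = calloc(ht->nbuckets, sizeof(node *));
        return ht;
    }

    /* Double the bucket array and rehash every entry into it. */
    static void ht_resize(hashtable *ht) {
        size_t newsize = ht->nbuckets * 2;
        node **newbuckets = calloc(newsize, sizeof(node *));
        for (size_t i = 0; i < ht->nbuckets; i++) {
            for (node *n = ht->buckets[i]; n != NULL; ) {
                node *next = n->next;
                size_t j = hash_str(n->key) % newsize;
                n->next = newbuckets[j];
                newbuckets[j] = n;
                n = next;
            }
        }
        free(ht->buckets);
        ht->buckets = newbuckets;
        ht->nbuckets = newsize;
    }

    void ht_insert(hashtable *ht, const char *key, int value) {
        /* Grow once the load factor (entries / slots) would exceed 0.75. */
        if (ht->nentries + 1 > ht->nbuckets * 3 / 4)
            ht_resize(ht);
        size_t i = hash_str(key) % ht->nbuckets;
        node *n = malloc(sizeof *n);
        n->key = strdup(key);
        n->value = value;
        n->next = ht->buckets[i];
        ht->buckets[i] = n;
        ht->nentries++;
    }

With this approach the table only ever pre-allocates a small bucket array, so the "lots of wasted memory at initialization" problem largely goes away.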

The situation you mention could certainly arise under certain conditions (a big initial table size with very few elements inserted), but it would most likely be the result of a poor data structure choice.


Depending on how you implement the hash table and how many buckets there are initially, it could be a reasonable disadvantage. Hash tables need about half (or more) of the buckets to be empty, or else collisions become much more likely. All of the buckets are empty initially, and after you add items, most implementations will expand the number of buckets so that at least half remain free. This means you have O(n) empty buckets.

Whether or not this matters depends on how many items you have and how big the buckets are. If the buckets are structs, they could potentially be quite large, since they would need to store a hash value along with pointers to the key and value (if not the actual key and value). More commonly, the buckets are pointers to containers that store the hash and pointers to the key and value. The size of each bucket then depends on the size of a pointer, which is almost always 32 or 64 bits (unless you're using an embedded processor).
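To make the per-bucket cost concrete, here is a sketch of the two layouts described above; the struct names are made up for illustration:

    #include <stdio.h>

    /* "Fat" bucket: the table stores hash + key/value pointers inline. */
    struct fat_bucket {
        size_t hash;
        void  *key;
        void  *value;
    };

    /* "Thin" bucket: the table stores only a pointer; the entry itself
       (hash + key/value pointers) lives in a separately allocated node. */
    struct entry {
        size_t hash;
        void  *key;
        void  *value;
    };
    typedef struct entry *thin_bucket;

    int main(void) {
        /* Typically 12 vs 4 bytes on a 32-bit system; 24 vs 8 on 64-bit. */
        printf("inline bucket:  %zu bytes\n", sizeof(struct fat_bucket));
        printf("pointer bucket: %zu bytes\n", sizeof(thin_bucket));
        return 0;
    }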

So assuming the best case of 4 bytes per bucket, you'd end up using 4 megabytes of memory for a hash table with 500,000 objects (remember: around half of the buckets are empty, so that's a million buckets). Also figure that each of those half million used buckets has a node with pointers to the actual data. That would use another 12 bytes per value (although with memory alignment constraints, it's more like 16 bytes). That would be another 8 MB, not including any of the actual data!
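A back-of-the-envelope version of that arithmetic, under the same assumptions (32-bit pointers, roughly half the buckets empty, 16-byte nodes after alignment); the figures are illustrative:

    #include <stdio.h>

    int main(void) {
        long entries   = 500000;        /* stored key/value pairs */
        long buckets   = entries * 2;   /* keep roughly half the slots empty */
        long bucket_sz = 4;             /* one 32-bit pointer per slot */
        long node_sz   = 16;            /* hash + key/value pointers, aligned */

        printf("bucket array: %ld MB\n", buckets * bucket_sz / 1000000); /* 4 MB */
        printf("entry nodes:  %ld MB\n", entries * node_sz / 1000000);   /* 8 MB */
        return 0;
    }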

On the other hand, most data structures have a large memory overhead. A binary search tree has four pointers per node (one for the key, one for the value, and two for the child nodes). At 16 bytes per node on a 32-bit system, this is comparable to a hash table (at least within an order of magnitude).
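For comparison, here is a sketch of the node layout being counted there (field names are illustrative): four pointers, hence 16 bytes with 4-byte pointers.

    /* One BST node: four pointers = 16 bytes with 4-byte (32-bit) pointers. */
    struct tree_node {
        void *key;
        void *value;
        struct tree_node *left;
        struct tree_node *right;
    };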

If all you are storing is chars, the overhead of any of these data structures could be large compared to the actual data, but in practice it shouldn't be too much of a problem unless you're working with gigantic data sets or a horribly inefficient hash table implementation.
