usual hashtable implementation compared to tree
Usually (as in C++), the hash function returns any size_t value -- thus many different hash values are possible (2^32).
That is why 开发者_开发问答I always thought that when people talked about them being implemented as tables, that is not really true in practice because the table would be much too big (2^32 entries). Of course my assumption was wrong that the table must be as big as the full range of hash values.
It seems that the actual implementation is more difficult than I thought. What I always had in mind, what would be the naive implementation, is something like:
typedef list< pair<Key,Value> > Bucket;
typedef map< size_t, Bucket > Hashtable;
Now my question: How does this naive implementation differs from actual implementations in the praxis in terms of complexity (runtime and memory)?
Note that there are other ways to implement hash tables as Matthieu M points out. The remainder of this answer assumes that you want to use hashing with buckets made of some kind of list.
Assuming you are talking about time complexity.
Hash tables are expected to have O(1) best-case access.  Your proposal for implementation in the question uses a map<size_t,Bucket> for access to the buckets which would result in O(log n) time complexity.  You need something with O(1) access time complexity, such as a vector<Bucket> to comply with the expected time complexity of a hash table.
More details
Hash tables can vary between having excellent and poor time complexity, depending on how sparsely populated they are.
In the best case, each bucket has at most one entry, and access by key is O(1). This is the commonly quoted complexity for hash tables.
In the worst case, each key has the same hash value, and access by key is effectively searching a list which results in O(n) behaviour.
Real-world use is usually somewhere between these extremes, hopefully closer towards O(1).
The accepted answer to your other question has some simplified code you can use to work through these two extremes to satisfy yourself that this is the case.
The very issue with your question Albert, is that there is not ONE hash table, there are many of them.
The heart of the problem here is the big-O complexity given for some operations. In average a hash-table should yield O(1) complexity to find an item. A binary tree yields O(log N) in average.
In term of speed, it really depends on the size of N, because those are asymptotic complexities, hence they represent order of magnitude when N is big (think million) and the real speed may be much different for small collections.
So, instead of trying to elaborate more on your question, I think you should get a better grasp of hash tables. A quick overview:
- Hash tables may or may not be implemented in term of buckets: non-bucket implementation include open-addressing schemes (which are more cache-friendly by the way).
- Buckets may or may not be implemented in term of a linked-list. Other schemes include using another hash function (each bucket being a hash-table itself) or a binary tree (map), the latter requires some ordering though.
- Reallocation may be done at once: ie once you exceed the capacity a new (bigger) hash-table is allocated and all the content is copied or use a linear reallocation scheme to smooth the reallocating cost and avoid a big hit from time to time.
Read the article on Wikipedia, it addresses these points and more.
It depends. If the hash function does a good job of implementing an even distribution of hash keys and the table isn't too full then you will get approximately O(1). The hash table will get the correct hit with relatively few collisions.
If the table is extensively chained (i.e. full) then the probing process will spend more time resolving colissions. In the theoretical worst case all values will map to the same hash key and the hashing function will spend all of its time tracing down the chain, which is O(n).
In practice, unless your hash function is really broken you should get O(1) for all practical purposes (note that you can take a modulus of a larger hash value for smaller tables). If you have a hash-table based container that can expand, then it may do an expand operation, which will be substantially more expensive(1).
A tree will be O(log n) rather than O(1), although if the tree is unbalanced then the search can turn into an effectively linear operation as well. Note that this is a problem in some common scenarios, such as when nodes are inserted in key order (imagine a shallow copy operation on a tree based collection). Typically, balanced tree algorithms such as red-black trees are used to keep the tree efficient. Another advantage of trees is that one can traverse the tree in order and produce an ordered list of keys without having to explicitly sort them.
- For an example of a hashing operation with relatively cheap expansion costs see Linear Hash Table (wi.ipedia.org).
An implemenation can easily reduce an "oversize" haskey. For instance, it could use the following structure:
typedef list< pair<Key,Value> > Bucket;
const int HashSize = 511;
Bucket[HashSize] Hashtable;
inline size_t HashIndex(Key k) { return hash(k) % HashSize; }
In practice, of course, HashSize isn't a constant. That would cause the performance to drop drastically if you insert more than a few thousand elements. Also, it uses quite a bit of memory if there are fewer elements. Hence, implementations do smart things with that internal parameter. As a result, the number of values per bucket is O(1), and finding the right bucket is also O(1). This is how such an implementation can retrieve any value in O(1).
 
         加载中,请稍侯......
 加载中,请稍侯......
      
精彩评论