Looking for production quality Hash table/ unordered map implementation to learn?
- Looking for good source code either in C or C开发者_运维知识库++ or Python to understand how a hash function is implemented and also how a hash table is implemented using it.
- Very good material on how hash fn and hash table implementation works.
Thanks in advance.
Hashtables are central to Python, both as the 'dict' type and for the implementation of classes and namespaces, so the implementation has been refined and optimised over the years. You can see the C source for the dict object here.
Each Python type implements its own hash function - browse the source for the other objects to see their implementations.
When you want to learn, I suggest you look at the Java implementation of java.util.HashMap
. It's clear code, well-documented and comparably short. Admitted, it's neither C, nor C++, nor Python, but you probably don't want to read the GNU libc++'s upcoming implementation of a hashtable, which above all consists of the complexity of the C++ standard template library.
To begin with, you should read the definition of the java.util.Map
interface. Then you can jump directly into the details of the java.util.HashMap
. And everything that's missing you will find in java.util.AbstractMap
.
The implementation of a good hash function is independent of the programming language. The basic task of it is to map an arbitrarily large value set onto a small value set (usually some kind of integer type), so that the resulting values are evenly distributed.
There is a problem with your question: there are as many types of hash map as there are uses.
There are many strategies to deal with hash collision and reallocation, depending on the constraints you have. You may find an average solution, of course, that will mostly fit, but if I were you I would look at wikipedia (like Dennis suggested) to have an idea of the various implementations subtleties.
As I said, you can mostly think of the strategies in two ways:
- Handling Hash Collision: Bucket, which kind ? Open Addressing ? Double Hash ? ...
- Reallocation: freeze the map or amortized linear ?
Also, do you want baked in multi-threading support ? Using atomic
operations it's possible to get lock-free multithreaded hashmaps as has been proven in Java by Cliff Click (Google Tech Talk)
As you can see, there is no one size fits them all. I would consider learning the principles first, then going down to the implementation details.
- C++
std::unordered_map
use a linked-list bucket and freeze the map strategies, no concern is given to proper synchronization as usual with the STL. - Python
dict
is the base of the language, I don't know of the strategies they elected
精彩评论