Design a Hashtable
I was asked this question in an Interview and was left stumped, ev开发者_Go百科en though i came up with an answer I didn't feel comfortable with my solution. I wanted to see how experts here feel about this question.
I am exactly quoting the question as it came out of the Interviewer. "Design a Hash-table, You can use any data-structure you can want. I would like to see how you implement the O(1) look up time". Finally he said It's more like simulating a Hash-table via another Data-structure.
Can anyone light me with more information on this question. Thanks!
PS: Main reason for me putting this question is to know how an expert designer would start off with the Design for this problem && one more thing I cleared the interview somehow based on the other questions that were asked but this question was in my mind and I wanted to find out the answer!
It's a fairly standard interview question that shows you understand the underlying concepts being useful Java data structures, like HashSets and HashMaps.
You would use an array of lists, these are normally referred to as buckets. You start your hashtable with a given capacity n meaning you have a array of 10 lists (all empty).
To add an object to your hastable you call the objects hashCode function which gives you an int (a number in a pretty big range). So you then have to modulo the hashCode wrt to n to give you the bucket it lives in. Add the object to the end of the list in that bucket.
To find an object you again use the hashCode and mod function to find the bucket and then need to iterate through the list using .equals() to find the correct object.
As the table gets fuller, you will end up doing more and more linear searching, so you will eventually need to re-hash. This means building an entirely new, larger table and putting the objects into it again.
Instead of using a List in each array position you can recalulate a different bucket position if the one you want is full, a common method is quadratic probing. This has the advantage of not needed any dynamic data structures like lists but is more complicated.
You need an array of lists, or "buckets" for your values. Then you use a hash function to determine which array element to look in, and finally do a linear search through the list elements there.
You have constant lookup of the array location, and linear search of the hash values in the small list there.
Use an array => O(1)
So you'd use a hash function to turn your key into a number, then use that number as an index into an array to retrieve the value.
If I would have been in your shoes, I should have done the following:
- Discuss on what exactly hashtable is and in what situations it should be used.
- Discuss one of the implementations (for e.g. .net framework implementation of it) from the consumer perspective of it.
- Discuss 'How HashTable functions internally' with the interviewer. This is very important. You will be able to design it only if you know on how hashtable works.
- Break the problem: a.Choice of Data Structure b.Choice of Hashing Function
- Use TDD (Test Driven Development) to design and implement HashTable class. Only implement the functionality that you were asked for.
Consider the Universe U (e.g. all possible IP address, or all possible names or all possible mobile numbers or all possible chess board configuration). You might have noticed that universe U is very large.
Set S is of reasonable size S⊆ U. So, this set S is of reasonable size, like you keeping phone number of your friends.
Selecting data-structure for implementation Without data-structure, we will not get good solution. We could use an array for quick insert, deletion, and lookup, but it will take up a lot of space,as the size of universe is very large. Also, your friend name should be integer and space requirement is proportional to universe.
On the other hand, we could use a linked list. This would only take up as much as space as there are objects i.e. Set S, but the 3 operations would not be O(1). To resolve this, we can use both.
So, the solution is to use the best of both worlds, i.e. fast lookup of arrays and small storage of size like link list.
But, these real world entities needs to be changed to integer, by something called hash function, so that they can be used as array index. So, suppose you want to save your friend's name alice, just convert his name to integer
Inserting alice:
int k = hashFunc(alice);
arr[k] = Alice //this takes O(1) time
Lookup for alice:
int k = hashFunc(alice);
string name = arr[k] ;
print name;//prints alice 
Of-course it is not that simple, but this is what I can explain right now. Please let me know wherever I am not clear.Thanks. For more information on hash table refer here
A hash table provides a way to insert and retrieve data efficiently (usually in constant/O(1)) time. For this we use an very large array to store the the target values and a hash function which usually maps the target values, into hash values which is nothing else but the valid indices in this large array. A hash function which perfectly hashes a values to be stored into a unique key (or index in the table) is known as a perfect hash function. But in practice to store such values for which there is no known way to obtain unique hash values (indices in the table) we usually use a hash function which can map each value to particular index so that collision can be kept minimum. Here collision means that two or more items to be stored in the hash table map to the same hash value.
Now coming at the original questions, which is: "Design a Hash-table, You can use any data-structure you can want. I would like to see how you implement the O(1) look up time". Finally he said It's more like simulating a Hash-table via another Data-structure."
Look up is possible in exactly O(1) time, in case we can design a perfect hash function. The underlying data-structure is still an array. But it depends upon the values to be stored, whether we can design a perfect hash function or not. For example consider strings to English alphabet. Since there is no known hash function which can map each valid English word to a unique int (32 bit) (or long long int 64 bit) value, so there will always be some collisions. To deal with collision we can use separate chaining method of collision handling in which each hash table slot stores a pointer to the linked list, which actually stores all the item hashing to that particular slot or index. For example consider a hash function which considers each English alphabet string as a number on base 26 (because there are 26 characters in English alphabet), This can be coded as:
unsigned int hash(const std::string& word)
{
    std::transform(word.begin(), word.end(), word.begin(), ::tolower);
    unsigned int key=0;
    for(int i=0;i<word.length();++i)
    {
         key = (key<<4) + (key<<3)+(key<<2) + word[i];
         key = key% tableSize;
    }
    return key;
}
Where tableSize is an appropriately chosen prime number just greater than the total number of English dictionary words targeted to be stored in the hash table.
Following are the results with dictionary of size 144554, and table of size = 144563:
[Items mapping to same cell --> Number of such slots in the hash table ] =======>
[ 0  -->   53278 ]
[1 --> 52962 ]
[2 --> 26833 ]
[3 --> 8653  ]
[4 --> 2313 ]
[5 --> 437 ]
[6  --> 78 ]
[7  -->  9 ]
In this case to search the items which have been mapped to cells containing only one item, the lookup will be O(1), but in case it maps to a cell which has more than 1 items, then we have to iterate through this linked list which might contain 2 to 7 nodes and then we will be able to find out that element. So its not constant in this case.
So it depends upon the availability of perfect hash function only, whether we the lookup can be performed in O(1) constraint. Otherwise it will not be exactly O(1) but very close to it.
 
         加载中,请稍侯......
 加载中,请稍侯......
      
精彩评论