merging hashtables into an array
I want to merge 2 or more hashtables together..It doesn't matter what the final form is, as long as I can iterate through it. Here the final form is an array.
So I have开发者_StackOverflow中文版 an unsigned long long as the key, the value is a string,int pair. Each key maps to a bin, each bin can have collisons. Instead of copying the entire hashtable into the array i copy it bin by bin, that way i will not need to iterate through the entire array. First I copy the first bin of the first hashtable into an array of Pairs, with the string and int as it's fields(the key is ignored)'
Something like
Class Pair{
char* s;
int frequency;
};
To add it to the array I would have something like this...
Pair pair
pair.s=string of the hashtable value
pair.s=integer of the hashtable value
array[index]=pair;
Then to merge the 1st bin of the 2nd hashtable into the array, I first check if the string of the value of the hashtable is already in the array, if it is I just update the int part of the class pair corresponding to the string that is in the array, if it isn't I add it to the array.
Then I go on to the next bin..copy the 2nd bin of the first hashtable to the array..then instead of iterating through the entire array to check something in the 2nd bin of the 2nd hashtable is in the array, I start searching from the array index where the first element of the 2nd bin was inserted into array.
The problem is even iterating that way is still pretty lengthy as each bin can contain 1000+ collisons and there are thousands of bins to go through.I want to avoid that. I was thinking since each key (which is a long long) is unique with each string, to set the offset at that key number to 1 if it is in the array, and 0 if it isn't. That way I only need to iterate through the array if it is in the array. The problem with that is a long long is simply too big. I can't allocate an array with that many bits...
Is there another way?
While copying the values from the first hash table, build a temporary hash table with the same keys but with the values being the array index where each was inserted. Then, when copying values from the second hash table, check if each key is in the temporary table and if it is, you know which array element to update right away (otherwise you just push the new value on the end).
Another approach, which would take less space but would mutate your input, would be to copy the second hash table over the first, then do copy that combined result to the array. This would naturally merge the two hash tables with no extra storage, but may not be so great if the hash tables will be used further on in the execution of your program.
精彩评论