A few questions about my modular code using void* as dynamic data type in C
A few days ago I posted this question and everyone suggested me to use void*
, which I did. I think some of them also pointed a few things that I would need to take care of but I'm not sure what exactly were they. However, I'm having a few problems with this...
I'm not going to post all my code where cause it's quite large, instead, I'll post the things that I think are important and hopefully are enough for you to help me out.
My hash table structure is like this:
typedef void * HashKey;
typedef void * HashValue;
typedef struct sHashItem {
HashKey key;
HashValue value;
char status;
} HashItem;
typedef struct sHashTable {
HashItem *items;
int count;
float load;
int size;
Bool (*compare)(HashKey, HashKey);
unsigned (*hash)(void *);
} HashTable;
The signature for my insert function goes like this:
Bool hashInsert(HashTable * const table, HashKey key, HashValue value)开发者_JAVA技巧;
And somewhere inside that function, when I find a free bucket in the hash table, I do this:
table->items[index].key = key;
table->items[index].value = value;
table->items[index].status = USED;
table->load = ++table->count / (float)table->size;
All this presents a few problems:
1) As you can see above I'm simply setting each key/value pair of the free bucket to the same pointer passed as the key/value hashInsert
function arguments. This presents a problem as you may have already noticed... For instance, doing something like this:
char str[50];
scanf("%s%*c", str);
hashInsert(t1, (HashKey)str, (HashValue)5);
scanf("%s%*c", str);
hashInsert(t1, (HashKey)str, (HashValue)3);
And if the input is "KeyA" and then "KeyB", both will have "KeyB" as the buckets keys. The same thing applies to the value and not just the key since they are basically the same type because I want to have my code fully modular, for any data type.
How could I solve this?
My first though is to use strdup(str)
and pass that to the hashInsert
function. That would solve the problem. And since this was handled in the main code, I could easily use malloc()
too for any other data type I need to pass as the value (the key will probably always be a string or an int).
But this solution presents another problem...
2) How should I free this allocated memory? Sure, it was allocated by the "main programmer" and not the "hash table module programmer" so, the "main programmer" should free it in the main code, right? However, that doesn't look much like modular code to me.
My code also has a hashDestroy
function, to free all the allocated memory. But how can I use this function to free everything? I can't just iterate over every key/value and use free()
on them cause maybe some of them weren't malloc'd
by any programmer in the first place and I don't need to free them.
How can I find out which ones my hashDestroy
must free and which ones it shouldn't?
3) To finish, I guess I can also throw this issue into the mix... In point one, my suggestion was to use strdup()
or malloc
to "fix" that specific problem (while introducing another) but that also don't look very modular to me. This memory allocation should be done in the hash table module code and not in the main code by the "main programmer".
How do you suggest I solve this? I mean, the data types can be anything and while the use of strdup()
helps a lot, it only works for strings. What if I need to allocate memory for some specific structure or just an int?
Sorry for the big post but I think these questions are all related and I need some help figuring them out since my C knowledge is not that extreme. I only recently learned about void*
so...
Wow: this is going to take some answering in full. However, one of the key things you're going to need is the size of whatever it is you are processing - it is fine to use a void pointer, but you need to know how big the object whose address you're receiving is.
[...] everyone suggested me to use void*, which I did. [...]
My hash table structure is like this:
typedef void * HashKey;
typedef void * HashValue;
typedef struct sHashItem {
HashKey key;
HashValue value;
char status;
} HashItem;
typedef struct sHashTable {
HashItem *items;
int count;
float load;
int size;
Bool (*compare)(HashKey, HashKey);
unsigned (*hash)(void *);
} HashTable;
You are likely to need a size_t key_sz;
and a size_t val_sz;
member in HashItem
.
Your hash function pointer will need to know how big the key to be hashed is.
I'm in two minds about what the HashKey should be. It depends in part on how you are using this stuff. It looks like you want:
- Given this key value of my choosing,
- Store/return this data that is associated with it.
In which case, you probably also need to store the hash number somewhere in the HashItem
; that is the value returned by your hashing function - apparently an unsigned integer. I'm not sure what the signature on the compare
function (function pointer) should be; I'm suspicious that it should either take a pair of HashKey-and-size values, or possibly a pair of HashItem pointers.
The signature for my insert function goes like this:
Bool hashInsert(HashTable * const table, HashKey key, HashValue value);
And somewhere inside that function, when I find a free bucket in the hash table, I do this:
table->items[index].key = key;
table->items[index].value = value;
table->items[index].status = USED;
table->load = ++table->count / (float)table->size;
All this presents a few problems:
1) As you can see above I'm simply setting each key/value pair of the free bucket to the same pointer passed as the key/value hashInsert function arguments. This presents a problem as you may have already noticed... For instance, doing something like this:
char str[50];
scanf("%s%*c", str);
hashInsert(t1, (HashKey)str, (HashValue)5);
scanf("%s%*c", str);
hashInsert(t1, (HashKey)str, (HashValue)3);
The key to using void *
is to pass addresses around. Casting should be unnecessary in C.
You also need to pass the size of things. Hence:
Bool hashInsert(HashTable * const table, HashKey key, size_t key_sz,
HashValue value, size_t val_sz);
char str[50];
scanf("%s%*c", str);
int value = 5;
hashInsert(t1, str, strlen(str)+1, &value, sizeof(value));
Internally, you will copy the data - not using 'strdup()' since you don't know that there aren't interior NUL '\0' bytes in it.
And if the input is "KeyA" and then "KeyB", both will have "KeyB" as the buckets keys. The same thing applies to the value and not just the key since they are basically the same type because I want to have my code fully modular, for any data type.
How could I solve this?
You have to define who owns what, and whether (and how) the container copies data. In C++, the containers make a copy of whatever they are storing.
My first thought is to use strdup(str) and pass that to the hashInsert function. That would solve the problem. And since this was handled in the main code, I could easily use malloc() too for any other data type I need to pass as the value (the key will probably always be a string or an int).
You can't use 'strdup()' because in general, neither the values nor the keys are strings. If they are always strings, why are you using 'void *' instead of 'char *'?
You can decide to copy the value and the key - as long as you know the sizes.
But this solution presents another problem...
2) How should I free this allocated memory? Sure, it was allocated by the "main programmer" and not the "hash table module programmer" so, the "main programmer" should free it in the main code, right? However, that doesn't look much like modular code to me.
My code also has a hashDestroy function, to free all the allocated memory. But how can I use this function to free everything? I can't just iterate over every key/value and use free() on them cause maybe some of them weren't malloc'd by any programmer in the first place and I don't need to free them.
How can I find out which ones my hashDestroy must free and which ones it shouldn't?
You can't. You have to define a policy and only if that policy allows you to do the destruction should you do it. If you copy everything, you have an easy time. If you copy nothing, you have a different easy time (arguably easier), but your consumers have a hell of a time because they need a structure to keep track of what they need to release - perhaps a hashed list...
3) To finish, I guess I can also throw this issue into the mix... In point one, my suggestion was to use strdup() or malloc to "fix" that specific problem (while introducing another) but that also don't look very modular to me. This memory allocation should be done in the hash table module code and not in the main code by the "main programmer".
Yes...that's basically my recommendation.
How do you suggest I solve this? I mean, the data types can be anything and while the use of strdup() helps a lot, it only works for strings. What if I need to allocate memory for some specific structure or just an int?
Note that copying only does shallow copies. If the structures you're copying contain pointers, then the duplication code will only copy the pointer and not the pointed at data.
So, a general solution will require some sort of copy function. You may have to demand that the user gives you a 'release' function that frees the memory in an item. You may need to have the user provide you with already fully allocated data. You need to think about who owns what the search function returns - is it still 'in' the hash table or has it been removed. Look hard at the C++ STL system - it generally speaking does an excellent job and modelling your requirements on what it requires may make sense. But remember, C++ has constructors and destructors to help it.
I would malloc all of the data, and allow the client to the hash functions register a item_free()
function at hash table init time. That way it's up to the "main programmer" how to handle it.
Hmmm, from what I see in your example the problem isn't hashtable collisions (though you seem to have this problem as well), it is how to manage the memory of the items stored in the table. I think the standard way of doing this sort of thing IS to force the user of the data structure (the hashtable) to do the work of allocating the space for all the stuff that is going to be put in the table. The hashtable should only have to worry about the pointers. Suppose you do do an allocation then copy in the data structure: how would the user know how to deallocate the memory when the item is removed from the hastable?
There are two general solutions for dealing with collisions in a hashtable:
- Use the next free bucket instead.
- A bucket stores a linked list so multiple items can be stored in the same bucket.
With either of those, the issue of when to free what never arises, since all kinds of data are allocated either by the hash table or by the hash table's client. If you're still curious, the short answer to that dilemma is to use smart pointers.
To implement a hash table, we need a set of buckets. And since multiple elements can hash to the same bucket, each bucket needs a linked list.
Does
HashItem *items;
perform the second requirement above?
From your explanation, its not clear if it does.
For an excellent example, see K&R section 6.6. link where name = HashKey and defn = HashValue. alt text http://www.goldfish.org/books/The%20C%20Programming%20Language%20-%20K&R/pic64.gif
精彩评论