Problem with hashing function - C
I am using the following hashing function provided in the K&R book.
#define HASHSIZE 101
unsigned hash(char *s)
{
    unsigned hashval;
    for (hashval = 0; *s != '\0'; s++)
        hashval = *s + 31 * hashval;
    return hashval % HASHSIZE;
}
In my project, I have more warnings turned on (warnings are treated as errors too) and the above code will fail to compile.
error: conversion to ‘unsigned int’ from ‘char’ may change the sign of the result
If I make the hashval signed, I am getting negative hash values. I am wondering how this can be fixed.
Any help?
What your compiler is picking up on and warning you about is that you are implicitly changing your interpretation of the bytes stored in the area pointed to by s. The function prototype specifies s as being a pointer to char, and on your setup plain char appears to be signed. However, to get the hash arithmetic correct, you need to use only unsigned values. So the question is this: what should the compiler do with values read through s which are actually negative?
Let's take a quick diversion to make sure we understand what values we might be considering. The possible values for a char are CHAR_MIN to CHAR_MAX inclusive. (These values can be found in limits.h.) The possible values for an unsigned char are 0 to UCHAR_MAX inclusive. So the question becomes this: how do we represent the possible range of values from CHAR_MIN to CHAR_MAX within the range 0 to UCHAR_MAX?
One simple approach is to let the compiler carry out this conversion for you. Converting to an unsigned type uses wrap-around arithmetic: one more than the maximum value of the target type (UCHAR_MAX + 1 for unsigned char) is added as many times as needed to bring the value into range. However, whether a given byte is negative in the first place depends on whether plain char is signed, and that varies between compilers and platforms. It is this possibility of non-portability which lies behind your compiler warning.
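The wrap-around rule can be checked directly. The following is a minimal sketch, not part of the original code: the helper names to_unsigned_byte, wrap_by_hand and check_all_chars are made up for illustration. It compares the result of a cast to unsigned char with the result of adding UCHAR_MAX + 1 by hand, for every value a char can hold.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: let the cast apply the standard conversion. */
unsigned to_unsigned_byte(char c)
{
    return (unsigned char)c;
}

/* Hypothetical helper: do the same wrap-around by hand. */
unsigned wrap_by_hand(char c)
{
    int v = c;                 /* may be negative if char is signed */
    if (v < 0)
        v += UCHAR_MAX + 1;    /* bring the value into 0..UCHAR_MAX */
    return (unsigned)v;
}

/* Self-check: both routes agree for every value a char can hold. */
void check_all_chars(void)
{
    for (int v = CHAR_MIN; v <= CHAR_MAX; v++)
        assert(to_unsigned_byte((char)v) == wrap_by_hand((char)v));
}
```

On a platform where char is signed, the negative values are the ones that get shifted up; where char is unsigned, nothing is negative and both helpers simply return the value unchanged.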
OK, so where does that get us? Well, if you are prepared to take responsibility for the hypothetical portability problems which this approach will produce, you can tell the compiler that you are happy for it to make the conversion using the standard rules. You do this by using a cast:
hashval = ((unsigned char) *s) + 31 * hashval;
This approach will suppress the warning and ensure that your arithmetic is all done as unsigned, which is what you want for this sort of hash function. However, you need to be aware that the same code on other systems may give different hash results.
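For reference, here is how the whole function might look with the cast in place. This is a sketch based on the K&R function from the question. The const qualifier on the parameter and the NULL check are my additions, not part of the original.

```c
#include <assert.h>
#include <stddef.h>

#define HASHSIZE 101

/* K&R-style string hash with each character read as an unsigned char. */
unsigned hash(const char *s)   /* const is an addition to the original */
{
    unsigned hashval = 0;

    assert(s != NULL);         /* added sanity check, not in K&R */
    for (; *s != '\0'; s++)
        hashval = (unsigned char)*s + 31 * hashval;
    return hashval % HASHSIZE;
}
```

Because hashval is unsigned and every character contributes a non-negative value, the result is always in the range 0 to HASHSIZE - 1, including for strings that contain bytes above 127.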
An alternative approach is to use the fact that the ANSI C standard specifies that pointers can validly be cast to type unsigned char * to access the underlying byte structure of the data being pointed to. (I don't have my copy of the standard to hand at the moment, or I'd give you a reference.) This would allow you to generalise this approach to producing a function which gives you a hash of a value of any data type. (However, to do this you must think about how you know the size of the data being passed in.) This might look something like:
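unsigned hash(const void *s, size_t n) {
    /* View the object as raw bytes; s itself is not modified. */
    const unsigned char *t = (const unsigned char *) s;
    unsigned hashval = 0;  /* must be declared and start at zero */
    while (n--)
        hashval = (*(t++) + 31 * hashval) % HASHSIZE;
    return hashval;
}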
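On the question of how the size gets passed in: for a string it can come from strlen, and for a fixed-size object from sizeof. A small sketch under those assumptions follows. The names hash_bytes and hash_string are hypothetical, chosen so the byte-wise version does not clash with the string version.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HASHSIZE 101

/* Hypothetical name: hash n raw bytes starting at data. */
unsigned hash_bytes(const void *data, size_t n)
{
    const unsigned char *p = data;
    unsigned hashval = 0;

    assert(data != NULL || n == 0);
    while (n--)
        hashval = (*p++ + 31 * hashval) % HASHSIZE;
    return hashval;
}

/* Hypothetical wrapper: a string's size is its length without the '\0'. */
unsigned hash_string(const char *s)
{
    return hash_bytes(s, strlen(s));
}
```

A fixed-size object such as an int x could be hashed with hash_bytes(&x, sizeof x). Be careful with structs: padding bytes have unspecified values, so two structs with equal members may still hash differently.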
I hope this gives you a bit of insight into what's going on.
Change s to be unsigned char * in the function signature, or simply cast when you use it (i.e. (unsigned char *)s).
I think you can safely typecast your char to unsigned: (unsigned char)*s