Unexpected results when looking at ASCII codes in C++
The bit of code below is extracting ASCII codes from characters. When I convert characters in the normal ASCII region I get the value I expect. When I convert £ and € from the extended region I get a load of 1s padding the int that I'm storing the character in.
e.g. the output of the below is:
45 (ascii E as expected) FFFFFF80 (extended ascii € as expected but padded with ones)
It's not causing me an issue but I'm just wondering why this happens. Here's the code...
unsigned int asciichar[3];
string cTextToEncode = "E€";
for (unsigned int i = 0; i < cTextToEncode.length(); i++)
{
    asciichar[i] = (unsigned int)cTextToEncode[i];
    cout << hex << asciichar[i] << "\n";
}
Can anyone explain why this is? Thanks
Depending on the implementation, a char can be either signed or unsigned. In your case it appears to be signed, so 0x80 is interpreted as -128 instead of 128; hence, when cast to an unsigned int, it becomes 0xFFFFFF80.
By the way, this has nothing at all to do with ASCII.
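For illustration (not part of the original answer), a minimal sketch of that sign extension, assuming plain char is signed on your implementation:

#include <iostream>

int main()
{
    char c = '\x80';                                   // assumption: char is signed here, so c holds -128
    std::cout << (int)c << "\n";                       // prints -128
    std::cout << std::hex << (unsigned int)c << "\n";  // prints ffffff80
}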
First, there's no € in ASCII (extended or otherwise) because the euro didn't exist when ASCII was created. However, several ASCII-friendly 8-bit encodings do support the € character, but the conversion is done by your source code editor (the compiler merely sees a byte which happens to represent € in your editor, but might be something else entirely on, say, a computer in Israel).
Second, (unsigned int) casts do not extract the ASCII encoding of a character. They merely convert the value of the underlying numeric char type to an unsigned integer. This causes strange things to happen when the converted value is negative: on your compiler, char happens to be signed char, and thus characters with an ASCII value larger than 127 end up being negative char values.
You should convert to an unsigned char first, and then to an unsigned int.
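As a rough sketch of that two-step conversion (using the loop from the question; the byte 0x80 is a hypothetical stand-in for € in a single-byte code page such as Windows-1252):

#include <iostream>
#include <string>
using namespace std;

int main()
{
    string cTextToEncode = "E\x80";   // hypothetical single-byte stand-in for "E€"
    for (unsigned int i = 0; i < cTextToEncode.length(); i++)
    {
        // convert to unsigned char first (0..255), then widen to unsigned int
        unsigned int code = (unsigned char)cTextToEncode[i];
        cout << hex << code << "\n";  // prints 45 then 80, no sign extension
    }
}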
You should be careful when promoting signed values.
When promoting a signed char to a signed int, the first bit (the sign bit) is taken into account. The algorithm looks roughly like this:
1) If you have 1XXXXXXX (the char in binary, X being any binary digit), then the int will start with 24 ones: 1...1 1XXXXXXX (binary), i.e. 0xFFFFFFYY (hex).
2) If you have 0XXXXXXX (binary), then the int will start with 24 zeroes: 0...0 0XXXXXXX (binary), i.e. 0x000000YY (hex).
In your case you want rule #2 to apply all the time. To do that, you need to tell the compiler not to treat the first bit as a sign bit, which is what unsigned char does.
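A small sketch (not from the original answer) that contrasts the two rules on the same bit pattern:

#include <iostream>

int main()
{
    signed char   s = static_cast<signed char>(0x80);  // sign bit set -> rule #1
    unsigned char u = 0x80;                            // same bits, no sign bit -> rule #2

    std::cout << std::hex
              << (unsigned int)s << "\n"   // ffffff80 (24 leading ones)
              << (unsigned int)u << "\n";  // 80 (24 leading zeroes)
}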