Unexpected results when looking at ASCII codes in C++
The bit of code below is extracting ASCII codes from characters. When I convert characters in the normal ASCII region I get the value I expect. When I convert £ and € from the extended region I get a load of 1s padding the int that I'm storing the character in.
e.g. the output of the below is:
45 (ascii E as expected) FFFFFF80 (extended ascii € as expected but padded with ones)
It's not causing me an issue but I'm just wondering why this happens. Here's the code...
unsigned int asciichar[3];
string cTextToEncode = "E€";
for (unsigned int i = 0; i < cTextToEncode.length(); i++)
{
    asciichar[i] = (unsigned int)cTextToEncode[i];
    cout << hex << asciichar[i] << "\n";
}
Can anyone explain why this is? Thanks
Depending on the implementation, a char can be either signed or unsigned. In your case it appears to be signed, so 0x80 is interpreted as -128 instead of 128; hence, when cast to an unsigned int, it becomes 0xFFFFFF80.
By the way, this has nothing at all to do with ASCII.
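For illustration (not part of the original answer), a minimal sketch of that sign extension, assuming plain char is signed on your implementation:

#include <iostream>

int main()
{
    char c = '\x80';                                   // assumption: char is signed here, so c holds -128
    std::cout << (int)c << "\n";                       // prints -128
    std::cout << std::hex << (unsigned int)c << "\n";  // prints ffffff80
}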
First, there's no € in ASCII (extended or otherwise) because the euro didn't exist when ASCII was created. However, several ASCII-friendly 8-bit encodings do support the € character, but the conversion is done by your source code editor (the compiler merely sees a byte which happens to represent € in your editor, but might be something else entirely on, say, a computer in Israel).
Second, (unsigned int) casts do not extract the ASCII encoding of a character. They merely convert the value of the underlying numeric char type to an unsigned integer. This causes strange things to happen when the converted value is negative: on your compiler, char happens to be signed char, and thus characters with an ASCII value larger than 127 end up being negative char values.
You should convert to an unsigned char first, and then to an unsigned int.
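As a rough sketch of that two-step conversion (using the loop from the question; the byte 0x80 is a hypothetical stand-in for € in a single-byte code page such as Windows-1252):

#include <iostream>
#include <string>
using namespace std;

int main()
{
    string cTextToEncode = "E\x80";   // hypothetical single-byte stand-in for "E€"
    for (unsigned int i = 0; i < cTextToEncode.length(); i++)
    {
        // convert to unsigned char first (0..255), then widen to unsigned int
        unsigned int code = (unsigned char)cTextToEncode[i];
        cout << hex << code << "\n";  // prints 45 then 80, no sign extension
    }
}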
You should be careful when promoting signed values.
When promoting a signed char to a signed int, the first bit (the sign bit) is taken into account. The algorithm looks roughly like this:
1) If you have 1XXXXXXX (the char in binary, X being any binary digit), then the int will start with 24 ones: 1...1 1XXXXXXX (binary), i.e. 0xFFFFFFYY (hex).
2) If you have 0XXXXXXX (binary), then the int will start with 24 zeroes: 0...0 0XXXXXXX (binary), i.e. 0x000000YY (hex).
In your case you want rule #2 to apply all the time. To do that, you need to tell the compiler not to treat the first bit as a sign bit, which is what unsigned char does.
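A small sketch (not from the original answer) that contrasts the two rules on the same bit pattern:

#include <iostream>

int main()
{
    signed char   s = static_cast<signed char>(0x80);  // sign bit set -> rule #1
    unsigned char u = 0x80;                            // same bits, no sign bit -> rule #2

    std::cout << std::hex
              << (unsigned int)s << "\n"   // ffffff80 (24 leading ones)
              << (unsigned int)u << "\n";  // 80 (24 leading zeroes)
}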