
Arithmetic with double vs. bit operations

There is some obvious stuff I feel I should understand here, but I don't:

int main()
{
    long first = 0xffffffc1;   // long is 32 bits in VC++, so this is negative
    long second = 0x92009019;  // negative as well

    // correct: result is 0xffffffc192009019
    __int64 correct = (((__int64)first << 32) | 0x00000000ffffffff) & (0xffffffff00000000 | second);

    // incorrect: result is 0xffffffc092009019
    __int64 wrong = (double)(((__int64)first << 32) + second);

    return 0;
}

Why does the add operation affect the upper 4 bytes, and how?

(The compiler is VC++ 2003.)


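Probably because second is signed, which means that 0x92009019 is negative. When it is converted to a 64-bit integer for the addition, it is sign-extended to 0xffffffff92009019, so adding it effectively subtracts one from the upper 32 bits (0xffffffc1 becomes 0xffffffc0).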
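To make the effect of the sign concrete, here is a minimal sketch that uses fixed-width types in place of VC++'s 32-bit long. The helper names (widen_signed, add_sign_extended, add_zero_extended) are made up for illustration and are not from the question.

```cpp
#include <cstdint>

// Hypothetical helper: reinterprets a 32-bit pattern as signed and widens it,
// which sign-extends negative values (the upper 32 bits become all ones).
constexpr uint64_t widen_signed(uint32_t bits)
{
    return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(bits)));
}

// Mimics the question's addition: both halves are treated as signed.
// The sum is done in uint64_t so that wrap-around is well defined.
constexpr uint64_t add_sign_extended(uint32_t high_bits, uint32_t low_bits)
{
    return (widen_signed(high_bits) << 32) + widen_signed(low_bits);
}

// Same addition, but the low word is zero-extended instead.
constexpr uint64_t add_zero_extended(uint32_t high_bits, uint32_t low_bits)
{
    return (static_cast<uint64_t>(high_bits) << 32) + static_cast<uint64_t>(low_bits);
}
```

With the question's values, add_sign_extended(0xffffffc1, 0x92009019) gives 0xffffffc092009019 (the upper word drops by one), while add_zero_extended gives 0xffffffc192009019.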

EDIT: The question actually contains two questions.

1) How do you join two 32 bit numbers to a 64 bit value?

Answer:

(((uint64_t)first) << 32) | (uint32_t)second
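Wrapped in a function, that expression could look like the sketch below. The name join_words is an assumption for illustration; the parameters are unsigned so that a negative argument is converted modulo 2^32 before it is widened.

```cpp
#include <cstdint>

// Hypothetical helper: joins two 32-bit words into one 64-bit value.
// Taking uint32_t parameters means no sign extension can leak into the
// upper half of the result.
constexpr uint64_t join_words(uint32_t high, uint32_t low)
{
    return (static_cast<uint64_t>(high) << 32) | low;
}
```

For the values in the question, join_words(0xffffffc1, 0x92009019) yields 0xffffffc192009019.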

2) Is it wise to do bit operations using the floating-point type double?

Answer: No, it's not. Please use the right tool for the job. If you want to do bit operations, use integers. If you want (almost) continuous values, use floating-point values.


A double has 53 bits of precision. I'm quite surprised you got the last digits right. (The first wrong digit is explained by Lindydancer.)

Edit: I'm no longer surprised: as the result is negative, you need only 38 bits of precision with your data. If you use

first = 0xffdfffc1;

you lose the LSB with the double solution.
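The precision argument can be checked with a small round trip through double. This is only a sketch: through_double and the two constants are hypothetical names, and it assumes double is an IEEE 754 binary64 type with a 53-bit significand, as on typical platforms.

```cpp
#include <cstdint>

// Hypothetical helper: converts to double and back, as the question's cast does.
constexpr int64_t through_double(int64_t value)
{
    return static_cast<int64_t>(static_cast<double>(value));
}

// Bit pattern 0xffffffc092009019 read as a signed 64-bit value.
// Its magnitude 0x3f6dff6fe7 fits in 38 bits, well within a double's 53.
constexpr int64_t small_magnitude = -0x3f6dff6fe7LL;

// Bit pattern 0xffdfffc092009019 (the sum when first = 0xffdfffc1).
// Its magnitude 0x20003f6dff6fe7 needs 54 bits, one more than a double holds.
constexpr int64_t large_magnitude = -0x20003f6dff6fe7LL;
```

Under that assumption, through_double(small_magnitude) returns the value unchanged, whereas through_double(large_magnitude) returns a different value because the lowest bit cannot be represented.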
