arithmetic with double vs bit operations
There is some obvious stuff I feel I should understand here, but I don't:
void main()
{
    long first = 0xffffffc1;
    long second = 0x92009019;

    //correct
    __int64 correct = (((__int64)first << 32) | 0x00000000ffffffff) & (0xffffffff00000000 | second); //output is 0xffffffc192009019;

    //incorrect
    __int64 wrong = (double)(((__int64)first << 32) + second); //output is 0xffffffc092009019;
}
Why does the add operation affect the upper 4 bytes, and how?
(compiler is VC++ 2003)
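Probably because second is signed, which means that 0x92009019 is negative. When it is converted to __int64 for the addition, it is sign-extended to 0xffffffff92009019, so adding it effectively subtracts 1 from the upper 32 bits.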
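To make the sign extension visible, here is a minimal sketch of the question's addition. The function name add_halves is made up for illustration, and std::int32_t stands in for VC++'s 32-bit long (an assumption, so the sketch behaves the same where long is 64 bits). The sum is done in unsigned 64-bit arithmetic so the wraparound is well defined.

```cpp
#include <cstdint>

// Hypothetical helper reproducing the question's "(first << 32) + second".
// std::int32_t models a 32-bit signed long (assumption for portability).
std::uint64_t add_halves(std::int32_t first, std::int32_t second)
{
    // Widening a negative 32-bit value to 64 bits sign-extends it,
    // e.g. the bit pattern 0x92009019 becomes 0xffffffff92009019.
    std::uint64_t high =
        static_cast<std::uint64_t>(static_cast<std::int64_t>(first)) << 32;
    std::uint64_t low =
        static_cast<std::uint64_t>(static_cast<std::int64_t>(second));

    // The 0xffffffff in the upper half of `low` wraps around and
    // subtracts 1 from the upper half of `high`.
    return high + low;
}
```

With the question's bit patterns 0xffffffc1 and 0x92009019 this yields 0xffffffc092009019, the "wrong" value. With a second half whose top bit is clear, such as 0x12009019, the upper half is left alone.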
EDIT: The question actually contains two questions.
1) How do you join two 32-bit numbers to a 64-bit value?
Answer:
(((uint64_t)first) << 32) | (uint32_t)second
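The same expression wrapped in a small function, as a sketch. The name join32 is hypothetical. Taking both halves as std::uint32_t means neither can be sign-extended when it is widened.

```cpp
#include <cstdint>

// Hypothetical helper: joins two 32-bit halves into one 64-bit value.
std::uint64_t join32(std::uint32_t high, std::uint32_t low)
{
    // `high` is widened before the shift so no bits are shifted out;
    // `low` is unsigned, so it is zero-extended when OR-ed in.
    return (static_cast<std::uint64_t>(high) << 32) | low;
}
```

Calling join32(0xffffffc1u, 0x92009019u) gives 0xffffffc192009019, the value the question expects.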
2) Is it wise to do bit operations using the floating-point type double?
Answer: No, it's not. Please use the right tool for the job. If you want to do bit operations, use integers. If you want (almost) continuous values, use floating-point values.
A double has 53 bits of precision. I'm quite surprised you got the last digits right. (The first wrong digit is explained by Lindydancer.)
Edit: I'm no longer surprised: as the result is negative, you need only 38 bits of precision with your data. If you use
first = 0xffdfffc1;
you are losing the LSB with the double solution.
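A quick way to check this is a round trip through double, sketched below. The function name survives_double is made up for illustration, and the sketch assumes an IEEE 754 double with a 53-bit significand. The volatile is only there to keep the compiler from holding the value in a wider floating-point register.

```cpp
#include <cstdint>

// Hypothetical check: does `value` come back unchanged after being
// converted to double and back?
// Assumes IEEE 754 double (53-bit significand).
bool survives_double(std::int64_t value)
{
    // volatile forces the value to be stored as a real 64-bit double,
    // not kept in a wider register (e.g. x87 extended precision).
    volatile double d = static_cast<double>(value);
    return static_cast<std::int64_t>(d) == value;
}
```

The question's sum 0xffffffc092009019 has a 38-bit magnitude and survives the round trip. With first = 0xffdfffc1 the sum is 0xffdfffc092009019, whose magnitude needs 54 bits and is odd, so the lowest bit is rounded away and the check returns false.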