arithmetic with double vs bit operations

2023-02-26 22:20

There is some obvious stuff I feel I should understand here, but I don't:

void main()
{
    long first = 0xffffffc1;
    long second = 0x92009019;

    //correct
    __int64 correct = (((__int64)first << 32) | 0x00000000ffffffff) & (0xffffffff00000000 | second); //output is 0xffffffc192009019;

    //incorrect
    __int64 wrong = (double)(((__int64)first << 32) + second); //output is 0xffffffc092009019;
}

Why does the add operation affect the upper 4 bytes, and how?

(compiler is VC++ 2003)

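Probably because second is signed, which means that 0x92009019 is negative (long is 32 bits in VC++). When it is widened to 64 bits for the addition, it is sign-extended to 0xffffffff92009019, so the sum takes 1 away from the upper 32 bits: 0xffffffc1 becomes 0xffffffc0.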
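To see the sign extension at work, here is a minimal sketch that redoes the question's addition with fixed-width types. The helper names join_by_adding and check_join_by_adding are made up for illustration, and std::int32_t is assumed to stand in for the 32-bit long of VC++.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not from the question: redoes the "wrong" addition
// with fixed-width types. std::int32_t stands in for VC++'s 32-bit long.
inline std::uint64_t join_by_adding(std::uint32_t first_bits, std::uint32_t second_bits)
{
    // Reinterpret the bit patterns as signed, as assigning them to a long does
    // (modular conversion; guaranteed since C++20, universal in practice before).
    std::int32_t first = static_cast<std::int32_t>(first_bits);
    std::int32_t second = static_cast<std::int32_t>(second_bits);

    // Multiply by 2^32 instead of shifting, to avoid left-shifting a negative value.
    std::int64_t high = static_cast<std::int64_t>(first) * 4294967296LL;

    // second is sign-extended to 64 bits before the addition.
    // (No overflow for the values used here.)
    std::int64_t sum = high + static_cast<std::int64_t>(second);
    return static_cast<std::uint64_t>(sum);
}

// Self-check with the values from the question.
inline void check_join_by_adding()
{
    assert(join_by_adding(0xffffffc1u, 0x92009019u) == 0xffffffc092009019ULL);
    // With the top bit of second clear, nothing is taken from the upper half.
    assert(join_by_adding(0xffffffc1u, 0x12009019u) == 0xffffffc112009019ULL);
}
```

Calling check_join_by_adding() from main should pass silently: the first assert reproduces the 0xffffffc0 upper half from the question, the second shows that a non-negative second leaves the upper half alone.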

EDIT: The question actually contains two questions.

1) How do you join two 32 bit numbers to a 64 bit value?

Answer:

(((uint64_t)first) << 32) | (uint32_t)second
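Wrapped in a function, the expression above might look like the sketch below. The name join_words is hypothetical, and taking both halves as std::uint32_t is an assumption made here so that no sign extension can happen at the call site.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: joins two 32-bit halves into one 64-bit value.
inline std::uint64_t join_words(std::uint32_t high, std::uint32_t low)
{
    // Widen high before shifting so the shift happens in 64 bits;
    // low is zero-extended because it is unsigned.
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// Self-check with the values from the question.
inline void check_join_words()
{
    assert(join_words(0xffffffc1u, 0x92009019u) == 0xffffffc192009019ULL);
}
```

Because both halves stay unsigned the whole way, the OR only ever sees zeros in the bits that the other half owns.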

2) Is it wise to do bit operations using the floating-point type double?

Answer: No, it's not. Please use the right tool for the job. If you want to do bit operations, use integers. If you want (almost) continuous values, use floating-point values.

A double has 53 bits of precision. I'm quite surprised you got the last digits right. (The first wrong digit is explained by Lindydancer.)

Edit: I'm no longer surprised: as the result is negative, you need only 38 bits of precision with your data. If you use

first = 0xffdfffc1;

you lose the least significant bit with the double solution, because the magnitude of the result then needs 54 bits.
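A quick way to check this is a round trip through double. The sketch below assumes an IEEE 754 double with a 53-bit significand; survives_double is a made-up name, and the two constants are the question's sums written as negative 64-bit values (0xffffffc092009019 and 0xffdfffc092009019 respectively).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if v comes back unchanged after a trip through double.
inline bool survives_double(std::int64_t v)
{
    // volatile forces a real 64-bit store, so an x87 build cannot keep
    // extra precision in a register and hide the rounding.
    volatile double d = static_cast<double>(v);
    return static_cast<std::int64_t>(d) == v;
}

// Self-check with the two sums discussed above.
inline void check_survives_double()
{
    // 0xffffffc092009019 as a signed value: the magnitude fits in 38 bits.
    assert(survives_double(-0x3F6DFF6FE7LL));
    // 0xffdfffc092009019 as a signed value: the magnitude is odd and needs 54 bits.
    assert(!survives_double(-0x20003F6DFF6FE7LL));
}
```

The second value cannot be stored exactly in a 53-bit significand, so the conversion has to round it and the comparison fails.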
