Explanation required for BITCOUNT macro
Can someone explain how this works?
#define BX_(x) ((x) - (((x)>>1)&0x77777777) \
- (((x)>>2)&0x33333333) \
- (((x)>>3)&0x11111111))
#define BITCOUNT(x) (((BX_(x)+(BX_(x)>>4)) & 0x0F0F0F0F) % 255)
Clarification:
Ideally, the answer will start something along the lines of:
The macro: "BX_" subtracts three values from the passed in number.
These three values represent:
- XXXXX
- YYYYY
- ZZZZZ
This allows the BITCOUNT() t开发者_StackOverflowo work as follows...
Cheers,
David
The output of BX_(x) is the number of on bits in each hex digit. So
BX_(0x0123457F) = 0x01121234
The following:
((BX_(x)+(BX_(x)>>4)) & 0x0F0F0F0F)
shuffles the counts into bytes:
((BX_(0x0123457F)+(BX_(0x0123457F)>>4)) & 0x0F0F0F0F) = 0x01030307
Taking this result modulo 255 adds up the individual bytes to arrive at the correct answer 14. To see that this works, consider just a two-byte integer, 256*X + Y. This is just 255*X + X + Y, and 255*X % 255 is always zero, so
(256*X + Y) % 255 = (X + Y) % 255.
This extends to four-byte integers:
256^3*V + 256^2*W + 256*X + Y
Just replace each 256 with (255+1) to see that
(256^3*V + 256^2*W + 256*X + Y) % 255 = (V + W + X + Y) % 255.
The final observation (which I swept under the rug with the 2-digit example) is that V + W + X + Y
is always less than 255, so
(V + W + X + Y) % 255 = V + W + X + Y.
As quoted by Johannes from that splendid Bit Twiddling Hacks page, there's an excellent and detailed description of that algorithm in Software Optimization Guide for AMD Athlon™ 64 and Opteron™ Processors from AMD on page numbers 179 and 180 - corresponding to pages 195 and 196 of the PDF.
Also describing the same idea and some alternative solutions and their relative performance: this page.
精彩评论