开发者

Explanation required for BITCOUNT macro

Can someone explain how this works?

#define  BX_(x)         ((x) - (((x)>>1)&0x77777777)                    \
                             - (((x)>>2)&0x33333333)                    \
                             - (((x)>>3)&0x11111111))


#define BITCOUNT(x)     (((BX_(x)+(BX_(x)>>4)) & 0x0F0F0F0F) % 255)

Clarification:

Ideally, the answer will start something along the lines of:

The macro: "BX_" subtracts three values from the passed in number.

These three values represent:

  1. XXXXX
  2. YYYYY
  3. ZZZZZ

This allows the BITCOUNT() t开发者_StackOverflowo work as follows...

Cheers,

David


The output of BX_(x) is the number of on bits in each hex digit. So

BX_(0x0123457F) = 0x01121234

The following:

((BX_(x)+(BX_(x)>>4)) & 0x0F0F0F0F)

shuffles the counts into bytes:

((BX_(0x0123457F)+(BX_(0x0123457F)>>4)) & 0x0F0F0F0F) = 0x01030307

Taking this result modulo 255 adds up the individual bytes to arrive at the correct answer 14. To see that this works, consider just a two-byte integer, 256*X + Y. This is just 255*X + X + Y, and 255*X % 255 is always zero, so

(256*X + Y) % 255 = (X + Y) % 255.

This extends to four-byte integers:

256^3*V + 256^2*W + 256*X + Y

Just replace each 256 with (255+1) to see that

(256^3*V + 256^2*W + 256*X + Y) % 255 = (V + W + X + Y) % 255.

The final observation (which I swept under the rug with the 2-digit example) is that V + W + X + Y is always less than 255, so

(V + W + X + Y) % 255 = V + W + X + Y.


As quoted by Johannes from that splendid Bit Twiddling Hacks page, there's an excellent and detailed description of that algorithm in Software Optimization Guide for AMD Athlon™ 64 and Opteron™ Processors from AMD on page numbers 179 and 180 - corresponding to pages 195 and 196 of the PDF.

Also describing the same idea and some alternative solutions and their relative performance: this page.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜