Most optimized way to calculate modulus in C
I have to minimize the cost of calculating a modulus in C. Say I have a number x, and n is the number that will divide x.
when n == 65536 (which happens to be 2^16):
mod = x % n (11 assembly instructions as produced by GCC) or
mod = x & 0xffff, which is equal to mod = x & 65535 (4 assembly instructions). So GCC doesn't optimize it to this extent.
In my case n is not 2^(int) but the largest prime less than 2^16, which is 65521.
As I showed for n == 2^16, bit-wise operations can optimize the computation. What bit-wise operations can I perform when n == 65521 to calculate the modulus?
First, make sure you're looking at optimized code before drawing conclusions about what GCC is producing (and make sure this particular expression really needs to be optimized). Finally, don't count instructions to draw your conclusions; an 11-instruction sequence may well perform better than a shorter sequence that includes a div instruction.
Also, you can't conclude that because x mod 65536 can be calculated with a simple bit mask, any mod operation can be implemented that way. Consider how easy dividing by 10 in decimal is as opposed to dividing by an arbitrary number.
With all that out of the way, you may be able to use some of the 'magic number' techniques from Henry Warren's Hacker's Delight book:
- Archive of http://www.hackersdelight.org/
- Archive of http://www.hackersdelight.org/magic.htm
There was an added chapter on the website that contained "two methods of computing the remainder of division without computing the quotient!", which you may find of some use. The 1st technique applies only to a limited set of divisors, so it won't work for your particular instance. I haven't actually read the online chapter, so I don't know exactly how applicable the other technique might be for you.
x mod 65536 is only equivalent to x & 0xffff if x is unsigned - for signed x, it gives the wrong result for negative numbers. For unsigned x, gcc does indeed optimise x % 65536 to a bitwise and with 65535 (even on -O0, in my tests).
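To make the signed/unsigned difference concrete, here is a minimal sketch. The function names are made up for illustration, and it assumes C99 or later, where % on signed operands truncates toward zero.

```c
#include <stdint.h>

/* Remainder via % on a signed value: the result takes the sign of x. */
static int32_t mod_signed(int32_t x) { return x % 65536; }

/* Mask applied to the same bit pattern: always yields a value in 0..65535. */
static int32_t mask_signed(int32_t x) { return (int32_t)((uint32_t)x & 0xffffu); }

/* For unsigned operands the two forms agree for every input. */
static uint32_t mod_unsigned(uint32_t x)  { return x % 65536u; }
static uint32_t mask_unsigned(uint32_t x) { return x & 0xffffu; }
```

For x == -1 the % form gives -1 while the mask gives 65535, which is why the compiler can only substitute the mask directly when x is unsigned.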
Because 65521 is not a power of 2, x mod 65521 can't be calculated so simply. gcc 4.3.2 on -O3 calculates it using x - (x / 65521) * 65521; the integer division by a constant is done using integer multiplication by a related constant.
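Written out in C, the identity described above looks roughly like the sketch below. The function name is hypothetical, and the actual multiplier the compiler picks for the division is not shown here, since it depends on the compiler and target.

```c
#include <stdint.h>

/* Remainder modulo 65521 computed as "x minus quotient times divisor".
   The division by a constant is what compilers usually replace with a
   multiply-and-shift sequence; the source code itself stays a plain '/'. */
static uint32_t mod65521_via_quotient(uint32_t x)
{
    uint32_t q = x / 65521u;
    return x - q * 65521u;
}
```

This returns the same value as x % 65521u for every unsigned 32-bit x, so there is normally no reason to write it by hand; it only illustrates what the generated code does.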
If you don't have to fully reduce your integers modulo 65521, then you can use the fact that 65521 is close to 2**16. I.e. if x is an unsigned int you want to reduce, then you can do the following:
unsigned int low = x &0xffff;
unsigned int hi = (x >> 16);
x = low + 15 * hi;
This uses the fact that 2**16 % 65521 == 15. Note that this is not a full reduction: starting with a 32-bit input, you are only guaranteed that the result fits in 20 bits and that it is congruent to the input modulo 65521.
This trick can be used in applications where there are many operations that have to be reduced modulo the same constant, and where intermediary results do not have to be the smallest element in its residue class.
E.g. one application is the implementation of Adler-32, which uses the modulus 65521. This hash function does a lot of operations modulo 65521. To implement it efficiently one would only do modular reductions after a carefully computed number of additions. A reduction as shown above is enough for the intermediate steps, and only the final computation of the hash needs a full modulo operation.
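If a full reduction is needed at the end, the partial reduction can be applied twice and followed by one conditional subtraction. The following is a sketch under the assumption of 32-bit unsigned inputs; the names fold65521 and mod65521 are made up for this example.

```c
#include <stdint.h>

/* Partial reduction: the result is congruent to x modulo 65521
   and is at most 16 * 65535 = 1048560, i.e. it fits in 20 bits. */
static uint32_t fold65521(uint32_t x)
{
    return (x & 0xffffu) + 15u * (x >> 16);
}

/* Full reduction to the range 0..65520 without a division. */
static uint32_t mod65521(uint32_t x)
{
    x = fold65521(x);   /* x <= 1048560, so x >> 16 is now at most 15 */
    x = fold65521(x);   /* x <= 65535 + 15 * 15 = 65760 */
    if (x >= 65521u)
        x -= 65521u;    /* 65760 - 65521 < 65521, one subtraction suffices */
    return x;
}
```

Whether this beats the compiler's own x % 65521u depends on the target, so it is worth measuring before adopting it.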
The bitwise operation only works well if the divisor is of the form 2^n. In the general case, there is no such bit-wise operation.
If the constant with which you want to take the modulo is known at compile time and you have a decent compiler (e.g. gcc), it is usually best to let the compiler work its magic. Just declare the modulo const.
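A minimal sketch of what that could look like; the names MODULUS and reduce are hypothetical. With optimization enabled, compilers such as gcc typically see the constant value and emit a multiply-based sequence instead of a div instruction, but that is an expectation to verify in the generated assembly, not a guarantee.

```c
#include <stdint.h>

/* Modulus known at compile time (hypothetical name). */
static const uint32_t MODULUS = 65521u;

/* Plain % against the constant; the strength reduction is left to the compiler. */
static uint32_t reduce(uint32_t x)
{
    return x % MODULUS;
}
```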
If you don't know the constant at compile time, but you are going to take - say - a billion modulos with the same number, then use libdivide: http://libdivide.com/
When dealing with powers of 2, an approach like the following can be used (mostly C-flavored):
.
.
#define THE_DIVISOR 0x8U /* The modulo value (POWER OF 2). No trailing semicolon, or (THE_DIVISOR - 1) will not compile. */
.
.
/* uint8 and sint32 are assumed to be project typedefs (e.g. for uint8_t and int32_t). */
uint8 CheckIfModulo(const sint32 TheDividend)
{
    uint8 RetVal = 1; /* TheDividend is not a multiple of THE_DIVISOR. */
    if (0 == (TheDividend & (THE_DIVISOR - 1)))
    {
        /* code if modulo is satisfied */
        RetVal = 0; /* TheDividend IS a multiple of THE_DIVISOR. */
    }
    else
    {
        /* code if modulo is NOT satisfied */
    }
    return RetVal;
}
If x is an increasing index, and the increment i is known to be less than n (e.g. when iterating over a circular array of length n), avoid the modulus completely.
A loop going
x += i; if (x >= n) x -= n;
is typically much faster than
x = (x + i) % n;
which you unfortunately find in many textbooks...
If you really need an expression (e.g. because you are using it in a for statement), you can use the ugly but efficient
x = x + (x+i < n ? i : i-n)
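The conditional-subtraction idiom can be wrapped in a small helper, sketched below. The name ring_advance is made up, and the code assumes x < n, i < n, and that x + i does not overflow size_t.

```c
#include <stddef.h>

/* Advance index x by step i inside a ring of length n without using %.
   Preconditions (not checked): x < n and i < n, so x + i < 2 * n and
   a single subtraction is enough to bring the index back into range. */
static size_t ring_advance(size_t x, size_t i, size_t n)
{
    x += i;
    if (x >= n)
        x -= n;
    return x;
}
```

For example, in a ring of length 8, advancing index 5 by 3 wraps to 0, and advancing index 7 by 7 gives 6, matching (x + i) % n in both cases.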
idiv — Integer Division
The idiv instruction divides the contents of the 64 bit integer EDX:EAX (constructed by viewing EDX as the most significant four bytes and EAX as the least significant four bytes) by the specified operand value. The quotient result of the division is stored into EAX, while the remainder is placed in EDX.
source: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html