Fast modulo 3 or division algorithm?
is there a fast algorithm, similar to power of 2, which can be used with 3, i.e. n%3. Perhaps something that uses the fact that if sum of digits is divisibl开发者_JS百科e by three, then the number is also divisible.
This leads to a next question. What is the fast way to add digits in a number? I.e. 37 -> 3 +7 -> 10 I am looking for something that does not have conditionals as those tend to inhibit vectorization
thanks
4 % 3 == 1
, so (4^k * a + b) % 3 == (a + b) % 3
. You can use this fact to evaluate x%3 for a 32-bit x:
x = (x >> 16) + (x & 0xffff);
x = (x >> 10) + (x & 0x3ff);
x = (x >> 6) + (x & 0x3f);
x = (x >> 4) + (x & 0xf);
x = (x >> 2) + (x & 0x3);
x = (x >> 2) + (x & 0x3);
x = (x >> 2) + (x & 0x3);
if (x == 3) x = 0;
(Untested - you might need a few more reductions.) Is this faster than your hardware can do x%3? If it is, it probably isn't by much.
This comp.compilers item has a specific recommendation for computing modulo 3.
An alternative, especially if the maximium size of the dividend is modest, is to multiply by the reciprocal of 3 as a fixed-point value, with enough bits of precision to handle the maximum size dividend to compute the quotient, and then subtract 3*quotient from the the dividend to get the remainder. All of these multiplies can be implemented with a fixed sequence of shifts-and-adds. The number of instructions will depend on the bit pattern of the reciprocal. This works pretty well when the dividend max is modest in size.
Regarding adding digits in the number... if you want to add the decimal digits, you're going to end up doing what amounts to a number-conversion-to-decimal, which involves divide by 10 somewhere. If you're willing to settle for adding up the digits in base2, you can do this with an easy shift-right and add loop. Various clever tricks can be used to do this in chunks of N bits to speed it up further.
Not sure for your first question, but for your second, you can take advantage of the %
operator and integer division:
int num = 12345;
int sum = 0;
while (num) {
sum += num % 10;
num /= 10;
}
This works because 12345 % 10 = 5
, 12345 / 10 = 1234
and keep going until num == 0
If you are happy with 1 byte integer division, here's a trick. You could extend it to 2 bytes, 4 bytes, etc.
Division is essentially multiplication by 0.3333. If you want to simulate floating point arithmetic then you need closest approximation for the 256 (decimal) boundary. This is 85, because 85 / 256 = 0.332. So if you multiply your value by 85, you should be getting a value close to the result in the high 8 bits.
Multiplying a value with 85 fast is easy. n * 85 = n * 64 + n * 16 + n * 4 + n. Now all these factors are powers of 2 so you can calculate n * 4 by shifting, then use this value to calculate n * 16, etc. So you have max 5 shifts and 4 additions.
As said, this'll give you approximation. To know how good it is you'll need to check the lower byte of the next value using this rule
n ... is the 16 bit number you want to divide
approx = HI(n*85)
if LO(n*85)>LO((n+1)*85)THEN approx++
And that should do the trick.
Example 1:
3 / 3 =?
3 * 85 = 00000000 11111111 (approx=0)
4 * 85 = 00000001 01010100 (LO(3*85)>LO(4*85)=>approx=1)
result approx=1
Example 2:
254 / 3
254 * 85 = 01010100 01010110 (approx=84)
255 * 85 = 01010100 10101011 (LO(254*85)<LO(255*85), don't increase)
result approx=84
If you're dealing with big-integers, one very fast method is realizing the fact for all
bases 10 +/- multiple-of-3
i.e.
4,7,10,13,16,19,22…. etc
All you have to do is count the digits, then % 3
. something like :
** note : x ^ y is power, not bit-wise XOR,
x ** y being the python equivalent
function mod3(__,_) {
#
# can handle bases
# { 4, 7,10,13,16,19,
# 22,25,28,31,34 } w/o conversion
#
# assuming base digits :
#
# 0-9A-X for any base,
# or 0-9a-f for base-16
return \
(length(__)<=+((_+=++_+_)+_^_)\
&& (__~"^[0-9]+$") )\
? (substr(__,_~_,_+_*_+_)+\
substr(__,++_*_--))%+_\
:\
(substr("","",gsub(\
"[_\3-0369-=CFILORUXcf-~]+","",__))\
+ length(__) \
+ gsub("[258BbEeHKNQTW]","",__))%+_
}
This isn't the fastest method possible, but it's one of the more agile methods.
精彩评论