How to make my extended range floating point multiply more efficient?

2023-03-10 01:19 问答作者：

I am doing a calculation which frequently involves values like 3.47493E+17298. This is way beyond what a double can handle, and I don't need extra precision, just extra range of exponents, so I created my own little struct in C#.

My struct uses a long for significand and sign, and an int for exponent, so I effectively have:

1 sign bit 32 exponent bits (regular 2's complement exponent) 63 significand bits

I am curious what steps could be made to make my multiplication routine more efficient. I am running an enormous number of multiplications of these extended range values, and it is pretty fast, but I was looking for hints as to making it faster.

My multiplication routine:

    public static BigFloat Multiply(BigFloat left, 开发者_如何学GoBigFloat right)
    {
        long shsign1;
        long shsign2;

        if (left.significand == 0)
        {
            return bigZero;
        }

        if (right.significand == 0)
        {
            return bigZero;
        }

        shsign1 = left.significand;
        shsign2 = right.significand;

        // scaling down significand to prevent overflow multiply

        // s1 and s2 indicate how much the left and right 
        // significands need shifting.
        // The multLimit is a long constant indicating the
        // max value I want either significand to be
        int s1 = qshift(shsign1, multLimit);
        int s2 = qshift(shsign2, multLimit);

        shsign1 >>= s1;
        shsign2 >>= s2;

        BigFloat r;

        r.significand = shsign1 * shsign2;
        r.exponent = left.exponent + right.exponent + s1 + s2;

        return r;
    }

And the qshift:

It just finds out how much to shift the val to make it smaller in absolute value than the limit.

    public static int qshift(long val, long limit)
    {
        long q = val;
        long c = limit;
        long nc = -limit;

        int counter = 0;

        while (q > c || q < nc)
        {
            q >>= 1;
            counter++;
        }

        return counter;
    }

Here is a completely different idea...

Use the hardware's floating-point machinery, but augment it with your own integer exponents. Put another way, make BigFloat.significand be a floating-point number instead of an integer.

Then you can use ldexp and frexp to keep the actual exponent on the float equal to zero. These should be single machine instructions.

So BigFloat multiply becomes:

r.significand = left.significand * right.significand
r.exponent = left.exponent + right.exponent
tmp = (actual exponent of r.significand from frexp)
r.exponent += tmp
(use ldexp to subtract tmp from actual exponent of r.significand)

Unfortunately,the last two steps require frexp and ldexp, which searches suggest are not available in C#. So you might have to write this bit in C.

...

Or, actually...

Use floating-point numbers for the significands, but just keep them normalized between 1 and 2. So again, use floats for the significands, and multiply like this:

r.significand = left.significand * right.significand;
r.exponent = left.exponent + right.exponent;
if (r.significand >= 2) {
    r.significand /= 2;
    r.exponent += 1;
}
assert (r.significand >= 1 && r.significand < 2);  // for debugging...

This should work as long as you maintain the invariant mentioned in the assert(). (Because if x is between 1 and 2 and y is between 1 and 2 then x*y is between 1 and 4, so the normalization step is just has to check for when the significand product is between 2 and 4.)

You will also need to normalize the results of additions etc., but I suspect you are already doing that.

Although you will need to special-case zero after all :-).

[edit, to flesh out the frexp version]

BigFloat BigFloat::normalize(BigFloat b)
{
    double temp = b.significand;
    double tempexp = b.exponent;
    double temp2, tempexp2;
    temp2 = frexp(temp, &tempexp2);
    // Need to test temp2 for infinity and NaN here
    tempexp += tempexp2;
    if (tempexp < MIN_EXP)
        // underflow!
    if (tempexp > MAX_EXP)
        // overflow!
    BigFloat r;
    r.exponent = tempexp;
    r.significand = temp2;
}

In other words, I would suggest factoring this out as a "normalize" routine, since presumably you want to use it following additions, subtractions, multiplications, and divisions.

And then there are all the corner cases to worry about...

You probably want to handle underflow by returning zero. Overflow depends on your tastes; should either be an error or +-infinity. Finally, if the result of frexp() is infinity or NaN, the value of tempexp2 is undefined, so you might want to check those cases, too.

I am not much of a C# programmer, but here are some general ideas.

First, are there any profiling tools for C#? If so, start with those...

The time is very likely being spent in your qshift() function; in particular, the loop. Mispredicted branches are nasty.

I would rewrite it as:

long q = abs(val);
int x = q/nc;
(find next power of 2 bigger than x)

For that last step, see this question and answer.

Then instead of shifting by qshift, just divide by this power of 2. (Does C# have "find first set" (aka. ffs)? If so, you can use it to get the shift count from the power of 2; it should be one instruction.)

Definitely inline this sequence if the compiler will not do it for you.

Also, I would ditch the special cases for zero, unless you are multiplying by zero a lot. Linear code good; conditionals bad.

If you're sure there won't be an overflow, you can use an unchecked block.

That will remove the overflow checks, and give you a bit more performance.

继续阅读：floating-point performance

How to make my extended range floating point multiply more efficient?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？