开发者

Fast Sin/Cos using a pre computed translation array

I have the following code doing Sin/Cos function using a pre-calculated memory table. in the following example the table has 1024*128 items covering all the Sin/Cos values from 0 to 2pi. I know I can use Sin/Cos symmetry and hold only 1/4 of the values but them I will have more 'ifs' when computing the value.

private const double PI2 = Math.PI * 2.0; 
private const int TABLE_SIZE = 1024 * 128;
private const double TABLE_SIZE_D = (double)TABLE_SIZE;
private const double FACTOR = TABLE_SIZE_D / PI2;

private static double[] _CosineDoubleTable;
private static double[] _SineDoubleTable;

Set the translation table

private static void InitializeTrigonometricTables(){
   _CosineDoubleTable = new double[TABLE_SIZE];
   _SineDoubleTable = new double[TABLE_SIZE];

   for (int i = 0; i < TABLE_SIZE; i++){
      double Angle = ((double)i / T开发者_如何学PythonABLE_SIZE_D) * PI2;
      _SineDoubleTable[i] = Math.Sin(Angle);
      _CosineDoubleTable[i] = Math.Cos(Angle);
   }
}

The Value is a double in radians.

Value %= PI2;  // In case that the angle is larger than 2pi
if (Value < 0) Value += PI2; // in case that the angle is negative
int index = (int)(Value * FACTOR); //from radians to index and casted in to an int
double sineValue = _SineDoubleTable[index]; // get the value from the table

I'm looking for a faster way to do this. The above 4 lines are ~25% of the whole process (executed billions of times).


You could try to use unsafe code to eliminate array bounds checking.
But even a unsafe, optimized version does not seem to come anywhere near Math.Sin.

Results based on 1'000'000'000 iterations with random values:

(1) 00:00:57.3382769  // original version
(2) 00:00:31.9445928  // optimized version
(3) 00:00:21.3566399  // Math.Sin

Code:

static double SinOriginal(double Value)
{
    Value %= PI2;
    if (Value < 0) Value += PI2;
    int index = (int)(Value * FACTOR);
    return _SineDoubleTable[index];
}

static unsafe double SinOptimized(double* SineDoubleTable, double Value)
{
    int index = (int)(Value * FACTOR) % TABLE_SIZE;
    return (index < 0) ? SineDoubleTable[index + TABLE_SIZE]
                       : SineDoubleTable[index];
}

Test program:

InitializeTrigonometricTables();
Random random = new Random();

SinOriginal(random.NextDouble());
var sw = System.Diagnostics.Stopwatch.StartNew();
for (long i = 0; i < 1000000000L; i++)
{
    SinOriginal(random.NextDouble());
}
sw.Stop();
Console.WriteLine("(1) {0}  // original version", sw.Elapsed);

fixed (double* SineDoubleTable = _SineDoubleTable)
{
    SinOptimized(SineDoubleTable, random.NextDouble());
    sw = System.Diagnostics.Stopwatch.StartNew();
    for (long i = 0; i < 1000000000L; i++)
    {
        SinOptimized(SineDoubleTable, random.NextDouble());
    }
    sw.Stop();
    Console.WriteLine("(2) {0}  // optimized version", sw.Elapsed);
}

Math.Sin(random.NextDouble());
sw = System.Diagnostics.Stopwatch.StartNew();
for (long i = 0; i < 1000000000L; i++)
{
    Math.Sin(random.NextDouble());
}
sw.Stop();
Console.WriteLine("(3) {0}  // Math.Sin", sw.Elapsed);


I'm assuming Taylor expansions are no use for you. So if you want to use a table: You only need one table half as big.

  1. cos(x) = sin(pi/2-x).
  2. sin(pi + x) = -sin(x)

You can make your's code non-branching. Convert first to int format.

int index = (int)(Value * FACTOR);
index %= TABLE_SIZE; // one instuction (mask)
index = (index >= 0) ? index :TABLE_SIZE-index; // one instruction isel
double sineValue = _SineDoubleTable[index];

Compare with Math.Sin anyway. Profile Profile Priofile. (Cache miss might slow down your code in real examples.)


If you have to compute it so many times,

  1. Use a processor-specific math library like the IKML or ACML and
    1. Compute the values in groups (vectors).
    2. When you need both, always compute the sin and cos of a value at the same time.
  2. Check your algorithm complexity and implementation design.
  3. Make sure you're using all the processor has to offer - x64 architecture, plus any vector instructions that would help.


This looks pretty good, except for the mod operation. Can you do without it?

If the values are near zero, you can use

while(Value > PI2) Value -= PI2;
while(Value < 0) Value += PI2;

Or it may be faster to cast the index to an integer (possibly out of range) first, then mod that as as integer. If the table size is going to be a multiple of 2, you can even use bit operations (if the compiler doesn't do this already).


No guarantee that it'll do a lot of good, but depending on your processor, integer math is often faster than floating point math. That being the case, I'd probably rearrange the first three lines to first calculate an integer, then reduce its range (if necessary). Of course, as BlueRaja pointed out, using C++ will almost certainly help as well. Using assembly language probably won't do much good though -- for a table lookup like this, a C++ compiler can usually produce pretty good code.

If possible, I'd also look very hard at your accuracy requirements -- without knowing what you're doing with the values, it's hard to say, but for a lot of purposes, your table size and the precision you're storing are far beyond what's necessary or even close to useful.

Finally, I'd note that it's worth at least looking into whether this whole strategy is worthwhile at all. At one time, there was no question that using tables to avoid complex calculations was a solid strategy. Processors have sped up a lot faster than memory though -- to the point that such a table lookup is often a net loss nowadays. In fact, nearly the only way the table stands a chance at all is if it's small enough to fit in the processor cache.


One thing you could try would be to use the fact that cos(x) = sin(x + pi/2). And make the sine table one quarter larger, so you can reuse it as the cosine table starting one quarter in. Not sure if C# allows you to get a pointer to the middle of the table, as C would. But even if not, the decreased cache usage might be worth more than the added time for the offset into the sine table.

That, is, expressed with C:

double* _CosineDoubleTable = &_SineDoubleTable[TABLESIZE / 4];


That's going to be pretty damn fast as it is.

If you really need to squeeze every imaginable drop of performance out of this code, you may want to consider writing this part of it (including the outer loop that loops billions of times) in a C++ dll (or even ASM). Make sure your compiler is set to allow the largest possible instruction set available to you.

[Edit] I missed how large the tables are - this could very well slow down your code significantly due to cache misses. Have you tried benchmarking it against Math.Cos(), or other methods of approximating trig functions (you can get very good approximations with a few simple multiplications using Taylor Series)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜