SSE: _mm_mul_ps won't multiply 10001 with 10001 correctly but works fine for 10000 with 10000
I have a very simple program to multiply four numbers. It works fine when each of them is 10000 but does not if I change them to 10001. The result is off by one.
I compiled the program with gcc -msse2 main_sse.c -o sse
on both AMD Opteron and Intel
Xeon and get the same result on both machines.
I would appreciate any help. Couldn't find anything online on this开发者_如何学C topic.
#include <stdlib.h>
#include <stdio.h>
#include <xmmintrin.h>
int main(){
float x[4], y[4], temp[4]; int i; __m128 X, Y, result;
for(i=0; i < 4; i++) { x[i] = 10000; y[i] = 10000; }
X = _mm_load_ps(&x[0]); Y = _mm_load_ps(&y[0]);
result = _mm_mul_ps(X,Y); _mm_store_ps(&temp[0], result);
for(i=0; i < 4; i++) { x[i] = 10001; y[i] = 10001; }
X = _mm_load_ps(&x[0]); Y = _mm_load_ps(&y[0]);
result = _mm_mul_ps(X,Y); _mm_store_ps(&temp[0], result);
}
You are running into the limits of precision of IEEE 32 bit floating point numbers.
There are only 23 bits of fractional mantissa plus an implied '1' at the beginning. So the largest integer that can be exactly represented is 224 = 16777216
You would need 27 bits to exactly represent the product of 10001*10001 = 100020001.
Once you go above 224, you only get the nearest even integer.
Once you go above 225, you only get the nearest multiple of 4.
And so on.
精彩评论