How to compare __m128 types?
__m128 a;
__m128 b;
How do I code a != b? Which should I use: _mm_cmpneq_ps or _mm_cmpneq_ss? And how do I process the result?
I can't find adequate docs.
You should probably use _mm_cmpneq_ps, which compares all four elements (_mm_cmpneq_ss compares only the lowest element). However, the interpretation of comparisons is a little different in SIMD code than in scalar code. Do you want to test whether any corresponding elements are not equal, or whether all corresponding elements are not equal?
To test the results of the 4 comparisons from _mm_cmpneq_ps you can use _mm_movemask_epi8.
Note that comparing floating point values for equality or inequality is usually a bad idea, except in very specific cases.
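#include <emmintrin.h> // SSE2
#include <stdint.h>

// _mm_castps_si128 reinterprets the bits portably; a C-style cast to __m128i is not accepted by every compiler
__m128i vcmp = _mm_castps_si128(_mm_cmpneq_ps(a, b)); // compare a, b for inequality
uint16_t test = (uint16_t)_mm_movemask_epi8(vcmp);    // extract results of comparison (16 bits, 4 per element)
if (test == 0xffff) {
    // *all* elements not equal
} else if (test != 0) {
    // *some* elements not equal
} else {
    // no elements not equal, i.e. all elements equal
}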
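As a variation on the snippet above, the same two tests can be sketched with _mm_movemask_ps, which returns one bit per float (a 4-bit mask) and so avoids the cast to __m128i. The helper names any_not_equal and all_not_equal are made up for illustration and are not part of any library:

```cpp
#include <xmmintrin.h> // SSE: __m128, _mm_cmpneq_ps, _mm_movemask_ps

// Hypothetical helper names, chosen for this sketch only.
// _mm_cmpneq_ps sets every bit of an element where a != b, and
// _mm_movemask_ps gathers the sign bit of each of the 4 elements,
// so the resulting mask lies in the range 0x0..0xF.

// true if at least one pair of corresponding elements differs
inline bool any_not_equal(__m128 a, __m128 b)
{
    return _mm_movemask_ps(_mm_cmpneq_ps(a, b)) != 0;
}

// true only if every pair of corresponding elements differs
inline bool all_not_equal(__m128 a, __m128 b)
{
    return _mm_movemask_ps(_mm_cmpneq_ps(a, b)) == 0xF;
}
```

Note that a NaN element compares as not equal to everything, including itself, so both helpers treat NaN lanes as differing.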
For documentation you want these two volumes from Intel:
Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A: Instruction Set Reference, A-M
Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B: Instruction Set Reference, N-Z
The answer to this question also depends on whether you want actual inequality, in which case you'd use something along the lines of what @PaulR has shown:
bool fneq128_a (__m128 const& a, __m128 const& b)
{
    // returns true if at least one element in a is not equal to
    // the corresponding element in b
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) != 0xF;
}
or whether you want to use some epsilon, so that elements are still considered "equal" if they differ by less than the threshold:
bool fneq128_b (__m128 const& a, __m128 const& b, float epsilon = 1.e-8f)
{
    // epsilon vector
    auto eps = _mm_set1_ps(epsilon);
    // absolute value of the difference of a and b (clears the sign bit)
    auto abd = _mm_andnot_ps(_mm_set1_ps(-0.0f), _mm_sub_ps(a, b));
    // compare abd to eps
    // returns true if at least one of the elements in abd is not less
    // than epsilon
    return _mm_movemask_ps(_mm_cmplt_ps(abd, eps)) != 0xF;
}
Example:
auto a = _mm_set_ps(0.0, 0.0, 0.0, 0.0);
auto b = _mm_set_ps(0.0, 0.0, 0.0, 1.e-15);
std::cout << fneq128_a(a, b) << ' ' << fneq128_b(a, b) << "\n";
Prints:
1 0
Peter is right: tests against values that are 0.0f can fail under the previous approach.
Please consider this macro:
#define ISEQUAL(A, B) _mm_testz_si128(_mm_xor_si128(_mm_castps_si128(A), _mm_castps_si128(B)), \
                                      _mm_xor_si128(_mm_castps_si128(A), _mm_castps_si128(B)))
This typically compiles to 2 instructions (an XOR followed by PTEST). Note that _mm_testz_si128 requires SSE4.1.