Efficient evaluation of max(a,b) inside a loop with respect to branch prediction?
What is an efficient way to calculate the maximum of two floats inside a for loop in C without using a conditional expression such as a > b ? a : b, which might stall the pipeline?
I am working with huge 3D arrays and have tons of loop iterations.
Check what your compiler outputs; it's probably "optimal" already. For instance,
float foo(float a, float b)
{
    return (a>b?a:b);
}
Compiled with GCC 4.5 at -O3, this generates the following assembly on x86_64:
Disassembly of section .text:
0000000000000000 <foo>:
0: f3 0f 5f c1 maxss %xmm1,%xmm0
4: c3 retq
In other words, the compiler knows a lot about the instruction set you're targeting and the semantics of your code. Let it do its job.
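Since the question is about loops over large arrays, here is a minimal sketch of the same ternary applied element-wise. The function name max_arrays and its parameters are hypothetical, not from the question. With optimization enabled a compiler may turn the ternary into a max instruction instead of a branch, but that depends on the compiler, flags, and target, so it is worth inspecting the generated assembly (for example with gcc -O3 -S) rather than assuming it.

```c
#include <stddef.h>

/* Hypothetical helper: writes the element-wise maximum of a and b into out.
   restrict (C99) tells the compiler that the three arrays do not overlap. */
void max_arrays(const float *restrict a, const float *restrict b,
                float *restrict out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* Same expression as foo(); no explicit branch-avoidance tricks. */
        out[i] = (a[i] > b[i]) ? a[i] : b[i];
    }
}
```

Keeping the loop body this plain leaves the compiler free to choose between a scalar or a packed max instruction, if the target has one.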
Well, I don't think this is faster than using branching, but this seems to work:
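#include <stdio.h>
/* Reinterpret the bits of a float as a signed or unsigned int.
   Assumes 32-bit IEEE 754 floats; the pointer casts formally violate
   C's strict-aliasing rules. */
#define FasI(f) (*((int *) &(f)))
#define FasUI(f) (*((unsigned int *) &(f)))
/* Sign tests on the raw bits; each evaluates to 0 or 1. */
#define lt0(f) (FasUI(f) > 0x80000000U)
#define le0(f) (FasI(f) <= 0)
#define gt0(f) (FasI(f) > 0)
#define ge0(f) (FasUI(f) <= 0x80000000U)
int main()
{
    float a=11.0f,b=4.6f;
    float x=a-b;
    /* ge0(x) is 1 when a >= b and lt0(x) is 1 when a < b, so exactly one
       term survives, including when a == b. Assumes finite inputs. */
    printf("%f\n",ge0(x)*a+lt0(x)*b);
    return 0;
}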
The defines were taken from The Aggregate Magic Algorithms.
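The same sign-bit idea can be sketched without pointer casts, using memcpy to read the bits and a mask to select the result instead of a multiplication. This is only an illustration under stated assumptions: float is 32-bit IEEE 754, neither input is NaN, and the names float_bits, bits_float, and branchless_max are hypothetical. Whether it beats the plain ternary is something to measure, not assume.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: copy the bits between float and uint32_t.
   memcpy avoids the aliasing problem of casting a float pointer. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical branchless maximum; assumes neither input is NaN. */
float branchless_max(float a, float b)
{
    float d = a - b;
    uint32_t neg = float_bits(d) >> 31;  /* sign bit of a - b: 1 if negative */
    uint32_t mask = 0u - neg;            /* all ones if a < b, else all zeros */
    /* Keep the bits of b where the mask is set, the bits of a elsewhere. */
    return bits_float((float_bits(a) & ~mask) | (float_bits(b) & mask));
}
```

Selecting with a mask rather than multiplying by 0 or 1 keeps infinite inputs intact, since multiplying infinity by zero yields NaN.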