
Is there any performance difference between greater than and greater than or equal?

On today's modern processors, is there any performance difference between greater than and greater than or equal comparison for a branch condition? If I have a condition that could just as easily be either, is there any slight advantage to choosing > over >= or vice-versa? (This would be for a compiled language on Intel or AMD hardware.)


There shouldn't be any noticeable difference between the different comparison predicates, because of the way they're computed (beware: I haven't read the x86 manuals in detail, so the details may differ there):

Most arithmetic instructions produce several flags as a byproduct; usually you have at least carry (c), overflow (o), zero (z) and negative (n).

Using the flags produced by an x-y instruction (which sets all four of the above reliably), we can trivially derive every comparison we want. For unsigned numbers (here + means OR, . means AND, and c is set when the subtraction does not borrow; x86 sets its carry flag on borrow, so the sense of c is inverted there):

x = y    z
x != y   !z
x < y    !c
x <= y   !c + z
x > y    c . !z
x >= y   c
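The table can be checked exhaustively for 8-bit values. This is only a sketch: `flags_t`, `sub_flags` and `check_flag_table` are hypothetical names made up for illustration, and the helper models the table's convention (c set when the subtraction does not borrow), not any particular CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Zero and carry flags of an 8-bit unsigned x - y.
   Assumption: c follows the table's convention, i.e. it is set when
   the subtraction does NOT borrow (x86 CF has the opposite sense). */
typedef struct { bool z; bool c; } flags_t;

static flags_t sub_flags(uint8_t x, uint8_t y) {
    /* Add 0x100 first: bit 8 survives only if x - y needs no borrow. */
    unsigned wide = 0x100u + x - y;
    flags_t f;
    f.z = (wide & 0xFFu) == 0;   /* low 8 bits are zero iff x == y */
    f.c = (wide & 0x100u) != 0;  /* no borrow iff x >= y */
    return f;
}

/* Compare every table row against C's own operators for all 8-bit pairs.
   Returns the number of pairs checked. */
static int check_flag_table(void) {
    int checked = 0;
    for (unsigned x = 0; x < 256; ++x) {
        for (unsigned y = 0; y < 256; ++y) {
            flags_t f = sub_flags((uint8_t)x, (uint8_t)y);
            assert((x == y) == f.z);
            assert((x != y) == !f.z);
            assert((x <  y) == !f.c);
            assert((x <= y) == (!f.c || f.z));
            assert((x >  y) == (f.c && !f.z));
            assert((x >= y) == f.c);
            ++checked;
        }
    }
    return checked;
}
```

Every predicate is at most two flag reads combined with one gate, which is why none of them should cost more than another.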

So it hardly makes any difference. There are some differences, though, which mostly come down to whether we can use TEST (which is an AND rather than a full-blown subtraction) or have to use CMP (which is the subtraction). TEST is more limited but (usually) faster.

Also, modern architectures (starting with the Core 2 Duo on the Intel side) can sometimes fuse a compare instruction and the conditional branch that follows it into a single µop, so-called macro-op fusion, which has some nice advantages. The rules for that change from one architecture to the next and are rather long. For example, branches that test only the overflow, parity or sign flag (JO, JNO, JP, JNP, JS, JNS) can fuse with TEST but not with CMP on Core 2 Duo and Nehalem (you bet I looked that one up: section 7.5).

So can we just say it's complicated and not worry about such things? The exception is if you're writing an optimizer for a compiler, because regardless of what you write in your source code, the compiler will do what it wants anyhow, and for good reason (i.e. if JGE were theoretically faster, you'd usually have to write if (x < y) to get it, since the branch typically jumps over the body when the condition is false). And if you really need one piece of advice: comparing against 0 is often faster.
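To illustrate why the source spelling matters so little, here is a small sketch with two hypothetical functions that express the same integer test once with > and once with >=. An optimizing compiler is free to emit whichever comparison it prefers for either one; whether it actually emits identical code is something to check in the generated assembly (for example with `gcc -O2 -S`) rather than assume.

```c
#include <assert.h>
#include <limits.h>

/* Two spellings of the same condition on int (illustrative names). */
static int is_positive_gt(int x) { return x > 0; }
static int is_positive_ge(int x) { return x >= 1; }

/* Check that both spellings agree on a range of values plus the
   extremes. Returns the number of values checked. */
static int check_same_predicate(void) {
    int checked = 0;
    for (int x = -1000; x <= 1000; ++x) {
        assert(is_positive_gt(x) == is_positive_ge(x));
        ++checked;
    }
    assert(is_positive_gt(INT_MIN) == is_positive_ge(INT_MIN));
    assert(is_positive_gt(INT_MAX) == is_positive_ge(INT_MAX));
    return checked + 2;
}
```

Since the two functions are equivalent for every int, the choice between them in source code is a readability decision, not a performance one.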


I'm not quite sure how the underlying implementation is done in the ALU/FPU, but there should only be one operation for all of them (on primitive types, that is).

I really hope this is only a question because you are curious and not because you're trying to optimize: this will never give you a big performance boost, and most likely your code contains far, far worse performance issues.

You can even implement all relational operators using just one:

a < b is the base
a > b == b < a
a >= b == !(a < b)
a <= b == !(a > b)

This is of course not how it's implemented in the CPU; it's more of a piece of trivia.
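As a quick sketch of those identities in C, assuming plain int operands (the function names are made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* The single base operator. */
static bool lt(int a, int b) { return a < b; }

/* Everything else is derived from lt alone. */
static bool gt(int a, int b) { return lt(b, a); }   /* a > b  == b < a    */
static bool ge(int a, int b) { return !lt(a, b); }  /* a >= b == !(a < b) */
static bool le(int a, int b) { return !gt(a, b); }  /* a <= b == !(a > b) */

/* Compare the derived operators with the built-in ones on a small grid.
   Returns the number of pairs checked. */
static int check_derived_ops(void) {
    int checked = 0;
    for (int a = -3; a <= 3; ++a) {
        for (int b = -3; b <= 3; ++b) {
            assert(gt(a, b) == (a > b));
            assert(ge(a, b) == (a >= b));
            assert(le(a, b) == (a <= b));
            ++checked;
        }
    }
    return checked;
}
```

These identities hold for integers. They do not carry over to floating point unchanged, because every ordered comparison involving NaN is false, so !(a < b) is true where a >= b is not.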


I seriously doubt there's a difference.
