How can I compare the performance of log() and fp division in C++?
I’m using a log-based class in C++ to store very small floating-point values (the values would otherwise underflow the range of double). Since I’m performing a large number of multiplications, this has the added benefit of converting the multiplications into sums.
However, at a certain point in my algorithm I need to divide a standard double value by an integer value and then apply *= to a log-based value. I have overloaded the *= operator for my log-based class: the right-hand side is first converted to a log-based value by calling log() and then added to the left-hand side.
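To make this concrete, here is a stripped-down sketch of the idea (not my actual class, just the general shape, with made-up names):

    #include <cmath>

    class LogDouble {
        double logVal;                          // stores log(x) instead of x
    public:
        explicit LogDouble(double x) : logVal(std::log(x)) {}   // converting ctor runs log()

        // Multiplying two values is adding their logarithms.
        LogDouble& operator*=(const LogDouble& rhs) {
            logVal += rhs.logVal;
            return *this;
        }
        // A plain double on the right-hand side goes through the
        // constructor above first, i.e. one log() call per *=.
        LogDouble& operator*=(double rhs) {
            return *this *= LogDouble(rhs);
        }
        double log_value() const { return logVal; }
    };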
So the operations actually performed are a floating-point division, a log(), and a floating-point addition. My question is whether it would be faster to first convert the denominator to a log-based value, which would replace the floating-point division with a floating-point subtraction, yielding the following chain of operations: two log() calls, a floating-point subtraction, and a floating-point addition.
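Written out on the raw stored logarithm (lhs_log, x and n are just placeholder names), the two alternatives are:

    #include <cmath>

    void update_v1(double& lhs_log, double x, int n) {
        // Version 1: one floating-point division, one log(), one addition
        lhs_log += std::log(x / n);
    }

    void update_v2(double& lhs_log, double x, int n) {
        // Version 2: two log() calls, one subtraction, one addition
        lhs_log += std::log(x) - std::log(static_cast<double>(n));
    }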
In the end, this boils down to whether floating-point division is faster or slower than log(). I suspect that a common answer would be that this is compiler and architecture dependent, so I’ll say that I use gcc 4.2 from Apple on darwin 10.3.0. Still, I hope to get an answer with a general remark on the relative speed of these two operations, and/or an idea of how to measure the difference myself, as there might be more going on here, e.g. executing the constructors that do the type conversion, etc.
Cheers!
Do you divide by the same integer multiple times? If so, you can instead multiply by 1./yourInteger and only do the divide once. That would be faster than either option, where it's possible.
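Something along these lines (names made up):

    #include <cmath>
    #include <cstddef>

    // If the integer divisor n is fixed across a batch, hoist the division out.
    // Note: multiplying by the reciprocal can differ from a true division in the
    // last bit of the result.
    void accumulate(double* lhs_log, const double* x, std::size_t count, int n) {
        const double inv_n = 1.0 / n;                  // one divide, done once
        for (std::size_t i = 0; i < count; ++i)
            lhs_log[i] += std::log(x[i] * inv_n);      // multiply instead of divide in the loop
    }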
As to your actual question, it's not only compiler and architecture dependent, but also micro-architecture and data dependent.
On your particular platform (darwin/x86), for current hardware (i5/i7): ~24 cycles for divide (1), ~35 cycles for log() (2). However, because divide only uses a single instruction dispatch slot, the hardware's reorder engine can do other useful computation while the divide is in flight; log(), by contrast, is implemented in software, so there is less opportunity for the processor to hoist other computations into the latency of the logarithm. This means that in practice, divide will often be a good bit faster.
(1) From the Intel Optimization Manual.
(2) Measured by calling log() in a tight loop and using mach_absolute_time() to get wall time.
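Roughly like this (a sketch of the kind of harness I mean, not the exact code; the loop is a dependent chain so successive calls can't overlap):

    #include <cmath>
    #include <cstdio>
    #include <stdint.h>
    #include <mach/mach_time.h>

    int main() {
        const int iterations = 10000000;
        mach_timebase_info_data_t tb;
        mach_timebase_info(&tb);                    // scale factor from mach ticks to nanoseconds

        uint64_t start = mach_absolute_time();
        double acc = 1.0;
        for (int i = 0; i < iterations; ++i)
            acc = std::log(acc + 2.0);              // each call depends on the previous result
        uint64_t end = mach_absolute_time();

        double ns = (double)(end - start) * tb.numer / tb.denom;
        std::printf("log(): %.1f ns per call (acc = %g)\n", ns / iterations, acc);
        return 0;
    }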
On the x86 architecture, logarithms take significantly longer than divisions: 85 cycles (throughput) for FYL2X compared to 40 cycles for FDIV. I would be surprised if other architectures are much different. Go with the floating-point division.
The main problem with division is that, although it is a single instruction on most modern CPUs, it typically has a high latency (31 cycles on PowerPC; I'm not sure what it is on x86). Some of this latency can be hidden, though, if you have other non-dependent instructions that can be issued at the same time as the division. So the answer will depend somewhat on what kind of instruction mix and dependencies you have in the loop that contains your divide (not to mention which CPU you are using).
Having said that, my gut feeling is that divide will be faster than a log function on most architectures.
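To see the latency-hiding effect concretely, here is a rough toy sketch of my own (not taken from your code): a chain of dependent divides versus the same divides with independent work mixed in. You would typically expect both loops to take roughly the same time, because the extra additions execute inside the divide latency.

    #include <cstdio>
    #include <ctime>

    int main() {
        const int iterations = 50000000;

        // Loop 1: every divide depends on the previous result, so the full
        // divide latency is exposed on each iteration.
        double acc1 = 1e300;
        std::clock_t t0 = std::clock();
        for (int i = 0; i < iterations; ++i)
            acc1 = acc1 / 1.0000001;
        std::clock_t t1 = std::clock();

        // Loop 2: the same dependent divides, plus independent additions that
        // the out-of-order core can execute while each divide is in flight.
        double acc2 = 1e300, extra = 0.0;
        std::clock_t t2 = std::clock();
        for (int i = 0; i < iterations; ++i) {
            acc2 = acc2 / 1.0000001;
            extra += i * 0.5;
        }
        std::clock_t t3 = std::clock();

        std::printf("dependent divides: %.0f ms, with independent work: %.0f ms (%g %g %g)\n",
                    1000.0 * (t1 - t0) / CLOCKS_PER_SEC,
                    1000.0 * (t3 - t2) / CLOCKS_PER_SEC,
                    acc1, acc2, extra);
        return 0;
    }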
I'm pretty sure that doing a log computation via whatever algorithm is going to be rather more expensive than even FP division would be.
Of course the only way to be sure is to code it up and measure the performance of the code. From your description it sounds like it shouldn't be too difficult to implement both versions and try it side-by-side.
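A rough sketch of what such a side-by-side test could look like (made-up data and sizes; in your code the accumulation would go through your log-based class rather than a raw double):

    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <ctime>
    #include <vector>

    int main() {
        const std::size_t count = 10000000;
        std::vector<double> xs(count, 3.14159);
        volatile double n = 7.0;    // volatile so the compiler can't hoist log(n) out of the loop

        // Version 1: floating-point divide, then one log() per element
        double lhs1 = 0.0;
        std::clock_t t0 = std::clock();
        for (std::size_t i = 0; i < count; ++i)
            lhs1 += std::log(xs[i] / n);
        std::clock_t t1 = std::clock();

        // Version 2: two log() calls and a subtraction per element
        double lhs2 = 0.0;
        std::clock_t t2 = std::clock();
        for (std::size_t i = 0; i < count; ++i)
            lhs2 += std::log(xs[i]) - std::log(n);
        std::clock_t t3 = std::clock();

        std::printf("divide+log: %.0f ms, log+subtract: %.0f ms (sums %g vs %g)\n",
                    1000.0 * (t1 - t0) / CLOCKS_PER_SEC,
                    1000.0 * (t3 - t2) / CLOCKS_PER_SEC,
                    lhs1, lhs2);
        return 0;
    }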