c++ practical computational complexity of <cmath> SQRT()

2023-03-24 04:18 问答作者：

What is the difference in CPU cycles (or, in essence, in 'speed') between

 x /= y;

and

 #include <cmath>
 x = sqrt(y);

EDIT: I know the operations a开发者_JAVA百科ren't equivalent, I'm just arbitrarily proposing x /= y as a benchmark for x = sqrt(y)

The answer to your question depends on your target platform. Assuming you are using most common x86 cpus, I can give you this link http://instlatx64.atw.hu/ This is a collection of measured instruction latency (How long will it take to CPU to get result after it has argument) and how they are pipelined for many x86 and x86_64 processors. If your target is not x86, you can try to measure cost yourself or consult with your CPU documentation.

Firstly you should get a disassembler of your operations (from compiler e.g. gcc: gcc file.c -O3 -S -o file.asm or via dissasembly of compiled binary, e.g. with help of debugger). Remember, that In your operation there is loading and storing a value, which must be counted additionally.

Here are two examples from friweb.hu:

For Core 2 Duo E6700 latency (L) of SQRT (both x87, SSE and SSE2 versions)

29 ticks for 32-bit float; 58 ticks for 64-bit double; 69 ticks for 80-bit long double;

of DIVIDE (of floating point numbers):

18 ticks for 32-bit; 32 ticks for 64-bit; 38 ticks for 80-bit

For newer processors, the cost is less and is almost the same for DIV and for SQRT, e.g. for Sandy Bridge Intel CPU:

Floating-point SQRT is

14 ticks for 32 bit; 21 ticks for 64 bit; 24 ticks for 80 bit

Floating-point DIVIDE is

14 ticks for 32 bit; 22 ticks for 64 bit; 24 ticks for 80 bit

SQRT even a tick faster for 32bit.

So: For older CPUs, sqrt is itself 30-50 % slower than fdiv; For newer CPU the cost is the same. For newer CPU, cost of both operations become lower that it was for older CPUs; For longer floating format you needs more time; e.g. for 64-bit you need 2x time than for 32bit; but 80-bit is cheapy compared with 64-bit.

Also, newer CPUs have vector operations (SSE, SSE2, AVX) of the same speed as scalar (x87). Vectors are of 2-4 same-typed data. If you can align your loop to work on several FP values with same operation, you will get more performance from CPU.

If the square root function isn't implemented in special hardware or software, most library functions would calculate it using Newton's method, which converges quadratically.

Newton's method is an iterative method: you make an initial guess, calculate a trial result, and use that for the next guess. You repeat until you think you have a result that's "close enough." It so happens that you can prove how many iterations you need with square root. Every time through the cycle you get another two digits of accuracy, so most implementations will converge to the precision limit of doubles in 8-9 cycles.

If you read this carefully, you'll see that the iterative Newton's method is doing two subtractions, one multiplication, and one division per iteration.

As a general rule of thumb: Both floating point division and square root are considered as slow operation (compared to fast ones like addition or multiplication). Square root can be expect to be approximately the same speed or somewhat slower (i.e. approx. 1x - 2x lower performance) compared to a division. E.g. on Pentium Pro

Division and square root have a latency of 18 to 36 and 29 to 69 cycles, respectively

To get more accurate answer, you need to dig into architecture manual for your platform or perform a benchmark.

Note: many modern platforms also offer inverse square root, which has the speed approximately the same as sqrt, but is often more useful (e.g. by having invsqrt you can compute both sqrt and div with one multiplication for each).

继续阅读：complexity-theory cpu-cycles cpu-time sqrt

c++ practical computational complexity of <cmath> SQRT()

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？