Which arithmetic instruction is the slowest, and which is the fastest, on IA-32 and IA-64? Is there a ranking? Any benchmarks?
Generally speaking, the slowest are the square-root and division instructions, especially in the scalar floating-point pipeline.
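To check the division claim on your own machine, one rough approach is a dependent-chain microbenchmark: each iteration needs the previous result, so the time per iteration approximates latency rather than throughput. This is a sketch under stated assumptions, not a rigorous benchmark. The helper names `time_dependent_add` and `time_dependent_div` are hypothetical, the timer is plain `clock()`, and the code should be compiled without `-ffast-math` so the compiler cannot rewrite or reassociate the floating-point chain. Absolute numbers depend on the CPU and compiler, so treat the manual's tables as the authoritative source.

```c
#include <time.h>

/* Hypothetical helper: time n dependent double-precision additions.
   Each iteration depends on the previous x, so iterations cannot overlap. */
static double time_dependent_add(long n, double *sink)
{
    volatile double vd = 1.0000001; /* volatile read hides the constant from the optimizer */
    double d = vd, x = 1.0;
    clock_t start = clock();
    for (long i = 0; i < n; i++)
        x = x + d;                  /* one dependent add per iteration */
    clock_t end = clock();
    *sink = x;                      /* publish x so the loop is not dead code */
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Hypothetical helper: time n dependent double-precision divisions.
   The divisor is not a power of two, so the compiler may not replace
   the division with a multiplication (absent -ffast-math). */
static double time_dependent_div(long n, double *sink)
{
    volatile double vd = 1.0000001;
    double d = vd, x = 1.0;
    clock_t start = clock();
    for (long i = 0; i < n; i++)
        x = x / d;                  /* one dependent divide per iteration */
    clock_t end = clock();
    *sink = x;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling both with the same large `n` (say 100000000) and comparing the returned seconds gives a rough add-versus-divide latency ratio for that particular machine.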
For IA-32 and IA-64 specifically, look at the Intel(R) 64 and IA-32 Architectures Optimization Reference Manual, which lists cycle counts for each instruction on different processors in Appendix C. You'll see that the SIMD approximation instructions (such as RCPPS and RSQRTPS) perform much better, at the cost of lower precision (roughly 12 bits), and they operate on four single-precision elements at a time. If you need more precision for the square root or reciprocal square root, you'll have to refine the result manually with an extra Newton-Raphson step.
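A minimal sketch of that refinement, assuming an x86 compiler with SSE enabled (the default on x86-64). For f(y) = 1/y^2 - x, one Newton-Raphson step gives y1 = y0 * (1.5 - 0.5 * x * y0^2), which roughly doubles the number of correct bits of the RSQRTPS estimate. The helper names `rsqrt_nr_ps` and `sqrt_nr_ps` are hypothetical; the intrinsics are the standard ones from `<xmmintrin.h>`.

```c
#include <xmmintrin.h> /* SSE intrinsics: _mm_rsqrt_ps, _mm_mul_ps, _mm_sub_ps, ... */

/* Hypothetical helper: approximate 1/sqrt(x) for four floats at once.
   Starts from the fast, low-precision RSQRTPS estimate and applies one
   Newton-Raphson step: y1 = y0 * (1.5 - 0.5 * x * y0 * y0). */
static __m128 rsqrt_nr_ps(__m128 x)
{
    const __m128 half = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);
    __m128 y = _mm_rsqrt_ps(x);                   /* fast ~12-bit estimate */
    __m128 xyy = _mm_mul_ps(_mm_mul_ps(x, y), y); /* x * y0^2 */
    __m128 t = _mm_sub_ps(three_halves, _mm_mul_ps(half, xyy));
    return _mm_mul_ps(y, t);                      /* refined estimate */
}

/* Hypothetical helper: sqrt(x) = x * (1/sqrt(x)).
   Valid for x > 0 only; x == 0 yields NaN (0 * inf) and would need masking. */
static __m128 sqrt_nr_ps(__m128 x)
{
    return _mm_mul_ps(x, rsqrt_nr_ps(x));
}
```

For example, loading `{1, 4, 2, 100}` with `_mm_loadu_ps` and passing it to `rsqrt_nr_ps` yields values close to `{1, 0.5, 0.7071, 0.1}`. The result is still not correctly rounded, so use SQRTPS when exact IEEE results matter.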
ADD and SUB are very fast. Any of the "partial" x87 floating-point operations (such as FPREM, the partial remainder) are going to be very slow; that's why they're "partial" and may have to be executed multiple times to finish.