Is a shift instruction faster than an IMUL instruction?
Which one is faster -
val = val*10;
or
val = (val<<3) + (val<<2);
How many clock cycles does imul take compared to a shift instruction?
This is the 21st century. Modern hardware and compilers know how to produce highly optimised code. Writing multiplication using shifts won't help performance, but it will help you produce code with bugs in it.
You have demonstrated this yourself with code that multiplies by 12 rather than 10.
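To make the slip concrete, here is a minimal sketch comparing the plain multiplication, the expression from the question, and a corrected shift version. The function names are made up for illustration, and unsigned types are used as an assumption so that the shifts are well defined for every input:

```c
#include <stdint.h>

/* Plain multiplication; the compiler chooses the instruction sequence. */
uint32_t mul10_plain(uint32_t val)
{
    return val * 10u;
}

/* The expression from the question: 8*val + 4*val, which is 12*val. */
uint32_t mul_question(uint32_t val)
{
    return (val << 3) + (val << 2);
}

/* Corrected shift version: 8*val + 2*val, which is 10*val. */
uint32_t mul10_shift(uint32_t val)
{
    return (val << 3) + (val << 1);
}
```

For val = 7, the plain and corrected versions both give 70, while the expression from the question gives 84.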
I'd say just write val = val * 10; or val *= 10; and let the compiler worry about such questions.
In this case they probably take the same number of cycles, though your manual "optimization" needs one more register (which can slow down the surrounding code):
val = val * 10;
lea (%eax,%eax,4),%eax
add %eax,%eax
vs
val = (val<<3) + (val<<1);
lea (%eax,%eax,1),%edx
lea (%edx,%eax,8),%eax
The compiler knows how to do strength reduction, and probably much better than you do. Also, when you port your code to another platform (say, ARM), the compiler knows how to do strength reduction on that platform too (x86's LEA provides different optimization opportunities than ARM's ADD and RSB).
Doing silly "optimizations" like this by hand in a high-level language will accomplish nothing except show people that you're out of touch with modern technology and programming practices.
If you were writing in assembly directly, it would make sense to worry about this, but you're not.
With that said, there are a few cases where the compiler won't be able to optimize something like this. Consider an array of possible multiplicative factors, each consisting of exactly 2 nonzero bits, with code like:
x *= a[i];
If profiling shows this to be a major bottleneck in your program, you might consider replacing that by:
x = (x<<s1[i]) + (x<<s2[i]);
as long as you plan to measure the results. However, I suspect it's rare to find a situation where this would help, or where it would even be possible. It's only plausible on a CPU whose multiply unit is weak compared to its shifts and its total instruction throughput.
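As a rough sketch of how the shift tables could be prepared, the code below splits a factor with exactly two set bits into its two bit positions, which would then be stored in s1[i] and s2[i]. The helper names split_two_bits and mul_two_bits are hypothetical, and 32-bit unsigned values are an assumption:

```c
#include <stdint.h>

/* Hypothetical helper: if factor has exactly two set bits, store their
   positions in *lo and *hi (lo < hi) and return 0; otherwise return -1. */
int split_two_bits(uint32_t factor, unsigned *lo, unsigned *hi)
{
    unsigned pos[2];
    unsigned count = 0;

    for (unsigned bit = 0; bit < 32; bit++) {
        if (factor & (UINT32_C(1) << bit)) {
            if (count == 2)
                return -1;          /* a third set bit: not decomposable */
            pos[count++] = bit;
        }
    }
    if (count != 2)
        return -1;                  /* fewer than two set bits */

    *lo = pos[0];
    *hi = pos[1];
    return 0;
}

/* Shift-and-add form of x * factor, where factor == (1 << s1) + (1 << s2). */
uint32_t mul_two_bits(uint32_t x, unsigned s1, unsigned s2)
{
    return (x << s1) + (x << s2);
}
```

The split would be done once per table entry, outside the hot loop, so that only the two shifts and the add remain in the measured code.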