Common SIMD techniques
Where can I find information about common SIMD tricks? I have an instruction set and know, how to write non-tricky SIMD code, but I know, SIMD now is much more powerful. It can hold complex conditional branchless code.
For example (ARMv6
), the following sequence of instructions sets each byte of Rd equal to the unsigned minimum of the corresponding by开发者_如何学运维tes of Ra and Rb:
USUB8 Rd, Ra, Rb
SEL Rd, Rb, Ra
Links to tutorials / uncommon SIMD techniques are good too :) ARMv6 is the most interesting for me, but x86(SSE,...)/Neon(in ARMv7)/others are good too.
One of the best SIMD resources ever was the old AltiVec mailing list. Although PowerPC/AltiVec-specific I suspect that a lot of the material on this list would be of general interest to anyone working with other SIMD architectures. Sadly this list seems now to be defunct after being moved to a forum on power.org, but you may be able to find archived versions of it. (If not then let me know - I have pretty much all the posts from 2000 - 2007.)
There is also a lot of potentially useful info on AltiVec, SSE, SIMD vectorization and performance in general at http://developer.apple.com/hardwaredrivers/ve/index.html, a good deal of which may be transferable to other SIMD architectures.
Try AMD's SSEPlus project on sourceforge
精彩评论