Intel FFT performance
Which processor will perform better, the i5-2500K or the i7-960, in terms of FFT operations per second, for example an in-place complex FFT on a 16k-point buffer?
I am asking because I would like to saturate all cores and all threads. Since the i7 has 8 threads and the i5 only 4, my main concern is whether SSE instructions can run in parallel on all 8 logical threads.
This test: http://ixbtlabs.com/articles3/cpu/ci7-turbo-ht-p1.html?pages=ci7-turbo-ht-p1.html
shows that the gain from turning on HT on the i7 was 0% for FFT (see the "Scientific applications" table, FFT row). The FFT used was MATLAB's, which is based on a library called FFTW.
The i7-960 has 4 cores; its 8 threads come from Hyper-Threading (HT). As ixbt showed, HT will not help to compute more FFTs, so I recommend buying the newer i5-2500, which has the same 4 cores but a higher clock frequency, a higher Turbo Boost (dynamic overclocking) and newer technology.
Also, this i5 is from the next microarchitecture (SNB, Sandy Bridge) and it has AVX, which doubles the peak FLOPS per GHz. If the FFT can use it (use a modern library and a modern compiler), it should almost double FFT performance (if we ignore memory bandwidth limits). Intel reports a 1.8x gain from AVX in its newer MKL: http://software.intel.com/en-us/articles/intel-avx-optimization-in-intel-mkl-v103/
The AVX/NHM (an AVX-enabled over Nehalem NHM) speedup is 1.8x for radix-2 1D CFFTs with N=1024
So the i5-2500 is 1.8x faster per clock tick thanks to AVX, it has slightly more GHz (both at spec and with Turbo Boost), and it supports faster memory (DDR3-1066 for NHM versus DDR3-1333 for the SNB i5).
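As a rough back-of-the-envelope check, the per-clock AVX gain and the clock ratio can be multiplied together. The sketch below assumes base clocks of 3.3 GHz for the i5-2500K and 3.2 GHz for the i7-960 (figures not given above), takes Intel's 1.8x number at face value, and ignores Turbo Boost and memory bandwidth. The function name is made up for this illustration.

```python
def relative_fft_throughput(simd_speedup, new_ghz, old_ghz):
    """Estimated throughput ratio: per-clock SIMD gain times clock-frequency ratio."""
    return simd_speedup * (new_ghz / old_ghz)


# Assumed inputs: 1.8x from Intel's MKL AVX figure, base clocks in GHz.
estimate = relative_fft_throughput(simd_speedup=1.8, new_ghz=3.3, old_ghz=3.2)
print(f"Estimated i5-2500K / i7-960 FFT throughput ratio: {estimate:.2f}")
```

Under these assumptions the estimate is about 1.86x, so nearly all of the expected advantage comes from AVX rather than from the small clock difference.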
I would say no. One consequence of the i7 having 8 threads is that during context switches (which will happen more often because of the logical cores) FPU state is NOT PRESERVED, so once an FPU operation resumes it has to repopulate the FPU structures before it can complete. From what I can tell, the i5-2500K will handle this faster, since its threads contend only per core rather than at a higher rate for the FPUs (of which there are only 4).
P.S.: I could be wrong, since I'm not sure of the specifics of the 960, but this is what I've found in some of the work I've done in the past.