Thrust::Sort very long compile time
I'm trying to compile a block of example code using Thrust in an attempt to help learn some CUDA.
I'm using Visual Studio 2010, and I've gotten other examples to compile. However, this example takes upwards of 10 minutes to compile. I've selectively commented out lines and figured out that it's the thrust::sort line that takes forever (with that one line commented out, it takes about 5 seconds to compile).
I found a post somewhere that talked about how sort was slow to compile in Thrust, and that this was a deliberate decision by the Thrust development team (it's 3x faster at runtime, but takes longer to compile). However, that post was from late 2008.
Any idea why this is taking so long?
Also, I'm compiling on a machine with the following specs, so it's not a slow machine:
i7-2600k @ 4.5 GHz
16 GB DDR3 @ 1833 MHz
RAID 0 of 6 GB/s 1TB drives
As requested, this is the build string that it looks like Visual Studio is invoking:
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\bin\nvcc.exe" -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\include" -G0 --keep-dir "Debug\" -maxrregcount=32 --machine 64 --compile -D_NEXUS_DEBUG -g -Xcompiler "/EHsc /nologo /Od /Zi /MTd " -o "Debug\kernel.obj" "C:\Users\Rob\Desktop\VS2010Test\VS2010Test\VS2010Test\kernel.cpp" -clean
Example
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <cstdlib>

int main(void)
{
    // generate 16M random numbers on the host
    thrust::host_vector<int> h_vec(1 << 24);
    thrust::generate(h_vec.begin(), h_vec.end(), rand);

    // transfer data to the device
    thrust::device_vector<int> d_vec = h_vec;

    // sort data on the device (this is the line that is slow to compile)
    thrust::sort(d_vec.begin(), d_vec.end());

    // transfer data back to host
    thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());

    return 0;
}
The compiler in CUDA 3.2 was not optimized for compiling long, complex programs like sort using debugging mode (i.e., nvcc -G0). You will find that CUDA 4.0 is much faster in this case. Removing the -G0 option should decrease compilation time by a significant fraction as well.
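As a rough sketch of the second suggestion, here is the build string from the question with only -G0 removed. The trailing -clean is also left out, since that is nvcc's clean-targets switch, which deletes the outputs instead of compiling. Everything else, including the paths, is copied from the question and would need adjusting for another machine.

```bat
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\bin\nvcc.exe" -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\include" --keep-dir "Debug\" -maxrregcount=32 --machine 64 --compile -D_NEXUS_DEBUG -g -Xcompiler "/EHsc /nologo /Od /Zi /MTd " -o "Debug\kernel.obj" "C:\Users\Rob\Desktop\VS2010Test\VS2010Test\VS2010Test\kernel.cpp"
```

The host-side debug flags (-g, /Od, /Zi) are kept as they were, so this changes only the option the answer singles out. In a Visual Studio project the same effect would normally come from the CUDA build settings rather than from editing the command by hand.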