CUDA & VS2010 problem
I have searched the web for an answer to this one but couldn't find any. I've installed the CUDA 3.2 SDK (and, just now, CUDA 4.0 RC), and everything seems to work fine after long hours of fooling around with include directories, NSight, and all the rest. Well, except for one thing: the editor keeps highlighting the <<< >>> operator as a mistake. This happens only on VS2010, not on VS2008.
On VS2010 I also get several warnings of the following sort:
C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\xdebug(109): warning C4251: 'std::_String_val<_Ty,_Alloc>::_Alval' : class 'std::_DebugHeapAllocator<_Ty>' needs to have dll-interface to be used by clients of class 'std::_String_val<_Ty,_Alloc>'
Update: If I put the entry point in a .cpp file that calls a CUDA kernel, instead of writing main() in a .cu file as I was doing, the operator is actually flagged as a compile error, not just highlighted! The same thing happens with VS2008.
Anyone know how this can be fixed?
Update 2: Here is the code. The main.cpp file:
#include "kernel.cu"
int main()
{
    doStuff();
    return 0;
}
and the .cu file:
#include <iostream>
#include "cuda.h"
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <cutil_inline.h>
#include <time.h>
using namespace std;
#define N 16
__global__ void MatAdd(float A[N][N], float B[N][N], float C[N][N])
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < N && j < N)
        C[i][j] = A[i][j] + B[i][j];
}
int doStuff()
{
    dim3 threadsPerBlock(8, 8);
    dim3 numBlocks(N / threadsPerBlock.x, N / threadsPerBlock.y);
    float A[N][N], B[N][N], C[N][N];
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
        {
            A[i][j] = 0;
            B[i][j] = 0;
            C[i][j] = 0;
        }
    clock_t start = clock();
    MatAdd<<<numBlocks, threadsPerBlock>>>(A, B, C);
    clock_t end = clock();
    cout << "Took " << float(end - start) << "ms to work out." << endl;
    cin.get();
    return 0;
}
Update 3: Alright, I was (idiotically) including the CUDA code in the .cpp file, so of course it couldn't compile. Now I have CUDA 4.0 up and running on VS2010, but I still get several warnings of the kind shown above.
You cannot do this:
#include "kernel.cu"
That asks the Visual Studio C++ compiler to compile the .cu file as though it were a header, and that compiler does not understand CUDA syntax such as <<< >>>. You need a header file that declares doStuff(), and main.cpp should include that header, not the definition.
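A minimal sketch of that split, squeezed into one translation unit so it compiles with a plain C++ compiler. The file name kernel.h and the function runFromHost() are made up for illustration (runFromHost() stands in for main()), and the doStuff() body here is only a placeholder for the real definition that nvcc would compile from kernel.cu:

```cpp
// ---- kernel.h (hypothetical file name) ----
// Declaration only. There is no CUDA syntax here, so the host C++
// compiler can parse it without trouble.
#ifndef KERNEL_H
#define KERNEL_H
int doStuff();
#endif

// ---- main.cpp ----
// In the real project this file would contain `#include "kernel.h"`
// and would never include kernel.cu.
int runFromHost()  // hypothetical stand-in for main()
{
    return doStuff();
}

// ---- kernel.cu (compiled by nvcc in the real project) ----
// Placeholder body: the real definition would hold the
// MatAdd<<<numBlocks, threadsPerBlock>>>(...) launch.
int doStuff()
{
    return 0;
}
```

In the actual project these are three separate files: the .cpp file only ever sees the declaration, and the linker joins it to the object file that nvcc produces from the .cu file.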
The following might be helpful.
http://www.ademiller.com/blogs/tech/2010/12/using-cudathrust-with-the-parallel-patterns-library/
http://blog.cuvilib.com/2011/02/24/how-to-run-cuda-in-visual-studio-2010/
Typically I set this up as two projects: one that compiles the .cu files against the VS2008 C++ compiler, and another that uses the VS2010 compiler to get all the C++0x features.
The warnings you're getting can be fixed by exporting the appropriate templates. Something like this, but you'll have to write a specific one for each type that the warnings name:
#if defined(__CUDACC__)
#define DECLSPECIFIER __declspec(dllexport)
#define EXPIMP_TEMPLATE
#else
#define DECLSPECIFIER __declspec(dllimport)
#define EXPIMP_TEMPLATE extern
#endif
EXPIMP_TEMPLATE template class DECLSPECIFIER thrust::device_vector<unsigned long>;
See:
http://support.microsoft.com/default.aspx?scid=KB;EN-US;168958 and http://msdn.microsoft.com/en-us/library/esew7y1w.aspx
I've written a step-by-step guide to setting up VS 2010 and CUDA 4.0 here
http://www.ademiller.com/blogs/tech/2011/03/using-cuda-and-thrust-with-visual-studio-2010/
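BTW: A better way of timing CUDA code is with the event API. Kernel launches are asynchronous, so calling clock() around the launch mostly measures the launch overhead rather than the kernel itself. cudaEventElapsedTime reports the elapsed time in milliseconds.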
cudaEvent_t start, stop;
float time;
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord( start, 0 );
kernel<<<grid,threads>>> ( d_odata, d_idata, size_x, size_y, NUM_REPS);
cudaEventRecord( stop, 0 );
cudaEventSynchronize( stop );
cudaEventElapsedTime( &time, start, stop );
cudaEventDestroy( start );
cudaEventDestroy( stop );
I was including the .cu file directly. Of course, that's pretty much including the CUDA code in the .cpp file, and hence the error!