As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, a
I was looking into different sorting algorithms, and trying to think how to port them to GPUs when I got this idea of sorting without actually sorting. This is how my kernel looks:
In the book Programming Massively Parallel Processors the number of GFLOPS is used to compare the efficiency of different matrix multiplication kernels. How would I compute this for m
I'm looking for an algorithm that tests whether 2 line segments are intersecting in a GPU-friendly way. The line segments are in 2D. While there are many algorithms discussed on the web for doing this,
Does anyone know related information about L2 cache in Fermi? I have heard that it is as slow as global memory, and the use of L2 is just to enlarge the memory bandwidth. But I can't f
I'm not sure if it's possible. I want to study OpenCL in depth, so I was wondering if there is a tool to disassemble a compiled OpenCL kernel.
I am working on code that needs to be time-efficient, and I am therefore using cuFFT for this purpose, but when I try to compute the FFT of a very large data set in parallel it is slower than FFTW on the CPU, and the reason I f
I have some questions about how GPUs perform synchronization. As far as I know, when a warp encounters a barrier (assuming it is in OpenCL) and it knows that the other warps of the same group haven't been
I'm doing some GPGPU stuff on a GLES2 platform that supports maximum RGBA8 render targets (iOS). I need to output a vec2 in the range +/- 2.0 with as much precision as I can get, so
I have a quick question about the active warps in a GPU (I would prefer to know it for Fermi).