
What kind of data processing problems would CUDA help with?

I've worked on many data matching problems, and very often they boil down to quickly running many instances of CPU-intensive algorithms such as Hamming / edit distance in parallel. Is this the kind of thing that CUDA would be useful for?

What kinds of data processing problems have you solved with it? Is there really an uplift over a standard quad-core Intel desktop?

Chris


I think you've answered your own question. In general, CUDA/OpenCL accelerates massively parallel operations. We've used CUDA to perform various DSP operations (FFT, FIR) and seen order-of-magnitude speedups. An order-of-magnitude speedup for a couple hundred dollars is a steal. While specialized CPU libraries and frameworks like MKL and OpenMP have given us quite a speed increase, CUDA/OpenCL is much faster.
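To make that concrete, here is a minimal sketch of the kind of batched FFT offload described above, using NVIDIA's cuFFT library. The transform length and batch count are illustrative placeholders, not figures from the workload mentioned in the answer:

```cuda
// Minimal sketch: batched 1-D complex FFTs offloaded to the GPU with cuFFT.
// N and BATCH are illustrative values, not taken from the original answer.
#include <cufft.h>
#include <cuda_runtime.h>

#define N     4096   // samples per signal
#define BATCH 1024   // number of signals transformed in one call

int main(void) {
    cufftComplex *data;
    // One contiguous device buffer holding all BATCH signals back to back.
    cudaMalloc((void **)&data, sizeof(cufftComplex) * N * BATCH);
    // ... copy input signals into `data` with cudaMemcpy here ...

    cufftHandle plan;
    cufftPlan1d(&plan, N, CUFFT_C2C, BATCH);        // plan BATCH transforms of length N
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);  // in-place forward FFTs on the GPU
    cudaDeviceSynchronize();                        // wait for the transforms to finish

    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}
```

Because all BATCH transforms are independent, a single call keeps thousands of GPU threads busy at once, which is where the order-of-magnitude speedup over a CPU library comes from.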

Check here for examples of CUDA usage.


For one, at SIGGRAPH '09 they showed a CUDA implementation of V-Ray for Maya. Real-time ray tracing at preview quality, at 20 fps, with a $200 card? I think it helps greatly.


Yes, this is the main domain of CUDA. Its efficiency is highest when the following conditions are true:

  1. Processing one element does not depend on the results of processing the others.
  2. There is no branching, or at least adjacent elements branch the same way.
  3. Elements are laid out uniformly in memory.

Of course, few tasks fall fully within these conditions. The further you move away from them, the lower the efficiency gets. Sometimes you need to completely rewrite your algorithm to make the most of the GPU.
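As a rough illustration (a sketch only, with hypothetical names and sizes), an element-wise kernel like the following satisfies all three conditions: every element is processed independently, all threads in a warp take the same path, and reads and writes are contiguous:

```cuda
// Sketch of a kernel meeting all three conditions above.
__global__ void scale(float *out, const float *in, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n)                      // adjacent threads branch the same way
        out[i] = in[i] * factor;    // independent, coalesced reads and writes
}

// Host-side launch (illustrative sizes):
//   int threads = 256;
//   int blocks  = (n + threads - 1) / threads;
//   scale<<<blocks, threads>>>(d_out, d_in, 2.0f, n);
```

The moment neighboring elements start needing each other's results or branching differently, threads stall waiting on one another and the efficiency drops off.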


CUDA has been used to vastly improve speeds in computed tomography; the FASTRA project, for instance, performs on par with supercomputers (not just quad-core desktops!) while being assembled from consumer-grade hardware for a few thousand euros.

Other research topics I'm aware of are swarm optimization and real-time audio processing.

In general: the technique can be used in every domain where all data must be processed the same way, since all cores perform the same operation. If your problem boils down to this kind of operation, you're good to go :). Too bad not everything falls into this category...


There are generally two types of parallelism: task parallelism and data parallelism. CPUs excel at the former and GPUs at the latter. The reason for this is that CPUs have sophisticated branch prediction, out-of-order execution hardware, and many-stage pipelines that let them execute independent tasks in parallel (e.g. 4 independent tasks on a quad-core). GPUs, on the other hand, have stripped out most of the control logic and instead have lots of ALUs. Thus, for tasks with data parallelism (a simple example being matrix addition) the GPU can take advantage of its many ALUs to operate on the data in parallel. Something like Hamming distance would be great for a GPU, since you're just counting the number of differences between two strings, where each comparison depends only on its position and is independent of every other character in the same string.
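A rough sketch of that Hamming-distance case (names and the host-side sizes are illustrative, not production code): one thread per character position, each adding its 0/1 result to a global counter.

```cuda
// Sketch: data-parallel Hamming distance, one thread per character position.
__global__ void hamming(const char *a, const char *b, int len, int *dist) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < len)
        atomicAdd(dist, a[i] != b[i] ? 1 : 0);  // each position is independent
}

// Host side (illustrative): zero *dist on the device, then launch
//   int threads = 256;
//   int blocks  = (len + threads - 1) / threads;
//   hamming<<<blocks, threads>>>(d_a, d_b, len, d_dist);
```

In practice one would use a block-level reduction instead of a single atomic counter, or compare many string pairs per launch, but the point stands: every comparison can run in parallel, which is exactly what the GPU's many ALUs are built for.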

