GPGPU: Still Bleeding Edge?
Is GPGPU ready for production and prototyping use, or would you still consider it mostly a research/bleeding-edge technology? I work in computational biology, and GPGPU is starting to attract attention from the more computer-science-oriented people in the field, but most of the work so far seems to be porting well-known algorithms. The port itself is the research project, and the vast majority of people in the field don't know much about the technology.
I do some pretty computationally intensive projects on conventional multicores. I'm wondering how close GPGPU is to being usable enough for prototyping new algorithms and for everyday production use. From reading Wikipedia, I get the impression that the programming model is strange (heavily SIMD) and somewhat limited (no recursion or virtual functions, though these limitations are slowly being removed; no language higher-level than C or a restricted subset of C++), and that there are several competing, incompatible standards. I also get the impression that, unlike regular multicore, fine-grained parallelism is the only game in town: basic library functions would need to be rewritten, and you can't get huge speedups just by parallelizing the outer loop of your program and calling old-school serial library functions.
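For reference, my mental model of the CUDA side looks something like the minimal sketch below (a toy example I put together, not code from any particular library): on a conventional multicore I would parallelize an outer loop over samples and keep calling a serial routine inside it, whereas here every array element gets its own thread and the data has to be staged onto the device explicitly.

    #include <cuda_runtime.h>
    #include <cstdio>

    // One thread per array element: the fine-grained unit of work on a GPU.
    __global__ void scale_samples(float *x, float a, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n)
            x[i] = a * x[i];
    }

    int main()
    {
        const int n = 1 << 20;
        float *h_x = new float[n];
        for (int i = 0; i < n; ++i) h_x[i] = 1.0f;

        // Explicitly stage data onto the device and back again.
        float *d_x;
        cudaMalloc(&d_x, n * sizeof(float));
        cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);

        // Launch roughly one thread per element, 256 threads per block.
        scale_samples<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);

        cudaMemcpy(h_x, d_x, n * sizeof(float), cudaMemcpyDeviceToHost);
        printf("x[0] = %f\n", h_x[0]);  // expect 2.0

        cudaFree(d_x);
        delete[] h_x;
        return 0;
    }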
How severe are these limitations in practice? Is GPGPU ready for serious use now? If not, how long would you guess it will take?
Edit: One major point I'm trying to wrap my head around is how different the programming model really is from that of a regular multicore CPU with lots and lots of really slow cores.
Edit 2: I guess the way I'd summarize the answers so far is that GPGPU is practical enough for early adopters in niches it's extremely well suited to, but still bleeding-edge enough not to be considered a "standard" tool like multicore or distributed parallelism, even in those niches where performance matters.
There isn't any question that people can do useful production computations with GPUs.
The computations that do well are mostly those with something pretty close to embarrassing parallelism. Both CUDA and OpenCL let you express such computations in an only moderately painful way, so if you can cast your computation in that form, you can do well. I don't think this restriction will ever be seriously lifted; if GPU designers could remove it, general-purpose CPUs could do the same. At least I wouldn't hold my breath.
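To be concrete, by "embarrassing parallelism" I mean kernels like the sketch below, where each output depends only on its own input and threads never need to communicate (the scoring function is made up purely for illustration):

    // Each output element is computed independently, so one thread per
    // element maps directly onto the hardware with no synchronization.
    __global__ void score_all(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = in[i] * in[i] - logf(in[i] + 1.0f);  // independent per i
    }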
You should be able to tell whether your present application is suitable mostly by looking at your existing code. As with most parallel programming models, though, you won't know your real performance until you've coded a complete application. Unfortunately there's no substitute for experience.
I am a graduate student in CS who has worked a bit with GPGPU. I also know of at least one organization that is currently porting parts of their software to CUDA. Whether doing so is worth it really depends on how important performance is to you.
I think that using CUDA will add a lot of expense to your project. First, the GPU landscape is very fragmented: even among NVIDIA cards there is a wide array of feature sets, and code that works on one GPU might not work on another. Second, the feature set of CUDA, as well as of the cards themselves, is changing very quickly; whatever you write this year will quite likely have to be rewritten in two or three years to take full advantage of newer hardware. Finally, as you point out, writing GPGPU programs is just plain hard, so much so that parallelizing an existing algorithm for the GPU is often a publishable research project in its own right.
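As a small illustration of the fragmentation, CUDA code routinely has to query the compute capability at run time and avoid (or emulate) features the installed card doesn't have; something like this sketch:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main()
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);   // properties of device 0
        printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
        // Features such as double precision are only available from certain
        // compute capabilities onward, so code often has to branch on (or
        // refuse to run below) a minimum version.
        return 0;
    }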
You might want to look into CUDA libraries that are already out there, such as CUBLAS, that you might be able to use in your project; they could help insulate you from these issues.
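For example, a single-precision matrix multiply through CUBLAS looks roughly like the sketch below (written against the current cublas_v2 API; device allocation and host-device copies are omitted). The point is that the kernel-level details stay inside the library.

    #include <cublas_v2.h>

    // Multiply two n-by-n matrices that are already resident on the device.
    // CUBLAS assumes column-major storage, like the original Fortran BLAS.
    void gemm_on_device(const float *d_A, const float *d_B, float *d_C, int n)
    {
        cublasHandle_t handle;
        cublasCreate(&handle);

        const float alpha = 1.0f, beta = 0.0f;
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    n, n, n,
                    &alpha, d_A, n,
                    d_B, n,
                    &beta, d_C, n);

        cublasDestroy(handle);
    }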
CUDA is in use in production code in financial services now, and its use is increasing all the time.
Not only is it "ready for serious use" now, you've practically missed the boat.
Kind of an indirect answer, but I work in the area of nonlinear mixed-effects modeling in pharmacometrics. I've heard second-hand that CUDA has been tried. There's such a variety of algorithms in use, and new ones coming all the time, that some look friendlier to a SIMD model than others, particularly the ones based on Markov chain Monte Carlo. That is where I suspect the financial applications are.
The established modeling algorithms are such large chunks of code, in Fortran, and the innermost loops are such complicated objective functions, that it's hard to see how the translation could be done even if opportunities for SIMD speedup could be found. It is possible to parallelize outer loops, which is what we do.
Computational biology algorithms tend to be less regular in structure than many of the financial algorithms successfully ported to GPUs. This means that they require some redesign at the algorithmic level in order to benefit from the huge amount of parallelism found in GPUs. You want to have dense and square data structures, and architect your code around large "for" loops with few "if" statements.
This requires some thinking, but it is possible, and we're beginning to get interesting performance with a protein folding code parallelized with Ateji PX.
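To illustrate why few "if" statements matter: threads in a warp execute in lockstep, so a data-dependent branch leaves part of the warp idle while the other part works, and where possible the choice is better expressed as arithmetic. A minimal sketch (a toy example, not taken from the protein folding code):

    // Divergent version: within a warp, some threads take the "then" path
    // and others the "else" path, and the two halves serialize.
    __global__ void relu_branchy(const float *x, float *y, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            if (x[i] > 0.0f) y[i] = x[i];
            else             y[i] = 0.0f;
        }
    }

    // Same result with no data-dependent branch in the body.
    __global__ void relu_dense(const float *x, float *y, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = fmaxf(x[i], 0.0f);
    }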