开发者

processing an image using CUDA implementation, python (pycuda) or C++?

I am in a project to process an image using CUDA. The project is simply an addition or subtraction of the image.

May I ask your pro开发者_运维知识库fessional opinion, which is best and what would be the advantages and disadvantages of those two?

I appreciate everyone's opinions and/or suggestions since this project is very important to me.


General answer: It doesn't matter. Use the language you're more comfortable with.

Keep in mind, however, that pycuda is only a wrapper around the CUDA C interface, so it may not always be up-to-date, also it adds another potential source of bugs, …

Python is great at rapid prototyping, so I'd personally go for Python. You can always switch to C++ later if you need to.


If the rest of your pipeline is in Python, and you're using Numpy already to speed things up, pyCUDA is a good complement to accelerate expensive operations. However, depending on the size of your images and your program flow, you might not get too much of a speedup using pyCUDA. There is latency involved in passing the data back and forth across the PCI bus that is only made up for with large data sizes.

In your case (addition and subtraction), there are built-in operations in pyCUDA that you can use to your advantage. However, in my experience, using pyCUDA for something non-trivial requires knowing a lot about how CUDA works in the first place. For someone starting from no CUDA knowledge, pyCUDA might be a steep learning curve.


Take a look at openCV, it contains a lot of image processing functions and all the helpers to load/save/display images and operate cameras.

It also now supports CUDA, some of the image processing functions have been reimplemented in CUDA and it gives you a good framework to do your own.


Alex's answer is right. The amount of time consumed in the wrapper is minimal. Note that PyCUDA has some nice metaprogramming constructs for generating kernels which might be useful.

If all you're doing is adding or subtracting elements of an image, you probably shouldn't use CUDA for this at all. The amount of time it takes to transfer back and forth across the PCI-E bus will dwarf the amount of savings you get from parallelism.

Any time you deal with CUDA, it's useful to think about the CGMA ratio (computation to global memory access ratio). Your addition/subtraction is only 1 float point operation for 2 memory accesses (1 read and 1 write). This ends up being very lousy from a CUDA perspective.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜