Custom glBlendFunc a lot slower than native

I'm trying to implement my own custom glBlendFunc in a fragment shader. However, my solution is a lot slower than the native glBlendFunc, even when both compute exactly the same blend function.

I was wondering if anyone had any suggestion on how to do this in a more efficient way.

My solution works something like this:

void draw(fbo fbos[2], render_item item)
{
   // fbos[0] is the render target.
   // fbos[1] is the previous render target; the shader reads it as the
   // "background" to blend against.
   // Both fbos have exactly the same content, but they need to be separate
   // because we can't read from and write to the same texture. The texture
   // we render to needs the entire content, since we might not draw
   // geometry everywhere.

   fbos[0]->attach(); // Attach fbo as the render target
   fbos[1]->bind(1);  // Bind as texture 1

   render(item);

   // Copy from fbos[0] to fbos[1] so that both have the same content again
   glCopyTexSubImage2D(...);
}

fragment.glsl

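uniform sampler2D background; // previous render target, bound as texture "1"

vec4 blend_color(vec4 fore)
{
    // Read the pixel already "behind" this fragment
    vec4 back = texture2D(background, gl_TexCoord[1].st);
    // Interpolate the color by the foreground alpha and accumulate alpha
    return vec4(mix(back.rgb, fore.rgb, fore.a), back.a + fore.a);
}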
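
For comparison, here is a small CPU-side sketch of the arithmetic in blend_color() next to what fixed-function blending would compute. It assumes the default GL_FUNC_ADD blend equation and the state glBlendFuncSeparate(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA, GL_ONE, GL_ONE). The names Rgba, blend_shader and blend_fixed are made up for illustration, and clamping to [0, 1] on fixed-point render targets is left out.

```cpp
// Hypothetical helper type for this sketch: one RGBA pixel with float channels.
struct Rgba { float r, g, b, a; };

// Same definition as GLSL mix(): linear interpolation from x to y by t.
inline float mix(float x, float y, float t) { return x * (1.0f - t) + y * t; }

// What the shader's blend_color() computes for one pixel.
Rgba blend_shader(Rgba back, Rgba fore) {
    return { mix(back.r, fore.r, fore.a),
             mix(back.g, fore.g, fore.a),
             mix(back.b, fore.b, fore.a),
             back.a + fore.a };
}

// What fixed-function blending computes (before clamping), assuming
// GL_FUNC_ADD with RGB factors (GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)
// and alpha factors (GL_ONE, GL_ONE).
Rgba blend_fixed(Rgba dst, Rgba src) {
    return { src.r * src.a + dst.r * (1.0f - src.a),
             src.g * src.a + dst.g * (1.0f - src.a),
             src.b * src.a + dst.b * (1.0f - src.a),
             src.a * 1.0f + dst.a * 1.0f };
}
```

If the two functions agree for the inputs you care about, the effect in the question can be handed to the hardware blender instead of the shader.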


Your best bet for improving the performance of FBO-based blending is NV_texture_barrier. Despite the name, AMD has implemented it as well, so if you stick to Radeon HD-class cards, it should be available to you.

Basically, it lets you ping-pong without heavyweight operations such as rebinding the FBO or changing texture attachments. The specification has a section towards the bottom that shows the general algorithm.

Another alternative is EXT_shader_image_load_store. This will require DX11/GL 4.x class hardware. OpenGL 4.2 recently promoted this to core with ARB_shader_image_load_store.

Even with this, as Darcy said, you're never going to beat regular blending. It uses dedicated hardware that shaders can't access, since blending happens after the shaders have run. You should only do programmatic blending if there is some effect that you absolutely cannot accomplish any other way.


Native blending is a lot more efficient because blending operations are built directly into the GPU hardware, so you probably aren't going to be able to beat it for speed. Having said that, make sure you have depth testing, back-face culling, hardware blending, and any other unneeded operations turned off. I can't say it will make a huge difference, but it may help somewhat.
