Horrible performance loss when using an OpenGL FBO

I have successfully implemented a simple 2D game using LWJGL (OpenGL) in which objects fade out as they get further from the player. The fading was initially implemented by computing the distance from the player to each object's origin and using it to scale the object's alpha/opacity.
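For reference, the per-object approach described above could look roughly like the sketch below. The linear falloff, the `maxDistance` cutoff, and the names `ObjectFade` and `alphaFor` are assumptions made for illustration; the original code is not shown in the question.

```java
// Hypothetical helper: one alpha value per object, computed on the CPU.
final class ObjectFade {

    /**
     * Returns 1.0 when the object sits on the player and falls off linearly
     * to 0.0 at maxDistance (assumed falloff; any monotonic curve would do).
     */
    static float alphaFor(float objX, float objY,
                          float playerX, float playerY,
                          float maxDistance) {
        float dx = objX - playerX;
        float dy = objY - playerY;
        float distance = (float) Math.sqrt(dx * dx + dy * dy);
        float alpha = 1.0f - distance / maxDistance;
        // Clamp so objects beyond maxDistance are fully transparent.
        return Math.max(0.0f, Math.min(1.0f, alpha));
    }
}
```

The returned value would then be passed as the alpha component of the object's color before drawing it. Because the whole object gets a single value, a large object shows no gradient across its surface, which is the roughness mentioned next.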

However, with larger objects this approach looks too coarse. My solution was to scale alpha/opacity for every pixel of the object instead. Not only would this look better, it would also move computation from the CPU to the GPU.

I figured I could implement it using an FBO and a temporary texture.

By drawing to the FBO and then masking it with a precomputed distance map (a texture) using a special blend mode, I intended to achieve this effect. The algorithm is as follows:

0) Initialize OpenGL and set up the FBO

1) Render background to standard buffer

2) Switch to custom FBO and clear it

3) Render objects (to FBO)

4) Mask FBO using distance-texture

5) Switch to standard buffer

6) Render FBO temporary texture (to standard buffer)

7) Render hud elements

A bit of extra info:

  • The temporary texture has the same size as the window (and thus standard buffer)
  • Step 4 uses a special blend mode to achieve the desired effect:

    GL11.glBlendFunc( GL11.GL_ZERO, GL11.GL_SRC_ALPHA );

  • My temporary texture is created with min/mag filters: GL11.GL_NEAREST
  • The data is allocated using: org.lwjgl.BufferUtils.createByteBuffer(4 * width * height);
  • The texture is initialized using: GL11.glTexImage2D( GL11.GL_TEXTURE_2D, 0, GL11.GL_RGBA, width, height, 0, GL11.GL_RGBA, GL11.GL_UNSIGNED_BYTE, dataBuffer);
  • There are no GL errors in my code.
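To make step 4 concrete, here is a minimal sketch of how a distance map could be precomputed into an RGBA byte buffer, together with a CPU emulation of what `glBlendFunc(GL_ZERO, GL_SRC_ALPHA)` does to each destination channel. The class and method names, the linear falloff, and centering the map on the middle of the texture are all assumptions. The sketch uses plain `java.nio` rather than `BufferUtils` so that it runs without LWJGL.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for the distance-map texture and the masking blend.
final class DistanceMask {

    /** Builds width*height RGBA texels whose alpha fades with distance from the center. */
    static ByteBuffer buildMask(int width, int height, float maxDistance) {
        // Direct, native-order buffer: the kind of buffer glTexImage2D is fed in LWJGL.
        ByteBuffer buf = ByteBuffer.allocateDirect(4 * width * height)
                                   .order(ByteOrder.nativeOrder());
        float cx = (width - 1) / 2.0f;
        float cy = (height - 1) / 2.0f;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                float dx = x - cx;
                float dy = y - cy;
                float dist = (float) Math.sqrt(dx * dx + dy * dy);
                float fade = Math.max(0.0f, Math.min(1.0f, 1.0f - dist / maxDistance));
                // RGB is ignored by the GL_ZERO source factor; only alpha matters.
                buf.put((byte) 0xFF).put((byte) 0xFF).put((byte) 0xFF);
                buf.put((byte) Math.round(fade * 255.0f));
            }
        }
        buf.flip(); // make the buffer readable from position 0
        return buf;
    }

    /**
     * Emulates the blend for one 8-bit channel:
     * result = src * 0 + dst * srcAlpha, i.e. the destination is scaled by the mask alpha.
     */
    static int blendZeroSrcAlpha(int dstChannel, int srcAlpha) {
        return Math.round(dstChannel * (srcAlpha / 255.0f));
    }
}
```

Since `glBlendFunc` applies the same factors to the alpha channel, drawing the mask over the FBO scales the alpha of the rendered objects as well as their color, which is what lets the later composite in step 6 fade them out.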

This does indeed achieve the desired result. However, when I did some performance testing I found that the FBO approach cripples performance. I tested by requesting 1000 successive renders and measuring the total time. The results were as follows:

In 512x512 resolution:

  • Normal: ~1.7s
  • FBO: ~2.5s
  • (FBO without step 6: ~1.7s)
  • (FBO without step 4: ~1.7s)

In 1680x1050 resolution:

  • Normal: ~1.7s
  • FBO: ~7s
  • (FBO without step 6: ~3.5s)
  • (FBO without step 4: ~6.0s)
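A measurement like the one above can be reproduced with a small harness along these lines. This is only a sketch: `RenderTimer` and its methods are hypothetical names, and the `Runnable` stands in for whatever draws one frame in the game.

```java
// Hypothetical timing harness for N successive renders.
final class RenderTimer {

    /** Runs renderFrame the given number of times and returns the elapsed wall time in seconds. */
    static double secondsFor(int frames, Runnable renderFrame) {
        long start = System.nanoTime();
        for (int i = 0; i < frames; i++) {
            renderFrame.run();
        }
        // Note: GL commands are queued asynchronously. With a real renderer,
        // calling GL11.glFinish() before reading the clock makes sure the GPU
        // work is included in the measurement.
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1e9;
    }

    /** Converts a total time for a batch of frames into milliseconds per frame. */
    static double millisPerFrame(double totalSeconds, int frames) {
        return totalSeconds * 1000.0 / frames;
    }
}
```

Applied to the numbers above, ~7s for 1000 renders is about 7 ms per frame for this effect alone at 1680x1050, compared with about 1.7 ms for the original path.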

As you can see, this scales really badly with resolution. To make it worse, I intend to add a second pass of this type. The machine I tested on should be high-end relative to my target audience, so I can expect people to get far below 60 fps with this approach, which is hardly acceptable for a game this simple.

What can I do to salvage my performance?


As suggested by Damon and sidewinderguy, I successfully implemented a similar solution using a fragment shader (and a vertex shader). Performance is a little better than my initial CPU-side, per-object computation, which was itself MUCH faster than the FBO approach. At the same time, the visual results are much closer to those of the FBO approach (overlapping objects behave a bit differently).

For anyone interested, the fragment shader basically transforms gl_FragCoord.xy and does a texture lookup. I am not sure this gives the best performance, but with only one other texture active I do not expect a gain from omitting the lookup and computing the value directly. Also, I no longer have a performance bottleneck, so further optimization can wait until it proves necessary.
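A shader pair of the kind described might look like the sketch below, held as Java string constants ready to be handed to the GL shader-compilation calls. It is not the actual shader from the game: the GLSL version (1.20, legacy built-ins), the uniform names `sprite`, `distanceMap` and `viewportSize`, and the choice of a plain divide by the viewport size as the "transform" are all assumptions. The `maskCoord` method mirrors the shader's coordinate math on the CPU so the mapping can be checked without a GL context.

```java
// Hypothetical shader sources for per-pixel distance fading.
final class FadeShader {

    /** Passes texture coordinates and vertex color through to the fragment stage. */
    static final String VERTEX =
          "#version 120\n"
        + "void main() {\n"
        + "    gl_TexCoord[0] = gl_MultiTexCoord0;\n"
        + "    gl_FrontColor  = gl_Color;\n"
        + "    gl_Position    = ftransform();\n"
        + "}\n";

    /** Scales each fragment's alpha by the distance map sampled at its window position. */
    static final String FRAGMENT =
          "#version 120\n"
        + "uniform sampler2D sprite;       // the object's own texture (assumed name)\n"
        + "uniform sampler2D distanceMap;  // precomputed fade texture (assumed name)\n"
        + "uniform vec2 viewportSize;      // window size in pixels (assumed name)\n"
        + "void main() {\n"
        + "    vec2 maskCoord = gl_FragCoord.xy / viewportSize;\n"
        + "    vec4 color = texture2D(sprite, gl_TexCoord[0].st) * gl_Color;\n"
        + "    float fade = texture2D(distanceMap, maskCoord).a;\n"
        + "    gl_FragColor = vec4(color.rgb, color.a * fade);\n"
        + "}\n";

    /** CPU mirror of the shader's mapping from window coordinates to [0, 1] texture coordinates. */
    static float[] maskCoord(float fragX, float fragY, float viewportW, float viewportH) {
        return new float[] { fragX / viewportW, fragY / viewportH };
    }
}
```

With this layout the distance map is sampled once per fragment while the object is drawn, so no offscreen buffer or full-screen masking pass is involved.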

Also, I am very grateful for all the help, suggestions and comments I received :-)
