D3D performance comparison: shaders vs. built-in shading
I have a running 3D engine built in D3D (via SlimDX). To avoid interrupting the rendering pipeline I have batched together many objects with the same material into bigger meshes (to reduce state switching). This has worked well and gives a good performance level for my needs.
The issue I am running into is that I need to change material properties at runtime on some subsets of those larger batched meshes. I'm using the attribute buffer for that and it has been working reasonably well. Until now I have used a limited number of active attributes (about 5 per mesh), but I now need many more material variations (different shades of opacity/color blends), possibly ending up with a hundred or more combinations. Since those changes happen at runtime, I can't bundle them together before rendering starts. Sure, I could reconstruct the meshes, but I would rather not since it is slow, and switching back and forth between materials needs to happen at interactive speeds.
So my question is, what is the best route to take?
Should I implement a more robust attribute handling system that dynamically masks faces with available attribute IDs on demand and then resets them when done? I have heard that fragmentation in the attribute buffer adds a performance hit, and I am also unsure about the cost of successive DrawSubset() calls with material switches in between (i.e. when is it too much, and when should I optimize my attribute arrays?). Does anyone have experience with this?
My other idea is to use a parametrized pixel shader. I don't need any fancy effects, just the bare minimum (currently the built-in flat shader with color only, plus transparency on some objects), so Shader Model 1 is more than enough for my needs. The idea here is to use one all-purpose shader and, instead of switching material between calls, just alter some shader parameters. But I don't know if this is faster than switching materials and/or if programmable shaders are slower than the built-in ones (given the same result).
I'm also curious about the difference in performance hit between switching meshes and drawing different subsets of one big mesh (given the same number of material switches in both cases).
I understand that the answers may differ somewhat between graphics cards and their respective performance/age, but I'm just looking for general guidelines on where to focus most effort (i.e. which types of state switches/CPU interference give the biggest GPU hit). Memory is also a concern, so any implementation that duplicates whole meshes (or large parts of them) is not possible for me.
My focus is performance on older (5 years)/less capable/integrated graphics cards and not necessarily top-of-the-line gamer cards or workstation cards (like Quadro). I guess that could make or break the shader solution, depending on how good the shader performance is on a particular board.
Any and all suggestions and feedback are greatly appreciated.
Many thanks in advance!
Altering shader parameters will be just as slow. Ideally you want to write a Shader Model 2 shader and upload a large section of the attribute table to the graphics card as shader constants. You then have a per-vertex attribute field which selects the appropriate attribute entry.
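As a rough illustration of the per-vertex attribute idea, here is a minimal CPU-side sketch in Python. It is not SlimDX code: the function names, the float4-per-material layout and the 256-register budget are all assumptions made for the example, and `shade_vertex` only stands in for the lookup the shader would do.

```python
import struct

# Assumed constant budget (in float4 registers); check the real limit of the
# shader model and hardware you target.
MAX_FLOAT4_CONSTANTS = 256


def pack_material_palette(materials):
    """Flatten (r, g, b, a) tuples into one float4 per material.

    The resulting bytes are what would be uploaded once as shader constants.
    """
    if len(materials) > MAX_FLOAT4_CONSTANTS:
        raise ValueError("palette does not fit in the assumed constant budget")
    flat = [component for material in materials for component in material]
    return struct.pack(f"<{len(flat)}f", *flat)


def set_face_material(vertex_material_ids, face_vertices, material_id):
    """Retarget one face by rewriting only the per-vertex palette index.

    Assumes vertices are not shared across faces with different materials.
    """
    for vertex in face_vertices:
        vertex_material_ids[vertex] = material_id


def shade_vertex(palette_bytes, material_id):
    """CPU stand-in for the shader lookup: read float4 number material_id."""
    return struct.unpack_from("<4f", palette_bytes, material_id * 16)
```

With this layout, changing the material of a face means rewriting a few small indices in the vertex data rather than splitting the mesh into more subsets, so the whole batch can still go out in one draw call.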
Your performance problems will come from the number of draw calls you use. The more draw calls you use, the more performance suffers. Any change of shader constants or texture will require a new DIP (DrawIndexedPrimitive) call. What you want to do is minimise the number of shader constant modifications AND the number of DIP calls.
It becomes quite an involved process, mind.
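To get a feel for how the draw call count grows, here is a small Python sketch, assuming each face carries a material ID and that every run of consecutive faces with the same ID can be drawn in one call. The function name `draw_ranges` is made up for the example.

```python
from itertools import groupby


def draw_ranges(face_material_ids):
    """Return (material_id, first_face, face_count) for each run of equal IDs.

    Each returned range corresponds to one draw call in this simplified model.
    """
    ranges = []
    start = 0
    for material_id, run in groupby(face_material_ids):
        count = sum(1 for _ in run)
        ranges.append((material_id, start, count))
        start += count
    return ranges


# A fragmented attribute layout needs more calls than a sorted one.
fragmented = [0, 1, 0, 1, 0, 1]
print(len(draw_ranges(fragmented)))          # one call per run of faces
print(len(draw_ranges(sorted(fragmented))))  # one call per material
```

The same faces cost six calls when the IDs alternate and two when they are grouped, which is why fragmentation in the attribute buffer matters.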
For example, if you are processing a skeletal model with 64 bones in it then you have two options. One: you set the world matrix for each bone's mesh data and call DIP once per bone. Two: you load as many of the bone matrices as you can in one go, use a value on the vertex to select which bone it will use, and call DIP once. The second WILL be faster. You will probably find you can do some multi-bone skinning quite easily this way as well.
This goes for anything that causes a shader constant change. Do note that, even though you may use the fixed-function pipeline, most modern graphics hardware (i.e. anything since the Radeon 9700, released in 2002) will translate fixed function to shaders, so the same performance issues apply.
Basically, the thing to avoid above all else is anything that causes you to make another DIP call. Obviously it's impractical to avoid this for everything, and certain changes are less costly. As a rough rule of thumb, the things to avoid, in order of expense, are as follows (be warned, it's been a while since I tested this, so you may want to do some testing and alternative reading on the subject):
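A quick Python sketch of the "load as many as you can" part, assuming each bone matrix takes 3 float4 constant registers (a 4x3 matrix) and that `free_registers` is however many registers you have left for bones. Both numbers and the function name are assumptions for the example.

```python
def split_bones_into_batches(bone_count, free_registers, registers_per_bone=3):
    """Group bone indices so each group fits in the free constant registers.

    Each group needs one constant upload and one draw call.
    """
    bones_per_batch = free_registers // registers_per_bone
    if bones_per_batch < 1:
        raise ValueError("not enough constant registers for a single bone")
    return [
        list(range(start, min(start + bones_per_batch, bone_count)))
        for start in range(0, bone_count, bones_per_batch)
    ]


# Compare against one draw call per bone (64 calls for a 64-bone model).
print(len(split_bones_into_batches(64, free_registers=240)))
print(len(split_bones_into_batches(64, free_registers=96)))
```

Even when the whole skeleton does not fit in one go, a handful of batches is still far fewer calls than one per bone.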
1) Change Shader
2) Change Texture
3) Change Shader constant
4) Change Vertex Buffer
5) Change Index Buffer
1 is by far the most expensive.
4 and 5 are pretty cheap by comparison to the rest but a different vertex format will likely cause bigger problems as it may cause the shader to get changed.
Edit: I'm not entirely sure why changing constants hurts so much. I would have thought such changes would pipeline well. Maybe on modern hardware it's not such a problem. On some early hardware the constants were compiled into the shader, so changing constants resulted in a full shader change.
As with anything, you are best off trying it and seeing what happens.
An interesting solution for sorting all your calls can be to use a sort key where the top 8 bits give you a shader ID, the next 10 bits a texture ID, and so on. You then perform a normal numeric sort and you have an easy way of playing with different sort orders to see what gives the best performance :)
Edit2: It's worth noting that anything that changes the pixel shader state is more costly than changing the vertex buffer state because it is deeper in the pipeline. Thus it takes longer for it to bubble through ...
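A minimal Python sketch of such a sort key, assuming a 32-bit key with 8 bits for the shader ID and 10 bits for the texture ID as described, and (my own assumption) the remaining 14 bits for a shader constant set ID. The names are made up for the example.

```python
SHADER_BITS = 8
TEXTURE_BITS = 10
CONSTANT_BITS = 14  # assumed: whatever is left of a 32-bit key


def make_sort_key(shader_id, texture_id, constants_id):
    """Pack state IDs into one integer, most expensive change in the top bits."""
    if not 0 <= shader_id < (1 << SHADER_BITS):
        raise ValueError("shader_id out of range")
    if not 0 <= texture_id < (1 << TEXTURE_BITS):
        raise ValueError("texture_id out of range")
    if not 0 <= constants_id < (1 << CONSTANT_BITS):
        raise ValueError("constants_id out of range")
    return (
        (shader_id << (TEXTURE_BITS + CONSTANT_BITS))
        | (texture_id << CONSTANT_BITS)
        | constants_id
    )


def sort_draw_calls(calls):
    """Sort (shader_id, texture_id, constants_id) tuples by their packed key."""
    return sorted(calls, key=lambda call: make_sort_key(*call))
```

Sorting this way keeps calls that share a shader next to each other, then calls that share a texture within each shader. To try a different priority, reorder the fields in `make_sort_key` and measure again.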