Rendering 1.2 GB of textures smoothly, how does a 1 GB GPU do this?
My goal is to see what would happen when using more texture data than what would fit in physical GPU memory. My first attempt was to load up to 40 DDS textures, resulting in a memory footprint way higher than there was GPU memory. However, my scene would still render at 200+ fps on a 9500 GT.
My conclusion: the GPU/OpenGL is being smart and only keeps certain parts of the mipmaps in memory. I thought that should not be possible on a standard config, but whatever.
Second attempt: disable mip mapping, such that the GPU will always have to sample from the high res textures. Once again, I loaded about 40 DDS textures in memory. I verified the texture memory usage with gDEBugger: 1.2 GB. Still, my scene was rendering at 200+ fps.
The only thing I noticed was that when looking away with the camera and then centering it once again on the scene, a serious lag would occur, as if only then it would transfer textures from main memory to the GPU. (I have some basic frustum culling enabled.)
My question: what is going on? How does this 1 GB GPU manage to sample from 1.2 GB of texture data at 200+ fps?
OpenGL can page complete textures in and out of texture memory in between draw-calls (not just in between frames). Only the textures needed for the current draw-call actually have to be resident in graphics memory; the others can just reside in system RAM. The driver likely only has to page a very small subset of your texture data at any one time. It's pretty much the same as any cache: how can you run algorithms on GBs of data when you only have MBs of cache on your CPU?
Also, PCI-E buses have a very high throughput, so you don't really notice that the driver does the paging.
If you want to verify this, glAreTexturesResident might or might not help, depending on how well the driver is implemented.
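To make the cache analogy concrete, here is a minimal sketch of on-demand texture paging. It is a toy model, not how any real driver works: the function name `simulate_paging`, the LRU eviction policy, and the 30 MB per-texture size are all assumptions picked for illustration (40 x 30 MB = 1.2 GB, as in the question).

```python
from collections import OrderedDict

def simulate_paging(draw_calls, texture_sizes_mb, vram_mb):
    """Toy model: return total MB uploaded when textures are paged in on
    demand and the least recently used texture is evicted when VRAM is full.
    (Hypothetical policy; real drivers may behave differently.)"""
    resident = OrderedDict()  # texture id -> size in MB, least recent first
    used = 0
    uploaded = 0
    for tex in draw_calls:
        if tex in resident:
            # Already in VRAM: no transfer, just mark as recently used.
            resident.move_to_end(tex)
            continue
        size = texture_sizes_mb[tex]
        # Evict least recently used textures until the new one fits.
        while used + size > vram_mb and resident:
            _, evicted_size = resident.popitem(last=False)
            used -= evicted_size
        resident[tex] = size
        used += size
        uploaded += size
    return uploaded

# Assumed setup: 40 textures of 30 MB each, 1000 MB of VRAM.
sizes = {i: 30 for i in range(40)}

# Culling leaves only 10 textures visible; three identical frames.
print(simulate_paging(list(range(10)) * 3, sizes, 1000))

# Every frame touches all 40 textures in the same order; two frames.
print(simulate_paging(list(range(40)) * 2, sizes, 1000))
```

In this model the first case uploads each visible texture once (300 MB in total) and then renders from VRAM for free, while the second case re-uploads everything every frame (2400 MB over two frames). That would be consistent with the lag when the camera turns back to the scene, assuming culling had let the driver evict those textures in the meantime.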
Even if you were forcing texture thrashing in your test (discarding some textures and re-uploading them from system memory to GPU memory every frame), which I'm not sure you are, modern GPUs and PCI-E have such a huge bandwidth that some thrashing does not impact performance that much. One of the 9500GT models is quoted to have a memory bandwidth of 25.6 GB/s, and 16x PCI-E slots (500 MB/s per lane x 16 = 8 GB/s) are the norm.
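As a rough back-of-the-envelope check of those numbers, here is a small sketch. The helper names are made up for illustration, the figures are theoretical peak rates (real transfers will be slower), and the 200 MB overflow is simply 1.2 GB minus 1 GB from the question.

```python
def pcie_bandwidth_gb_s(lanes, mb_per_lane=500):
    """Peak bus bandwidth in GB/s, using the 500 MB/s per lane figure above."""
    return lanes * mb_per_lane / 1000

def upload_budget_mb_per_frame(bandwidth_gb_s, fps):
    """How many MB could cross the bus per frame at a given frame rate."""
    return bandwidth_gb_s * 1000 / fps

def upload_time_ms(size_mb, bandwidth_gb_s):
    """Time to move size_mb across the bus. With 1 GB = 1000 MB,
    MB divided by GB/s comes out directly in milliseconds."""
    return size_mb / bandwidth_gb_s

bus = pcie_bandwidth_gb_s(16)                # 16 lanes -> 8.0 GB/s
print(bus)
print(upload_budget_mb_per_frame(bus, 200))  # per-frame budget at 200 fps
print(upload_time_ms(200, bus))              # the 200 MB that do not fit
```

Under these assumptions the bus could carry about 40 MB per frame at 200 fps, and re-uploading the 200 MB that do not fit would take about 25 ms at peak rate, so a moderate amount of paging per frame would fit within the frame time, while a large burst would show up as a visible hitch.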
As for the lag, I would assume the GPU and CPU throttle down their power usage when you aren't drawing visible textures, and when you suddenly load them fully they need a brief instant to power up. In real-life apps and games these sudden 0%-to-100% workload changes never happen, so a slight lag is totally understandable and expected, I guess.