Speeding up an I/O-bound OpenGL application
I've been working on a point cloud player lately that should ideally be able to visualize terrain data points from a lidar capture and display them sequentially at around 30 fps. I, however, seem to have hit a wall caused by PCI-e I/O.
What I need to do for every frame is load a large point cloud stored in memory, calculate a color map based on height (I'm using something akin to MATLAB's jet map), then transfer the data to the GPU. This works fine on cloud captures with fewer than one million points. However, at about 2 million points it starts slowing down below 30 frames per second. I realize this is a lot of data (2 million points per frame * [3 floats per position + 3 floats per color] * 4 bytes per float * 30 frames per second ≈ 1.34 GiB per second).
My rendering code looks something like this right now:
glPointSize(ptSize);
glEnableClientState(GL_VERTEX_ARRAY);
if(colorflag) {
glEnableClientState(GL_COLOR_ARRAY);
} else {
glDisableClientState(GL_COLOR_ARRAY);
glColor3f(1,1,1);
}
glBindBuffer(GL_ARRAY_BUFFER, vbobj[VERT_OBJ]);
glBufferData(GL_ARRAY_BUFFER, cloudSize, vertData, GL_STREAM_DRAW);
glVertexPointer(3, GL_FLOAT, 0, 0);
glBindBuffer(GL_ARRAY_BUFFER, vbobj[COLOR_OBJ]);
glBufferData(GL_ARRAY_BUFFER, cloudSize, colorData, GL_STREAM_DRAW);
glColorPointer(3, GL_FLOAT, 0, 0);
glDrawArrays(GL_POINTS, 0, numPoints);
glDisableClientState(GL_VERTEX_ARRAY);
glDisableClientState(GL_COLOR_ARRAY);
glBindBuffer(GL_ARRAY_BUFFER, 0);
The pointers vertData and colorData change every frame.
What I would like is to play back at a minimum of 30 frames per second even when later using larger point clouds that might reach up to 7 million points per frame. Is this even possible? Or would it be easier to grid the points, construct a heightmap, and somehow display that instead? I'm still pretty new to 3-D programming, so any advice would be appreciated.
If you can, implement the color map using a 1D texture. You'll only need 1 texture coordinate per vertex instead of 3 color components, and it will make the vertices 128-bit aligned too.
EDIT: You just need to create a texture from your colormap and use glTexCoordPointer instead of glColorPointer (and, of course, replace the per-vertex color values with texture coordinates in the [0, 1] range). Here's a linearly interpolated 6-texel colormap:
// Create texture
GLuint texture;
glGenTextures(1, &texture);
glBindTexture(GL_TEXTURE_1D, texture);
glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
// Colormap data: red, yellow, green, cyan, blue, magenta
GLubyte colorData[] = {
0xff, 0x00, 0x00,
0xff, 0xff, 0x00,
0x00, 0xff, 0x00,
0x00, 0xff, 0xff,
0x00, 0x00, 0xff,
0xff, 0x00, 0xff
};
glTexImage1D(GL_TEXTURE_1D, 0, GL_RGB, 6, 0, GL_RGB, GL_UNSIGNED_BYTE, colorData);
glEnable(GL_TEXTURE_1D);
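A minimal sketch of the matching draw-side change, assuming a third buffer object (called vbobj[TEXC_OBJ] here, a made-up name) that stores one normalized-height float per point in place of the three color floats:
// Per-point texture coordinate: height remapped to [0, 1] (texCoordData is hypothetical)
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
glBindTexture(GL_TEXTURE_1D, texture);
glBindBuffer(GL_ARRAY_BUFFER, vbobj[TEXC_OBJ]);
glBufferData(GL_ARRAY_BUFFER, numPoints * sizeof(GLfloat), texCoordData, GL_STREAM_DRAW);
glTexCoordPointer(1, GL_FLOAT, 0, 0);
glDrawArrays(GL_POINTS, 0, numPoints);
glDisableClientState(GL_TEXTURE_COORD_ARRAY);
That drops the per-point payload from 24 bytes to 16.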
I know nothing about OpenGL, but wouldn't data compression be a natural workaround here? Isn't there support for integer types or 16-bit floats? And for color representations other than 3 floats per point?
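For what it's worth, the fixed-function pointer calls do accept smaller types, so this can be tried without shaders. A sketch under the assumption that positions are pre-quantized to GLshort (by a hypothetical QUANT_SCALE factor, undone with glScalef at draw time) and colors are packed as unsigned bytes, shrinking a point from 24 bytes to 9:
// Positions quantized to 3 GLshort per point (shortVerts is hypothetical)
glBindBuffer(GL_ARRAY_BUFFER, vbobj[VERT_OBJ]);
glBufferData(GL_ARRAY_BUFFER, numPoints * 3 * sizeof(GLshort), shortVerts, GL_STREAM_DRAW);
glVertexPointer(3, GL_SHORT, 0, 0);

// Colors as 3 GLubyte per point, normalized to [0, 1] by GL (byteColors is hypothetical)
glBindBuffer(GL_ARRAY_BUFFER, vbobj[COLOR_OBJ]);
glBufferData(GL_ARRAY_BUFFER, numPoints * 3 * sizeof(GLubyte), byteColors, GL_STREAM_DRAW);
glColorPointer(3, GL_UNSIGNED_BYTE, 0, 0);

// Undo the quantization scale when drawing
glPushMatrix();
glScalef(1.0f / QUANT_SCALE, 1.0f / QUANT_SCALE, 1.0f / QUANT_SCALE);
glDrawArrays(GL_POINTS, 0, numPoints);
glPopMatrix();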
If you're willing to deal with the latency, you can double- (or more!) buffer your VBOs, transferring geometry into one buffer while rendering from another:
while(true)
{
draw_vbo( cur_vbo_id );
generate_new_geometry();
load_vbo( nxt_vbo_id );
swap( cur_vbo_id, nxt_vbo_id );
}
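A rough C-style version of that loop, assuming two pre-created buffer objects and a hypothetical fill_next_frame() that produces the next cloud on the CPU:
GLuint vbo[2];
glGenBuffers(2, vbo);
int cur = 0, nxt = 1;

while (playing) {
    // Draw from the buffer that was filled on the previous iteration
    glBindBuffer(GL_ARRAY_BUFFER, vbo[cur]);
    glVertexPointer(3, GL_FLOAT, 0, 0);
    glDrawArrays(GL_POINTS, 0, numPoints);

    // Meanwhile stream the next frame into the other buffer
    fill_next_frame(vertData);                      // hypothetical CPU-side loader
    glBindBuffer(GL_ARRAY_BUFFER, vbo[nxt]);
    glBufferData(GL_ARRAY_BUFFER, cloudSize, vertData, GL_STREAM_DRAW);

    int tmp = cur; cur = nxt; nxt = tmp;
}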
EDIT: You might also try interleaving your vertices instead of using one VBO per component.
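A sketch of what interleaving could look like here, assuming a packed struct with position and color per point (Point and interleavedData are made-up names):
typedef struct {
    GLfloat x, y, z;   // position
    GLfloat r, g, b;   // color
} Point;

// One upload per frame instead of two; the stride tells GL how to step between points
glBindBuffer(GL_ARRAY_BUFFER, vbobj[VERT_OBJ]);
glBufferData(GL_ARRAY_BUFFER, numPoints * sizeof(Point), interleavedData, GL_STREAM_DRAW);
glVertexPointer(3, GL_FLOAT, sizeof(Point), (void *)0);
glColorPointer(3, GL_FLOAT, sizeof(Point), (void *)(3 * sizeof(GLfloat)));
glDrawArrays(GL_POINTS, 0, numPoints);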
You say it's I/O bound. That implies you've profiled it and seen it spending 50% or more of its time waiting for I/O.
If so, that's what you've got to concentrate on, not the graphics.
If not, then some of the other answers sound good to me. Regardless, profile, don't guess. This is the method I use.
Some pointers:
- store as much data as possible on the graphics card and load only what is really needed (pretty obvious)
- use LOD levels in trees (kd-trees or octrees) and precompute as much as possible up front
- compression on disk is useful too in order to overcome I/O bottlenecks (a rough sketch follows below)
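As an illustration of that last point, the disk read can go through zlib (an assumption on my part; any fast codec such as LZ4 would work the same way), trading spare CPU time for disk bandwidth. The helper name and buffer arguments here are hypothetical:
#include <zlib.h>

// Hypothetical helper: decompress one frame of the cloud from disk into dst.
// Returns the number of bytes actually read, or 0 on failure.
size_t load_compressed_frame(const char *path, void *dst, size_t maxBytes)
{
    gzFile f = gzopen(path, "rb");
    if (!f) return 0;
    int n = gzread(f, dst, (unsigned)maxBytes);
    gzclose(f);
    return n > 0 ? (size_t)n : 0;
}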