
How can this function be optimized? (Uses almost all of the processing power)

I'm in the process of writing a little game to teach myself OpenGL rendering as it's one of the things I haven't tackled yet. I used SDL before and this same function, while still performing badly, didn't go as over the top as it does now.

Basically, there is not much going on in my game yet, just some basic movement and background drawing. When I switched to OpenGL, it appears as if it's way too fast. My frames per second exceed 2000 and this function uses up most of the processing power.

What is interesting is that the program in its SDL version used 100% CPU but ran smoothly, while the OpenGL version uses only about 40% - 60% CPU but seems to tax my graphics card in such a way that my whole desktop becomes unresponsive. Bad.

It's not a very complex function: it renders a 1024x1024 background tile according to the player's X and Y coordinates to give the impression of movement, while the player graphic itself stays locked in the center. Because it's a small tile for a bigger screen, I have to render it multiple times to stitch the tiles together for a full background. The two for loops in the code below iterate 12 times, combined, so I can see why this is inefficient when called 2000 times per second.

So to get to the point, this is the evil-doer:

void render_background(game_t *game)
{
    int bgw;
    int bgh;

    int x, y;

    glBindTexture(GL_TEXTURE_2D, game->art_background);
    glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_WIDTH,  &bgw);
    glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_HEIGHT, &bgh);

    /* Immediate-mode vertices must sit between glBegin() and glEnd();
       GL_QUADS is assumed from the four corners emitted per tile */
    glBegin(GL_QUADS);

    /*
     * Start one background tile too early and end one too late
     * so the player can not outrun the background
     */
    for (x = -bgw; x < root->w + bgw; x += bgw)
    {
        for (y = -bgh; y < root->h + bgh; y += bgh)
        {
            /* Offsets */
            int ox = x + (int)game->player->x % bgw;
            int oy = y + (int)game->player->y % bgh;

            /* Top Left */
            glTexCoord2f(0, 0);
            glVertex3f(ox, oy, 0);

            /* Top Right */
            glTexCoord2f(1, 0);
            glVertex3f(ox + bgw, oy, 0);

            /* Bottom Right */
            glTexCoord2f(1, 1);
            glVertex3f(ox + bgw, oy + bgh, 0);

            /* Bottom Left */
            glTexCoord2f(0, 1);
            glVertex3f(ox, oy + bgh, 0);
        }
    }

    glEnd();
}

If I artificially limit the speed by calling SDL_Delay(1) in the game loop, I cut the FPS down to ~660 ± 20 and get no "performance overkill". But I doubt that is the correct way to go about this.

For the sake of completeness, these are my general rendering and game loop functions:

void game_main()
{
    long current_ticks = 0;
    long elapsed_ticks;
    long last_ticks = SDL_GetTicks();

    game_t game;
    object_t player;

    if (init_game(&game) != 0)
        return;

    game.player = &player;

    /* game_init() */
    while (!game.quit)
    {
        /* Update number of ticks since last loop */
        current_ticks = SDL_GetTicks();
        elapsed_ticks = current_ticks - last_ticks;

        last_ticks = current_ticks;

        game_handle_inputs(elapsed_ticks, &game);
        game_update(elapsed_ticks, &game);

        game_render(elapsed_ticks, &game);

        /* Lagging stops if I enable this */
        /* SDL_Delay(1); */
    }
}

void game_render(long elapsed_ticks, game_t *game)
{
    game->tick_counter += elapsed_ticks;

    if (game->tick_counter >= 1000)
    {
        game->fps = game->frame_counter;
        game->tick_counter = 0;
        game->frame_counter = 0;

        printf("FPS: %d\n", game->fps);
    }
}

According to gprof profiling, even when I limit the execution with SDL_Delay(), it still spends about 50% of the time rendering my background.

Turn on VSYNC. That way you'll calculate graphics data exactly as fast as the display can present it to the user, and you won't waste CPU or GPU cycles calculating extra frames in between that will just be discarded because the monitor is still busy displaying a previous frame.

First of all, you don't need to render the tile x*y times - you can render it once for the entire area it should cover and use GL_REPEAT to have OpenGL cover the entire area with it. All you need to do is to compute the proper texture coordinates once, so that the tile doesn't get distorted (stretched). To make it appear to be moving, increase the texture coordinates by a small margin every frame.
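As a rough sketch of the coordinate computation described above: the helper below works out the texture rectangle for a single screen-sized quad. The names tex_rect_t and background_tex_rect are made up for illustration, and the sign of the scroll offset is assumed to match the question's loop (a tile drawn at x + player_x % bgw).

```c
#include <assert.h>

/* Texture-coordinate rectangle for one screen-sized quad (hypothetical type). */
typedef struct {
    float u0, v0;   /* top-left corner */
    float u1, v1;   /* bottom-right corner */
} tex_rect_t;

/* Computes texture coordinates, in units of whole tiles, for a quad that
 * covers the screen. The quad spans screen_w / tile_w tiles horizontally
 * and screen_h / tile_h tiles vertically, so the tile is not stretched. */
tex_rect_t background_tex_rect(float player_x, float player_y,
                               int screen_w, int screen_h,
                               int tile_w, int tile_h)
{
    tex_rect_t r;

    assert(tile_w > 0 && tile_h > 0);

    /* Scroll offset within one tile, same integer modulo as the question's
     * loop; negated because shifting the image right means sampling
     * further left in the texture. */
    r.u0 = -(float)((int)player_x % tile_w) / (float)tile_w;
    r.v0 = -(float)((int)player_y % tile_h) / (float)tile_h;

    r.u1 = r.u0 + (float)screen_w / (float)tile_w;
    r.v1 = r.v0 + (float)screen_h / (float)tile_h;

    return r;
}
```

With the wrap mode set through glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT) and the same for GL_TEXTURE_WRAP_T, the four corners of a quad from (0, 0) to (screen_w, screen_h) would take (u0, v0), (u1, v0), (u1, v1) and (u0, v1), replacing the 12 quads with one.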

Now down to limiting the speed. What you want to do is not to just plug a sleep() call in there, but measure the time it takes to render one complete frame:

void FrameCap (Uint32 desiredFps, Uint32 actualFrameTime)
{
   Uint32 desiredFrameTime = 1000 / desiredFps; // frame budget in milliseconds
   if (desiredFrameTime > actualFrameTime)
      SDL_Delay (desiredFrameTime - actualFrameTime); // there is a small imprecision here
}

Uint32 startTime = SDL_GetTicks ();
// render frame
FrameCap (60, SDL_GetTicks () - startTime); // 60 FPS is only an example target

There are ways to make this more precise (e.g. by using the performance counter functions on Windows 7, or using microsecond resolution on Linux), but I think you get the general idea. This approach also has the advantage of being driver independent and - unlike coupling to V-Sync - allowing an arbitrary frame rate.
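For the microsecond-resolution variant on Linux, something along these lines might work. It is only a sketch: it assumes a POSIX system with clock_gettime() and nanosleep(), and the names now_us, frame_remaining_us and frame_cap are invented here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose clock_gettime() and nanosleep() */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current monotonic time in microseconds. */
int64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Microseconds left in the frame budget, or 0 if the frame overran it. */
int64_t frame_remaining_us(int target_fps, int64_t frame_time_us)
{
    int64_t budget_us;

    assert(target_fps > 0);
    budget_us = 1000000 / target_fps;
    return frame_time_us < budget_us ? budget_us - frame_time_us : 0;
}

/* Sleeps for whatever remains of the frame that began at frame_start_us. */
void frame_cap(int target_fps, int64_t frame_start_us)
{
    int64_t rest_us = frame_remaining_us(target_fps, now_us() - frame_start_us);

    if (rest_us > 0) {
        struct timespec ts;
        ts.tv_sec  = (time_t)(rest_us / 1000000);
        ts.tv_nsec = (long)(rest_us % 1000000) * 1000;
        nanosleep(&ts, NULL);
    }
}
```

In the game loop you would take now_us() before rendering and call frame_cap(60, start) after it. The scheduler can still wake the process late, so the cap remains approximate.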

At 2000 FPS it only takes 0.5 ms to render the entire frame. If you want to get 60 FPS then each frame should take about 16 ms. To do this, first render your frame (about 0.5 ms), then use SDL_Delay() to use up the rest of the 16 ms.

Also, if you are interested in profiling your code (which isn't needed if you are getting 2000 FPS!) then you may want to use High Resolution Timers. That way you could tell exactly how long any block of code takes, not just how much time your program spends in it.
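A minimal sketch of what timing a block with a high resolution timer could look like, again assuming POSIX clock_gettime(); block_timer_t, monotonic_ns, timer_add and timer_mean_ms are illustrative names, not part of any library.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose clock_gettime() */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Accumulated wall-clock time for one block of code (hypothetical type). */
typedef struct {
    int64_t total_ns;   /* sum of all measured intervals */
    long    calls;      /* number of intervals recorded */
} block_timer_t;

/* Current monotonic time in nanoseconds. */
int64_t monotonic_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Records one measured interval. */
void timer_add(block_timer_t *t, int64_t start_ns, int64_t end_ns)
{
    assert(end_ns >= start_ns);
    t->total_ns += end_ns - start_ns;
    t->calls++;
}

/* Mean duration per recorded interval, in milliseconds. */
double timer_mean_ms(const block_timer_t *t)
{
    if (t->calls == 0)
        return 0.0;
    return (double)t->total_ns / (double)t->calls / 1e6;
}
```

Take monotonic_ns() right before and right after the block in question (say, the call to render_background()), pass both values to timer_add(), and print timer_mean_ms() once per second next to the FPS counter.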






