开发者

Why this loop is not optimised out?

I've got a very simple c program which copies all elements from array A to back to array A. For example,

double *A;
A = (dou开发者_如何学Cble*)malloc(sizeof(double)*SIZE);
for( i = 0; i < SIZE; i++) {
  A[i] = A[i];
}

I was expecting this to be optimised out by the compiler and eventually turned into a noop. However, by measuring the runtime of this loop and looking at the assembly code, it seems that the element is indeed loaded from memory into register and then stored back to the same memory location. I have -O3 enabled. Can anyone explain to me why the c compiler does not optimise it? Or am I missing something here?

Many thanks.


From a hardware perspective, loading and saving a double is not a no-op; its bitwise value can change if it is one of several trap representations of an IEEE double.

For example, if you load a NaN into a register, it will be written out as the canonical NaN value, which may not be the same bitwise value.


my gcc (version 4.6.1) optimizes it out

$ cat 7680489.c
#include <stdlib.h>

#define SIZE 100

int main(void) {
  double *a;
  size_t i;

  a = calloc(SIZE, sizeof *a); /* initialize elements */
  for (i = 0; i < SIZE; i++) a[i] = a[i];
  free(a);

  return 0;
}
$ gcc -std=c89 -O3 -S 7680489.c
$ cat 7680489.s
        .file   "7680489.c"
        .section        .text.startup,"ax",@progbits
        .p2align 4,,15
        .globl  main
        .type   main, @function
main:
.LFB3:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        movl    $8, %esi
        movl    $100, %edi
        call    calloc
        movq    %rax, %rdi
        call    free
        xorl    %eax, %eax
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE3:
        .size   main, .-main
        .ident  "GCC: (Debian 4.6.1-4) 4.6.1"
        .section        .note.GNU-stack,"",@progbits

No loop that I can see. The assembly output is very similar when using malloc rather than calloc. I switched to calloc to avoid having objects with indeterminate values about (thanks R..).


To optimize away the loop the compiler had to recognize several things:

  • The load/store does not cause a modification of the data (due, eg, to floating point NaN conversions)
  • The two array addresses are identical
  • The two array indexing expressions are identical
  • After taking into account the addresses and indexing, the load & store are not overlapping but coincide exactly.
  • The store will not "step" on the results of other stores, or on a value that has not yet been loaded.
  • The load/store cannot result in a storage fault. This in turn requires that the compiler recognize that the storage came from malloc, and that the loop does not index beyond the end of the allocation.
  • The loop will terminate in a finite number of iterations
  • And probably several others I'm not thinking of

Keep in mind that optimizations are oriented towards removing "normal" redundancies, not eliminating "classroom examples".


The compiler does not do any actual thinking.
It can only optimize out stuff that matches a preconceived pattern.

I.e. if the code does not match a known no-op pattern that has been pre-programmed into the compiler it does not get eliminated.

By putting in the A[i] = A[i] you changed the pattern just enough to not match the empty loop pattern


The problem here is that you are using pointers.

It is hard for a compiler to optimise pointers since it can assume that pointer can read/write to any location in memory.

Change to [] array operator instead and try again. You should see the optimization you expect


As a general big of info, two of the biggest things that compiler have trouble optimizing are loops and pointers, and your example deals with both. Compilers know that values change frequently in loops, and so they are very conservative when optimizing those. Also, A is a pointer, and compilers are aware that pointers can change due to diverse factors, and so they back off when changing those. That's why the compiler has trouble with your example.


This code has undefined behavior (using objects with indeterminate value) so why would you have any expectation for what it does?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜