C/C++ assembly-level questions

  1. When a global variable is used inside a function (C/C++), is it taken directly from a register or from the stack?

  2. Why are bounded loops (for loops) considered to have more scope for optimization than unbounded loops (while / do-while loops)?

  3. Why is returning a value not as good as passing the value by reference?

If possible, please give assembly-level descriptions.


1) It will be read from an address allocated when the application is loaded, i.e. a global variable is simply an address in the process's virtual address space. If that global has been used recently, the compiler may be able to keep its value cached in a register.

2) They don't.

3) Returning a value often requires a copy of the data. If the data is a simple type (such as int or float), it can and will be returned in a register. If the object is too large to fit in a register, the compiler must allocate space for the object on the stack and then copy the data being returned into this allocated space. Passing the value by reference is usually implemented by passing a pointer to the data, so you return the value by modifying the data at that memory address directly. No copy takes place, and hence it's faster. Do note, though, that Return Value Optimisation (RVO) can mean that there is no win from passing the return value in as a reference. Equally, as pointed out in the comments, C++0x's new move constructors can provide the same benefits as RVO.

No need to explain any of those using assembler examples, IMO.


1) A global variable is statically allocated by the linker (it can be an offset from the module's base, though, not necessarily a fixed address). Still, a function would usually read a global variable from a direct address, a local variable from offset + stack pointer, and a class field from offset + object base pointer. The value of a global variable can be cached in a register for subsequent reads, unless it's declared "volatile".
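
A minimal sketch of those three access patterns. The names `g_counter`, `g_status` and `Widget` are hypothetical, and the comments describe what a typical optimizing compiler tends to do, not a guarantee; the actual instructions depend on the compiler, its flags and the target.

```cpp
#include <cassert>

int g_counter = 5;            // global: statically allocated, read from a direct (or module-relative) address
volatile int g_status = 1;    // volatile global: the value may not be cached in a register between reads

struct Widget {
    int field;                // class field: read at offset + object base pointer
    int sum_with(int local) const {
        // 'local' typically lives in a register or at offset + stack pointer
        return field + local + g_counter;
    }
};

// The compiler is allowed to load g_counter once and reuse the register.
int read_twice() {
    return g_counter + g_counter;
}

// Each read of g_status must be a separate load from memory.
int read_volatile_twice() {
    int first = g_status;
    int second = g_status;
    return first + second;
}
```

Compiling this with something like `-O2 -S` and comparing `read_twice` with `read_volatile_twice` is an easy way to see the caching difference on your own compiler.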

2) It's not really a matter of choosing for/do/while, but of how easy it is to compute the number of iterations, so that the compiler can decide whether to unroll and/or vectorize and/or parallelize the loop. For example, here the compiler would know the number of iterations:

for( i=0; i<8; i++ ) { j = 1 << i; XXX }

and here it won't:

for( j=1; j<256; j<<=1 ) { XXX }

For loops may simply, more often than not, have a structure that is easier for the compiler to analyze.
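
To make the point concrete, here is a small sketch (function names are made up for illustration) where all three loops visit the same eight values of j. What differs is how directly the trip count can be read off the loop, not the keyword used. Whether a particular compiler manages to derive the count for the shift form is version dependent, so treat the comments as the general idea only.

```cpp
#include <cassert>

// Counted form: the trip count (8) is visible directly in the loop header.
int sum_counted() {
    int sum = 0;
    for (int i = 0; i < 8; i++) {
        int j = 1 << i;
        sum += j;
    }
    return sum;
}

// Shift form: same values of j, but the trip count has to be derived from j <<= 1.
int sum_shifted() {
    int sum = 0;
    for (int j = 1; j < 256; j <<= 1) {
        sum += j;
    }
    return sum;
}

// A while loop with the counted structure is as easy to analyze as the first for loop.
int sum_while() {
    int sum = 0;
    int i = 0;
    while (i < 8) {
        sum += 1 << i;
        i++;
    }
    return sum;
}
```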

3) If it's a value of a basic type (char/short/int etc.), it's slower to return it by reference (though sometimes the compiler can optimize this). But for larger structures, a reference/pointer can reduce the amount of work for the compiler, and it really may be faster if the compiler isn't able to avoid creating some temporary copies etc.

Update: Ok, here's a more specific example:

#include <stdio.h>

int main( void ) {

  int a,b, i,j,s1,s2;

  a = 123 + printf(""); // unknown at compile time
  s1 = 1; 
  // bit reverse loop v1, gets unrolled
  for( i=0; i<8; i++ ) { j = 1 << i; s1 += s1 + ((a&j)>0); }
  s1 -= 256;

  b = s1 + printf("");
  // bit reverse loop v2, not unrolled
  for( s2=1; s2<256; s2+=s2+(j>0) ) { j = b & s2; b -= j; }
  s2 -= 256;

  printf( "a=%02X s1=%02X s2=%02X\n", a, s1, s2 );
}

Asm listings for gcc/intelc are available here: http://nishi.dreamhosters.com/u/1.zip


First off, you have not specified a target platform: ARM, x86, 6502, ZPU, etc.

1) When a global variable is used inside a function (C/C++), is it taken directly from a register or from the stack?

You were not clear, so: a global can be passed in by value, passed in by reference, or not passed in at all and used directly in the function.

Passed by value: this depends on the code/compiler/target, which you didn't specify. The value of (or address of) the global can go in a register or on the stack, depending on the calling convention for that compiler/target. Items passed in by register sometimes have a placeholder on the stack in case the function needs more registers than are available. So when passed by value, the value the global contained is initially accessed either in a register or on the stack.

Passed by reference: this is pretty much the same as passed by value, except that instead of the value, the address of the global is passed in by register or on the stack, depending on the compiler/target. Where this differs is that you can access the global directly from/to its memory location, but that is the nature of pass by reference.

Used directly in the function: here it depends on the code/compiler/target whether the global is accessed directly at its fixed memory location, or whether the value is loaded from that memory location into a register and operated on there. The stack is not used in this case, so the answer is either (non-stack) memory or a register.
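
The three cases, sketched in C++ (the global `g_total` and the function names are hypothetical, only there for illustration). Where each argument physically travels, register or stack, is decided by the calling convention of your compiler/target, so the comments are deliberately generic.

```cpp
#include <cassert>

int g_total = 10;   // hypothetical global

// By value: a copy of the global's value is passed in (register or stack).
// The global itself is not touched.
int add_by_value(int total, int n) {
    total += n;
    return total;
}

// By reference: the global's address is passed in (register or stack),
// and the write goes to the global's own memory location.
void add_by_reference(int& total, int n) {
    total += n;
}

// Used directly: nothing is passed in; the function addresses the global itself.
void add_direct(int n) {
    g_total += n;
}
```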

2) Why are bounded loops (for loops) considered to have more scope for optimization than unbounded loops (while / do-while loops)?

Depends on the code, compiler, and target. I cannot think of a generic case where one is better than the other.

3) Why is returning a value not as good as passing the value by reference?

Very very subtle performance gain if anything. Depends heavily on the code, compiler, and target. There are cases where by reference is slightly faster and cases where by value is slightly faster. Comparing the two, the differences have to do with the number of times the address or data has to be copied to/from registers or the stack on its path. At best you may save a few mov or load/store instructions.
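
For reference, these are the two styles being compared, for a small type and for a larger one. This is only a sketch: `Big` and the function names are invented here, and the comments describe the usual behaviour of common calling conventions, which is an assumption about your target rather than a rule.

```cpp
#include <cassert>

struct Big { int data[64]; };   // hypothetical type, too large to fit in a register

// Small type by value: the result typically comes back in a register.
int square_value(int x) { return x * x; }

// Small type by reference: the result is stored through a pointer to memory instead.
void square_ref(int x, int& out) { out = x * x; }

// Large type by value: the caller usually provides space for the result.
Big make_big(int seed) {
    Big b;
    for (int i = 0; i < 64; i++) b.data[i] = seed + i;
    return b;
}

// Large type by reference: the function writes straight into the caller's object.
void fill_big(Big& out, int seed) {
    for (int i = 0; i < 64; i++) out.data[i] = seed + i;
}
```

Both versions produce the same results; the only way to know which is cheaper for a given compiler/target is to look at the generated code or measure.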


In the general case (being precise here is hard), globals are retrieved from memory but not from the stack (unless already cached in a register), loops can be optimized depending on the information that the compiler has on what the loop does (can it perform loop unrolling?) and in the third case it depends on the actual code. Since the first two have already been dealt with in other questions, I will focus on the third question.

There is a common optimization called (Named) Return Value Optimization (N)RVO that the compiler can perform to avoid unnecessary copies.

// RVO                  // NRVO             // cannot perform RVO
type foo() {            type bar() {        type baz() {
   value a;                type a;             type a,b; 
   // operate on a         // modify a         // pass a and b to other functions
   return type(a);         return a;           if ( random() > x ) return a;
}                       }                      else return b;
                                            }

In both foo and bar, the compiler is able to analyze the code and determine that the temporary type(a) in foo or the named local variable a in bar are the return value of the function, so it can construct those objects in place of the return value (according to the calling convention) and avoid copying it. Contrast that with baz where the compiler must create objects a and b before actually knowing which has to be returned. In this case the compiler cannot optimize anything, has to perform the operations and only at the end copy either a or b to the return value.
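
If you want to see this on your own compiler, a copy-counting type makes it observable. This is a sketch, not the pseudocode above made literal: `Tracked`, `make_rvo`, `make_nrvo` and `make_either` are hypothetical names. Eliding the unnamed temporary is guaranteed from C++17 on (and done by mainstream compilers before that), while NRVO stays optional, so the count for `make_nrvo` may legitimately be 0 or 1.

```cpp
#include <cassert>

struct Tracked {
    static int copies;   // counts copy and move constructions
    int id;
    explicit Tracked(int i) : id(i) {}
    Tracked(const Tracked& o) : id(o.id) { ++copies; }
    Tracked(Tracked&& o) noexcept : id(o.id) { ++copies; }
};
int Tracked::copies = 0;

// RVO: the unnamed temporary is constructed directly in the return slot.
Tracked make_rvo(int v) {
    return Tracked(v);
}

// NRVO candidate: a single named local is returned on every path.
Tracked make_nrvo(int v) {
    Tracked a(v);
    a.id += 1;
    return a;
}

// Two candidates: both must exist before the choice is known,
// so one of them has to be copied or moved into the return slot.
Tracked make_either(bool first) {
    Tracked a(1), b(2);
    if (first) return a;
    return b;
}
```

Reset `Tracked::copies` to zero before each call and read it afterwards to compare the three functions.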

Whenever the compiler performs (N)RVO or if it is actually impossible to perform, changing the function signature to receive the object by reference will not provide a performance advantage and will make code at the place of call less readable for functions that create new objects.

This should be used as a general rule of thumb, noting that, as always, there are exceptions and cases where one or the other might perform slightly better. But for most cases, and unless measuring the performance proves otherwise, you should write the code as close to the design semantics as possible. If a function creates a new object, return it by value; if a function modifies an object, pass it by reference.

One special case is a function that creates vectors and is called in a tight loop. There, having a single vector that is passed by reference, cleared in the function and then filled will reduce the number of memory allocations (clear() on a vector does not deallocate memory, so the vector does not need to reallocate in the next iteration).
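
A sketch of that special case, with made-up function names (`make_squares`, `fill_squares`). The second version reuses the caller's buffer, so after the first call the loop body does not need to allocate as long as the requested size fits in the existing capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return by value: every call builds a fresh vector, which means a fresh allocation.
std::vector<int> make_squares(std::size_t n) {
    std::vector<int> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) out.push_back(static_cast<int>(i * i));
    return out;
}

// Pass by reference: clear() keeps the capacity, so the buffer is reused.
void fill_squares(std::vector<int>& out, std::size_t n) {
    out.clear();
    for (std::size_t i = 0; i < n; ++i) out.push_back(static_cast<int>(i * i));
}
```

Typical use in a tight loop would be to declare one `std::vector<int> buf;` outside the loop and call `fill_squares(buf, n)` inside it.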

On the other hand, when function calls are chained, the proper combination of return by value and pass by value may let you avoid extra copies that passing references would force (a non-const reference requires a non-temporary object).
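
A small sketch of the chaining case (the function names are hypothetical). Because both helpers take and return by value, a temporary can flow from one call into the next; the out-parameter style needs a named object in between.

```cpp
#include <cassert>
#include <string>

// Pass by value, return by value: accepts temporaries, and the parameter
// is moved (not copied) into the return value.
std::string with_prefix(std::string s) {
    s.insert(0, "<");
    return s;
}

std::string with_suffix(std::string s) {
    s += ">";
    return s;
}

// Non-const reference: cannot bind to a temporary, so calls cannot be chained directly.
void add_suffix_in_place(std::string& s) {
    s += ">";
}
```

With these, `with_suffix(with_prefix("tag"))` compiles and yields `<tag>`, whereas `add_suffix_in_place(with_prefix("tag"))` is rejected by the compiler because the argument is a temporary.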
