memory layout hack
I have been following this course on YouTube, and it was talking about how some programmers can use their knowledge of how memory is laid out to do clever things. One of the examples in the lecture was something like this:
#include <stdio.h>
void makeArray();
void printArray();
int main(){
makeArray();
printArray();
return 0;
}
void makeArray(){
int array[10];
int i;
for(i=0;i<10;i++)
array[i]=i;
}
void printArray(){
int array[10];
int i;
for(i=0;i<10;i++)
printf("%d\n",array[i]);
}
The idea is that as long as the two functions have the same activation record size on the stack segment, it will work and print the numbers from 0 to 9 ... but it actually prints something like this:
134520820
-1079626712
0
1
2
3
4
5
6
7
There are always those two values at the beginning ... can anyone explain that? I am using gcc on Linux.
the exact lecture url starting at 5:15
I'm sorry but there's absolutely nothing clever about that piece of code and people who use it are very foolish.
Addendum:
Or, sometimes, just sometimes, very clever. Having watched the video linked to in the question update, this wasn't some rogue code monkey breaking the rules. This guy understood what he was doing quite well.
It requires a deep understanding of the underlying code generated and can easily break (as mentioned and seen here) if your environment changes (like compilers, architectures and so on).
But, provided you have that knowledge, you can probably get away with it. It's not something I'd suggest to anyone other than a veteran but I can see it having its place in very limited situations and, to be honest, I've no doubt occasionally been somewhat more ... pragmatic ... than I should have been in my own career :-)
Now back to your regular programming ...
It's non-portable between architectures, compilers, releases of compilers, and probably even optimisation levels within the same release of a compiler, as well as being undefined behaviour (reading uninitialised variables).
Your best bet if you want to understand it is to examine the assembler code output by the compiler.
But your best bet overall is to just forget about it and code to the standard.
For example, this transcript shows how gcc can have different behaviour at different optimisation levels:
pax> gcc -o qq qq.c ; ./qq
0
1
2
3
4
5
6
7
8
9
pax> gcc -O3 -o qq qq.c ; ./qq
1628373048
1629343944
1629097166
2280872
2281480
0
0
0
1629542238
1629542245
At gcc's high optimisation level (what I like to call its insane optimisation level), this is the makeArray function. It has basically figured out that the array is not used and therefore optimised its initialisation out of existence.
_makeArray:
pushl %ebp ; stack frame setup
movl %esp, %ebp
; heavily optimised function
popl %ebp ; stack frame tear-down
ret ; and return
I'm actually slightly surprised that gcc even left the function stub in there at all.
Update: as Nicholas Knight points out in a comment, the function remains since it must be visible to the linker - making the function static results in gcc removing the stub as well.
If you check the assembler code at optimisation level 0 below, it gives a clue (it's not the actual reason - see below). Examine the following code and you'll see that the stack frame setup is different for the two functions despite the fact that they have exactly the same parameters passed in and the same local variables:
subl $48, %esp ; in makeArray
subl $56, %esp ; in printArray
This is because printArray allocates some extra space to store the address of the printf format string and the address of the array element, four bytes each, which accounts for the eight-byte (two 32-bit values) difference.
That's the most likely explanation for your array in printArray() being off by two values.
Here's the two functions at optimisation level 0 for your enjoyment :-)
_makeArray:
pushl %ebp ; stack frame setup
movl %esp, %ebp
subl $48, %esp
movl $0, -4(%ebp) ; i = 0
jmp L4 ; start loop
L5:
movl -4(%ebp), %edx
movl -4(%ebp), %eax
movl %eax, -44(%ebp,%edx,4) ; array[i] = i
addl $1, -4(%ebp) ; i++
L4:
cmpl $9, -4(%ebp) ; for all i up to and including 9
jle L5 ; continue loop
leave
ret
.section .rdata,"dr"
LC0:
.ascii "%d\12\0" ; format string for printf
.text
_printArray:
pushl %ebp ; stack frame setup
movl %esp, %ebp
subl $56, %esp
movl $0, -4(%ebp) ; i = 0
jmp L8 ; start loop
L9:
movl -4(%ebp), %eax ; get i
movl -44(%ebp,%eax,4), %eax ; get array[i]
movl %eax, 4(%esp) ; store array[i] for printf
movl $LC0, (%esp) ; store format string
call _printf ; make the call
addl $1, -4(%ebp) ; i++
L8:
cmpl $9, -4(%ebp) ; for all i up to and including 9
jle L9 ; continue loop
leave
ret
Update: As Roddy points out in a comment, that's not the cause of your specific problem since, in this case, the array is actually at the same position in memory (%ebp-44, with %ebp being the same across the two calls). What I was trying to point out was that two functions with the same argument list and the same local variables do not necessarily end up with the same stack frame layout.
All it would take would be for printArray to swap the location of its local variables (including any temporaries not explicitly created by the developer) around and you would have this problem.
GCC probably generates code that does not push the arguments onto the stack when calling a function; instead, it reserves extra stack space up front. The arguments to your printf call, "%d\n" and array[i], take 8 bytes on the stack on a 32-bit target: the first argument is a pointer and the second is an integer. This explains why two of the integers are not printed correctly.
Never, ever, ever, ever, ever, ever do anything like this. It will not work reliably. You will get odd bugs. It is far from portable.
Ways it can fail:
.1. The compiler adds extra, hidden code
DevStudio, in debug mode, adds calls to functions that check the stack to catch stack errors. These calls will overwrite what was on the stack, thus losing your data.
.2. Someone adds an Enter/Exit call
Some compilers allow the programmer to define functions to be called on function entry and function exit. Like (1) these use stack space and will overwrite what's already there, losing data.
.3. Interrupts
In main(), if you get an interrupt between the calls to makeArray and printArray, you will lose your data. The first thing that happens when processing an interrupt is to save the state of the CPU. This usually involves pushing the CPU registers and flags onto the stack, which, you guessed it, overwrites your data.
.4. Compilers are clever
As you've seen, the array in makeArray is at a different address to the one in printArray. The compiler has placed its local variables in different positions on the stack. It uses a complex algorithm to decide where to put a variable (on the stack, in a register, etc.), and it's really not worth trying to figure out how the compiler does it, as the next version of the compiler might do it some other way.
To sum up, these kinds of 'clever tricks' aren't tricks and are certainly not clever. You would not lose anything by declaring the array in main and passing a reference/pointer to it to the two functions. Stacks are for storing local variables and function return addresses. Once your data goes out of scope (i.e. the stack top shrinks past the data), the data is effectively lost: anything can happen to it.
To illustrate this point more, your results would probably be different if you had different function names (I'm just guessing here, OK).