Finding a string in a compiled executable
I have a very simple program as below: #include
int main(){
char* mystring = "ABCDEFGHIJKLMNO";
puts(mystring);
char otherstring[15];
otherstring[0] = 'a';
otherstring[1] = 'b';
otherstring[2] = 'c';
otherstring[3] = 'd';
otherstring[4] = 'e';
otherstring[5] = 'f';
otherstring[6] = 'g';
otherstring[7] = 'h';
otherstring[8] = 'i';
otherstring[9] = 'j';
otherstring[10] = 'k';
otherstring[11] = 'l';
otherstring[12] = 'm';
otherstring[13] = 'n';
otherstring[14] = 'o';
puts(otherstring);
return 0;
}
Compiler was MS VC++.
Whether I build this program with or without optimisations I can find the string "ABCDEFGHIJKLMNO" in the 开发者_如何学Cexecutable using a hex editor.
However, I cannot find the string "abcdefghijklmno"
What is the compiler doing that is different for otherstring?
The hex editor I used was Hexedit - but tried others and still couldn't find otherstring. Anyone any ideas why not or how to find?
By the way I am not doing this for hacking reasons.
This is what my gcc did with this code. I assume your compiler does a similar thing. The string constant is stored in the read only section and mystring is initialized with it's address.
The individual chars are placed directly into their array location on the stack. Also note that otherstring is not NULL terminated when you're calling puts with it.
.file "test.c"
.section .rodata
.LC0:
.string "ABCDEFGHIJKLMNO"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
subq $48, %rsp
movq %fs:40, %rax
movq %rax, -8(%rbp)
xorl %eax, %eax
/* here is where mystring is loaded with the address of "ABCDEFGHIJKLMNO" */
movq $.LC0, -40(%rbp)
/* this is the call to puts */
movq -40(%rbp), %rax
movq %rax, %rdi
call puts
/* here is where the bytes are loaded into otherstring on the stack */
movb $97, -32(%rbp) //'a'
movb $98, -31(%rbp) //'b'
movb $99, -30(%rbp) //'c'
movb $100, -29(%rbp) //'d'
movb $101, -28(%rbp) //'e'
movb $102, -27(%rbp) //'f'
movb $103, -26(%rbp) //'g'
movb $104, -25(%rbp) //'h'
movb $105, -24(%rbp) //'i'
movb $106, -23(%rbp) //'j'
movb $107, -22(%rbp) //'k'
movb $108, -21(%rbp) //'l'
movb $109, -20(%rbp) //'m'
movb $110, -19(%rbp) //'n'
movb $111, -18(%rbp) //'o'
The compiler is likely placing the number for each character into each array position, just as you wrote it, without any optimization that would be found from reading the code. Remember that a single character is no different than a number in c, so you could even use the ascii codes instead of 'a'. From a hexeditor I would expect you would see those converted back to letters, just spaced out a bit.
In the first case the compiler initializes data with the exact string "ABC...".
In the second case, each assignment is done sequentially, therefore compiler generates code to perform this assignment. In the executable you should see 15 repeating byte sequences where only the initializer ('a', 'b', 'c'...) changes.
精彩评论