Finding a string in a compiled executable

2023-02-21 22:41 问答作者：

I have a very simple program as below: #include

int main(){
char* mystring = "ABCDEFGHIJKLMNO";
puts(mystring);

char otherstring[15];
otherstring[0]  = 'a';
otherstring[1]  = 'b';
otherstring[2]  = 'c';
otherstring[3]  = 'd';
otherstring[4]  = 'e';
otherstring[5]  = 'f';
otherstring[6]  = 'g';
otherstring[7]  = 'h';
otherstring[8]  = 'i';
otherstring[9]  = 'j';
otherstring[10] = 'k';
otherstring[11] = 'l';
otherstring[12] = 'm';
otherstring[13] = 'n';
otherstring[14] = 'o';
puts(otherstring);

return 0;
}

Compiler was MS VC++.

Whether I build this program with or without optimisations I can find the string "ABCDEFGHIJKLMNO" in the 开发者_如何学Cexecutable using a hex editor.

However, I cannot find the string "abcdefghijklmno"

What is the compiler doing that is different for otherstring?

The hex editor I used was Hexedit - but tried others and still couldn't find otherstring. Anyone any ideas why not or how to find?

By the way I am not doing this for hacking reasons.

This is what my gcc did with this code. I assume your compiler does a similar thing. The string constant is stored in the read only section and mystring is initialized with it's address.
The individual chars are placed directly into their array location on the stack. Also note that otherstring is not NULL terminated when you're calling puts with it.

           .file   "test.c"
            .section        .rodata
    .LC0:
            .string "ABCDEFGHIJKLMNO"
            .text
    .globl main
            .type   main, @function
    main:
    .LFB0:
            .cfi_startproc
            pushq   %rbp
                .cfi_def_cfa_offset 16
            movq    %rsp, %rbp
            .cfi_offset 6, -16
            .cfi_def_cfa_register 6
            subq    $48, %rsp
            movq    %fs:40, %rax
            movq    %rax, -8(%rbp)
            xorl    %eax, %eax
    /* here is where mystring is loaded with the address of "ABCDEFGHIJKLMNO" */
            movq    $.LC0, -40(%rbp)
    /* this is the call to puts */
                movq    -40(%rbp), %rax
            movq    %rax, %rdi
            call    puts
    /* here is where the bytes are loaded into otherstring on the stack */            
            movb    $97, -32(%rbp)  //'a'
            movb    $98, -31(%rbp)  //'b'
            movb    $99, -30(%rbp)  //'c'
            movb    $100, -29(%rbp) //'d'
            movb    $101, -28(%rbp) //'e'
            movb    $102, -27(%rbp) //'f'
            movb    $103, -26(%rbp) //'g'
            movb    $104, -25(%rbp) //'h'
            movb    $105, -24(%rbp) //'i'
            movb    $106, -23(%rbp) //'j'
            movb    $107, -22(%rbp) //'k'
            movb    $108, -21(%rbp) //'l'
            movb    $109, -20(%rbp) //'m'
            movb    $110, -19(%rbp) //'n'
            movb    $111, -18(%rbp) //'o'

The compiler is likely placing the number for each character into each array position, just as you wrote it, without any optimization that would be found from reading the code. Remember that a single character is no different than a number in c, so you could even use the ascii codes instead of 'a'. From a hexeditor I would expect you would see those converted back to letters, just spaced out a bit.

In the first case the compiler initializes data with the exact string "ABC...".

In the second case, each assignment is done sequentially, therefore compiler generates code to perform this assignment. In the executable you should see 15 repeating byte sequences where only the initializer ('a', 'b', 'c'...) changes.

继续阅读：c string

Finding a string in a compiled executable

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？