ARM gcc inline assembler optimization problem
Why is it that my inline assembler routine is not working when I have optimization flag -O3 but it works with other optimization flags (-O0, -O1, -O2, -Os)?
I even added volatile to all my assembler instructions, which I thought would tell the compiler to not touch or reorder anything?
Best Regards
Mr Gigu
GCC inline assembler is very sensitive to correct specification.
In particular, you have to be extremely precise about specifying the correct constraints to make sure the compiler does not decide to "optimize" your assembler code. There are a few things to watch out for. Take an example.
The following two functions:
    int myasmfunc(int arg) /* definitely buggy ... */
    {
        register int myval asm("r2") = arg;

        asm ("add r1, r0, #22\n" ::: "r1");
        asm ("adds r0, r1, r0\n" ::: "r0", "cc");
        asm ("subeq r2, #123\n" ::: "r2");
        asm ("subne r2, #213\n" ::: "r2");
        return myval;
    }
and
    int myasmfunc(int arg)
    {
        int myval = arg, plus = arg;

        asm ("add %0, #22\n\t" : "+r"(plus));
        asm ("adds %1, %2\n\t"
             "subeq %0, #123\n\t"
             "subne %0, #213\n\t"
             : "+r"(myval), "+r"(plus)
             : "r"(arg)
             : "cc");
        return myval;
    }
might look similar at first sight, and you'd naively assume they do the same thing, but they are very far from it!
There are multiple problems with the first version of this code.
- For one, if you specify it as separate asm() statements, the compiler is free to insert arbitrary code in between. In particular, that means the sub instructions, even though they don't modify the condition codes themselves, can fall foul of anything the compiler chose to insert that did.
- Second, again due to the split of the instructions into separate asm() statements, there's no guarantee that the code generator will choose the same register to put myval in both times, the asm("r2") spec in the variable declaration notwithstanding.
- Third, the assumption made in the first version that r0 contains the argument of the function is wrong; by the time it gets to the assembly block, the compiler might have chosen to move this argument somewhere else. Worse, since the statement is split, no guarantee is made as to what happens between two asm() statements. Even if you specify __asm__ __volatile__(...); the compiler treats two such blocks as independent entities.
- Fourth, you're not telling the compiler that you're clobbering / assigning myval. It might have chosen to temporarily move it elsewhere because you're clobbering "r2" and, when returning, decide to restore it from ... (???).
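As a minimal sketch of the second and fourth points: GCC only honours a register binding such as asm("r2") for a local variable when that variable is passed as an operand of an extended asm statement. The helper name sub123_in_r2 is made up for illustration, and the plain-C branch is an assumption added so the snippet also builds on non-ARM hosts (the assembly is only compiled for ARM state, not Thumb).

```c
/* Hypothetical helper: subtracts 123 from its argument.
 * On ARM the value is kept in r2 for the duration of the asm statement. */
static int sub123_in_r2(int arg)
{
#if defined(__arm__) && !defined(__thumb__)
    /* The r2 binding is only guaranteed where v is used as an asm operand. */
    register int v __asm__("r2") = arg;

    /* "+r"(v) tells the compiler that v is both read and written,
     * so it knows the result lives in v afterwards. %0 expands to r2. */
    __asm__ ("sub %0, %0, #123" : "+r"(v));
    return v;
#else
    /* Portable fallback (an assumption, not part of the original answer). */
    return arg - 123;
#endif
}
```

Calling sub123_in_r2(200) should give 77 on either path. Because r2 appears as an operand rather than in the clobber list, the compiler can keep track of where myval-style variables live before and after the statement.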
Just for the fun of it, here's the output of the first function, for the following four cases:
- default - gcc -c tst.c
- optimized - gcc -O8 -c tst.c
- using some unusual options - gcc -c -finstrument-functions tst.c
- that plus optimization - gcc -c -O8 -finstrument-functions tst.c
    Disassembly of section .text:

    00000000 :
       0: e52db004  push {fp}             ; (str fp, [sp, #-4]!)
       4: e28db000  add fp, sp, #0        ; 0x0
       8: e24dd00c  sub sp, sp, #12       ; 0xc
       c: e50b0008  str r0, [fp, #-8]
      10: e51b2008  ldr r2, [fp, #-8]
      14: e2811016  add r1, r1, #22       ; 0x16
      18: e0910000  adds r0, r1, r0
      1c: 0242207b  subeq r2, r2, #123    ; 0x7b
      20: 124220d5  subne r2, r2, #213    ; 0xd5
      24: e1a03002  mov r3, r2
      28: e1a00003  mov r0, r3
      2c: e28bd000  add sp, fp, #0        ; 0x0
      30: e8bd0800  pop {fp}
      34: e12fff1e  bx lr

    Disassembly of section .text:

    00000000 :
       0: e1a03000  mov r3, r0
       4: e2811016  add r1, r1, #22       ; 0x16
       8: e0910000  adds r0, r1, r0
       c: 0242207b  subeq r2, r2, #123    ; 0x7b
      10: 124220d5  subne r2, r2, #213    ; 0xd5
      14: e1a00003  mov r0, r3
      18: e12fff1e  bx lr

    Disassembly of section .text:

    00000000 :
       0: e92d4830  push {r4, r5, fp, lr}
       4: e28db00c  add fp, sp, #12       ; 0xc
       8: e24dd008  sub sp, sp, #8        ; 0x8
       c: e1a0500e  mov r5, lr
      10: e50b0010  str r0, [fp, #-16]
      14: e59f0038  ldr r0, [pc, #56]     ; 54
      18: e1a01005  mov r1, r5
      1c: ebfffffe  bl 0
      20: e51b2010  ldr r2, [fp, #-16]
      24: e2811016  add r1, r1, #22       ; 0x16
      28: e0910000  adds r0, r1, r0
      2c: 0242207b  subeq r2, r2, #123    ; 0x7b
      30: 124220d5  subne r2, r2, #213    ; 0xd5
      34: e1a04002  mov r4, r2
      38: e59f0014  ldr r0, [pc, #20]     ; 54
      3c: e1a01005  mov r1, r5
      40: ebfffffe  bl 0
      44: e1a03004  mov r3, r4
      48: e1a00003  mov r0, r3
      4c: e24bd00c  sub sp, fp, #12       ; 0xc
      50: e8bd8830  pop {r4, r5, fp, pc}
      54: 00000000  .word 0x00000000

    Disassembly of section .text:

    00000000 :
       0: e92d4070  push {r4, r5, r6, lr}
       4: e1a0100e  mov r1, lr
       8: e1a05000  mov r5, r0
       c: e59f0028  ldr r0, [pc, #40]     ; 3c
      10: e1a0400e  mov r4, lr
      14: ebfffffe  bl 0
      18: e2811016  add r1, r1, #22       ; 0x16
      1c: e0910000  adds r0, r1, r0
      20: 0242207b  subeq r2, r2, #123    ; 0x7b
      24: 124220d5  subne r2, r2, #213    ; 0xd5
      28: e59f000c  ldr r0, [pc, #12]     ; 3c
      2c: e1a01004  mov r1, r4
      30: ebfffffe  bl 0
      34: e1a00005  mov r0, r5
      38: e8bd8070  pop {r4, r5, r6, pc}
      3c: 00000000  .word 0x00000000
As you can see, none of these does what you'd be hoping to see; the second version of the code, though, on gcc -c -O8 ... ends up as:
    Disassembly of section .text:

    00000000 :
       0: e1a03000  mov r3, r0
       4: e2833016  add r3, r3, #22       ; 0x16
       8: e0933000  adds r3, r3, r0
       c: 0240007b  subeq r0, r0, #123    ; 0x7b
      10: 124000d5  subne r0, r0, #213    ; 0xd5
      14: e12fff1e  bx lr
and that is, rather closely, what you've specified in your assembly and what you're expecting.
Moral: Be explicit and exact with your constraints and your operand assignments, and keep interdependent lines of assembly within the same asm() block (make a multiline statement).
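One way to follow that advice, sketched here under some assumptions, is to put all four instructions into a single asm() statement with named operands and an early-clobber scratch register. The names myasmfunc_ref and myasmfunc_one_block are hypothetical; the reference function is my plain-C reading of what the assembly is meant to compute, and it doubles as a fallback so the snippet builds on non-ARM hosts (the assembly is only compiled for ARM state, not Thumb).

```c
#include <stdint.h>

/* Hypothetical plain-C reference for the intended computation:
 * if (arg + 22 + arg) wraps to zero, return arg - 123, else arg - 213. */
static int32_t myasmfunc_ref(int32_t arg)
{
    uint32_t u = (uint32_t)arg;
    uint32_t sum = u + 22u + u;          /* add, then adds: Z is set when sum == 0 */
    uint32_t res = (sum == 0u) ? u - 123u : u - 213u;  /* subeq / subne */
    return (int32_t)res;
}

/* All interdependent instructions live in ONE asm statement, so the
 * compiler cannot insert flag-clobbering code between them. */
static int32_t myasmfunc_one_block(int32_t arg)
{
#if defined(__arm__) && !defined(__thumb__)
    int32_t myval = arg;
    int32_t tmp;                          /* scratch register picked by the compiler */

    __asm__ (
        "add   %[tmp], %[arg], #22\n\t"
        "adds  %[tmp], %[tmp], %[arg]\n\t"
        "subeq %[val], %[val], #123\n\t"
        "subne %[val], %[val], #213\n\t"
        : [val] "+r" (myval),             /* read and written */
          [tmp] "=&r" (tmp)               /* early clobber: written before arg is last read */
        : [arg] "r" (arg)
        : "cc");                          /* adds changes the condition flags */
    return myval;
#else
    return myasmfunc_ref(arg);            /* portable fallback (an assumption) */
#endif
}
```

Comparing myasmfunc_one_block() against myasmfunc_ref() for a few inputs at every optimization level is a cheap way to spot a constraint mistake; for instance, an argument of -11 takes the subeq path and anything like 0 or 100 takes the subne path.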
This really should be a comment, but I'm unable to post one for some reason :(
The compiler optimization shouldn't really mess with your assembly. So, as Igor said, in what way is this "not working"? Maybe your ASM branches into a function, which has been optimized by the compiler giving a different result to what your assembly code might depend on?
Some source code or more info about the compiler would be useful.