Assembly language: trying to understand a small function
For my work, I need to reverse-engineer what this portion of code (ARM9) is doing. I'm a Java developer and I really don't understand this portion of code, which belongs to a single function.
Of course, I'm asking for help because the original source code is no longer available. Can anyone help me understand what this code is doing, with a small algorithm in any high-level language? It would be nice. I have tried for many hours without results.
sub_FFFF7B38
PUSH {LR}
ADDS R2, R0, #0
LDRB R3, [R2]
CMP R3, #0
BEQ loc_FFFF7B52
SUBS R1, #1
BCC loc_FFFF7B52
loc_FFFF7B46:
ADDS R0, #1
LDRB R3, [R0]
CMP R3, #0
BEQ loc_FFFF7B52
SUBS R1, #1
BCS loc_FFFF7B46
loc_FFFF7B52:
SUBS R0, R0, R2
POP {R1}
Except for the last two lines, it could be something like the following.
Please don't hit me if I am not 100% correct.
If R0 is p0 or p, R1 is n, R2 is a temporary value (edited; at first I thought it was i or the address of p0[i]), and R3 is a temporary value:
sub_FFFF7B38
PUSH {LR} ; save return address
ADDS R2, R0, #0 ; move R0 to R2
LDRB R3, [R2] ; load *p0
CMP R3, #0 ; if *p0==0
BEQ loc_FFFF7B52 ; then jump to loc_FFFF7B52
SUBS R1, #1 ; decrement n
BCC loc_FFFF7B52 ; if there was a borrow (i.e. n was 0): jump to loc_FFFF7B52
loc_FFFF7B46:
ADDS R0, #1 ; increment p
LDRB R3, [R0] ; load *p
CMP R3, #0 ; if *p==0
BEQ loc_FFFF7B52 ; jump to loc_FFFF7B52
SUBS R1, #1 ; decrement n
BCS loc_FFFF7B46 ; if there was no borrow (i.e. n was not 0): jump to loc_FFFF7B46
loc_FFFF7B52:
SUBS R0, R0, R2 ; calculate p - p0
POP {R1} ; ??? I don't understand the purpose of this
; isn't there missing something?
or in C:
int f(char *p0, unsigned int n)
{
    char *p;
    if (*p0==0 || n--==0)
        return 0;
    for(p=p0; *++p && n>0; n--)
    {
    }
    return p - p0;
}
Here are the instructions commented line by line
sub_FFFF7B38
PUSH {LR} ; save LR (link register) on the stack
ADDS R2, R0, #0 ; R2 = R0 + 0 and set flags (could just have been MOV?)
LDRB R3, [R2] ; Load R3 with a single byte from the address at R2
CMP R3, #0 ; Compare R3 against 0...
BEQ loc_FFFF7B52 ; ...branch to end if equal
SUBS R1, #1 ; R1 = R1 - 1 and set flags
BCC loc_FFFF7B52 ; branch to end if carry is clear, which for a subtraction
; means a borrow occurred (R1 was 0 before the decrement)
loc_FFFF7B46:
ADDS R0, #1 ; R0 = R0 + 1 and set flags
LDRB R3, [R0] ; Load R3 with byte from address at R0
CMP R3, #0 ; Compare R3 against 0...
BEQ loc_FFFF7B52 ; ...branch to end if equal
SUBS R1, #1 ; R1 = R1 - 1 and set flags
BCS loc_FFFF7B46 ; loop if carry is set, which for a subtraction
; means no borrow (R1 was at least 1 before the decrement)
loc_FFFF7B52:
SUBS R0, R0, R2 ; R0 = R0 - R2
POP {R1} ; Load the previously saved value of LR into R1
; Presumably the missing next line is BX R1 (or MOV PC, R1) to
; return from the function.
So in very basic C code:
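int unknown(const char* r0, int r1)
{
    const char* r2 = r0;
    char r3 = *r2;
    if (r3 == '\0')
        goto end;
    if (--r1 < 0)       /* BCC: taken when r1 was 0 before the decrement */
        goto end;
loop:
    r3 = *++r0;
    if (r3 == '\0')
        goto end;
    if (--r1 >= 0)      /* BCS: taken when r1 was at least 1 before the decrement */
        goto loop;
end:
    return (int)(r0 - r2);
}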
Adding some control structures to get rid of the gotos:
int unknown(const char* r0, int r1)
{
    const char* r2 = r0;
    char r3 = *r2;
    if (r3 != '\0')
    {
        if (--r1 >= 0)
            do
            {
                if (*++r0 == '\0')
                    break;
            } while (--r1 >= 0);
    }
    return (int)(r0 - r2);
}
Edit: Now that my confusion about the carry bit and SUBS has been cleared up, this makes more sense.
Simplifying:
int unknown(const char* r0, int r1)
{
    const char* r2 = r0;
    while (*r0 != '\0' && --r1 >= 0)
        r0++;
    return (int)(r0 - r2);
}
In words, this finds the index of the first NUL in the first r1 chars of the string pointed to by r0, or returns r1 if there is none.
Filip has provided some pointers; you also need to read up on the ARM calling convention. (That is to say, which registers contain the function arguments on entry and which one holds the return value.)
From a quick reading I think this code is strnlen or something closely related to it.
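For comparison, here is a minimal sketch of what a strnlen-style function looks like in portable C. The name bounded_len is made up for illustration; it is not the original source of the routine, only something with the same return values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of s, but never more than maxlen.
 * Intended to return the same values as POSIX strnlen(s, maxlen). */
size_t bounded_len(const char *s, size_t maxlen)
{
    assert(s != NULL);
    /* memchr scans at most maxlen bytes looking for the terminating NUL. */
    const char *nul = memchr(s, '\0', maxlen);
    return nul ? (size_t)(nul - s) : maxlen;
}
```

For example, bounded_len("hello", 3) gives 3 and bounded_len("hello", 10) gives 5. One difference from the disassembly, as far as I can tell: the assembly loads the byte at index r1 before it notices the count has run out, whereas this sketch never looks past the first maxlen bytes.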
How about this: Instruction set for ARM
Some hints / simplified asm
- Push - Puts something on the "stack" (memory)
- Add - Usually "add" as in +
- Pop - Retrieves something from the "stack" (memory)
- CMP - Short for Compare, which compares something with something else.
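X: or Whatever: is a label, which names the location of the instruction that follows it. Subroutines start at a label, and jump instructions target one, much like a "goto" in other languages (Java reserves the goto keyword but never implemented it; a labeled break or continue is the closest thing).
If you have the following (ignore whether it is correct ARM asm, it's just pseudocode):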
PUSH 1
x:
POP %eax
First it would put 1 on the stack and then pop it back into eax (an x86 register; the name is short for extended ax, and it holds 32 bits of data).
Now, what does the x: do then? Well, let's assume that there are 100 lines of asm before that as well; then you could use a "jump" instruction to navigate to x:.
That's a little bit of introduction to asm. Simplified.
Try to understand the above code and examine the instruction-set.
My ASM is a bit rusty, so no rotten tomatoes please. Assuming this starts at sub_FFFF7B38:
The command PUSH {LR} preserves the link register, which is a special register that holds the return address during a subroutine call.
ADDS sets the flags (like CMN would). Also, ADDS R2, R0, #0 adds R0 to 0 and stores the result in R2. (Correction from Charles in comments)
LDRB R3, [R2] loads into R3 the byte stored in memory at the address held in R2. LDRB only loads a single byte; the three upper bytes of the destination register are zeroed upon loading. Basically, it fetches the first byte that R0 (now copied into R2) points to.
CMP R3, #0 performs a subtraction between the two operands and sets the condition flags, but does not store a result. Those flags lead to...
BEQ loc_FFFF7B52, which means "If the previous comparison was equal, go to loc_FFFF7B52" or if(R3 == 0) {goto loc_FFFF7B52;}
So if R3 isn't zero, then the SUBS R1, #1 command subtracts one from R1 and sets the flags.
BCC loc_FFFF7B52 will cause execution to jump to loc_FFFF7B52 if the carry flag is clear, which after SUBS means the subtraction had to borrow (R1 was 0 before the decrement).
( snip )
Finally, POP {R1} pops the return address that PUSH {LR} saved at the start into R1, so that the function can return through it.
Edit - While I was in the car, Curd spelled out just about what I was thinking when I was trying to write out my answer and ran out of time.