Subset of x86 without a %gs register: binary patching code that uses %gs instead of trapping to emulation?
For reasons too complicated to explain here, I need to run an x86 GCC-compiled Linux program on a platform that is a subset of x86. This platform does not have the %gs register, and since GCC-generated code relies on %gs, the register has to be emulated.
Currently I have a wrapper which catches the exceptions when the program attempts to access the %gs register, and emulates it. But this is dog slow. Is there a way that I can patch the opcodes in the ELF ahead of time with equivalent instructions, so that the trap-and-emulate is avoided?
Have you tried compiling your code with the -mno-tls-direct-seg-refs option? From my GCC man page (i686-apple-darwin10-gcc-4.2.1):
-mtls-direct-seg-refs
-mno-tls-direct-seg-refs
Controls whether TLS variables may be accessed with offsets from
the TLS segment register (%gs for 32-bit, %fs for 64-bit), or
whether the thread base pointer must be added. Whether or not this
is legal depends on the operating system, and whether it maps the
segment to cover the entire TLS area.
For systems that use GNU libc, the default is on.
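To make the flag concrete: the code that makes GCC touch %gs in the first place is thread-local storage, i.e. any __thread variable. Below is a minimal C sketch; bump_counter is a made-up name, and the assembly in the comment is the typical shape of GCC's i386 output rather than a verified listing, so check your own build with gcc -m32 -O2 -S.

```c
/* A thread-local variable. On i386 Linux, GCC addresses it through %gs. */
static __thread int counter;

/* Increment and return the calling thread's copy of counter.
 *
 * With the default (-mtls-direct-seg-refs), the access typically uses a
 * %gs-relative operand directly, roughly:
 *     movl %gs:counter@ntpoff, %eax
 * With -mno-tls-direct-seg-refs, GCC instead loads the thread pointer
 * (stored at %gs:0 on i386 glibc) and adds the offset itself, roughly:
 *     movl %gs:0, %eax
 *     movl counter@ntpoff(%eax), %edx
 * So %gs is still read, but only through that one predictable load. */
int bump_counter(void)
{
    return ++counter;
}
```

If that holds for your build, the flag does not remove %gs entirely, but it funnels every TLS access through the same %gs:0 load, which is a far more uniform pattern to emulate or patch.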
(This is assuming Adam Rosenfield's solution is not applicable. It, or a similar approach, is probably a better way to solve it.)
You haven't stated how you're emulating the %gs register, but it's probably going to be tough to patch every usage in general unless you have some special knowledge about the program, because otherwise you only have 2 bytes (in the worst, common case) to work with in your patch. Of course, if you're using something like %es = %gs, it should be relatively straightforward.
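For that easier case, where the emulation amounts to letting %es stand in for %gs, the common %gs forms can be rewritten in place. This is a minimal C sketch under stated assumptions: 32-bit code only, off is an instruction boundary already found by a real decoder, and gs_to_es_inplace is a hypothetical helper name. The prefix and mov forms keep their length; push/pop %gs shrink from two bytes to one, so a nop pads them.

```c
#include <stddef.h>
#include <stdint.h>

/* Legacy x86 prefixes: lock/rep, segment overrides, operand/address size. */
static int is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:
    case 0x26: case 0x2E: case 0x36: case 0x3E: case 0x64: case 0x65:
    case 0x66: case 0x67:
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical helper: rewrite the 32-bit instruction starting at code[off]
 * so that it uses %es where it used %gs. Returns 1 if anything was patched.
 * Assumes off is an instruction boundary found by a real decoder. */
int gs_to_es_inplace(uint8_t *code, size_t len, size_t off)
{
    size_t i = off;
    int patched = 0;

    /* GS override prefix 65 -> ES override prefix 26 (same length). */
    while (i < len && i - off < 15 && is_legacy_prefix(code[i])) {
        if (code[i] == 0x65) {
            code[i] = 0x26;
            patched = 1;
        }
        i++;
    }
    if (i + 1 >= len)
        return patched;   /* the remaining forms need two opcode bytes */

    /* push %gs (0F A8) / pop %gs (0F A9) -> push %es (06) / pop %es (07),
     * padded with a one-byte nop (90) so later instructions do not move. */
    if (code[i] == 0x0F && (code[i + 1] == 0xA8 || code[i + 1] == 0xA9)) {
        code[i] = (code[i + 1] == 0xA8) ? 0x06 : 0x07;
        code[i + 1] = 0x90;
        return 1;
    }

    /* mov r/m16, Sreg (8C /r) and mov Sreg, r/m16 (8E /r): ModRM bits 5:3
     * select the segment register (ES = 0, ..., GS = 5). */
    if ((code[i] == 0x8C || code[i] == 0x8E) && ((code[i + 1] >> 3) & 7) == 5) {
        code[i + 1] &= (uint8_t)~0x38;   /* reg field 5 -> 0, i.e. %es */
        return 1;
    }
    return patched;
}
```

One real constraint on this aliasing: string instructions (movs, stos, scas, cmps, ins) always address their destination through %es:%edi, and that segment cannot be overridden, so repointing %es is only safe if nothing else in the program (including libc's memcpy/memset) depends on %es.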
Assuming this can somehow be made to work in your case, the strategy is to scan the executable sections of the ELF file and patch any instruction that uses or modifies the GS register. That includes at least the following instructions:
- Any instruction with the GS segment override prefix (65), except for branch instructions, in which case the prefix indicates something else
- push gs (0F A8)
- pop gs (0F A9)
- mov r/m16, gs (8C /r)
- mov gs, r/m16 (8E /r)
- mov gs, r/m64 (REX.W 8E /r), if you support 64-bit mode
And any other instructions that take segment registers as operands (I don't think there are many more, but I'm not 100% sure).
This all comes from the Intel® 64 and IA-32 Architectures Software Developer's Manual, Combined Volumes 2A and 2B: Instruction Set Reference, A-Z. Be aware that the instructions are sometimes preceded by other prefixes and sometimes not, so you should probably use a library to do the instruction decoding rather than blindly searching for byte sequences.
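Once a real decoder has given you instruction boundaries, the per-instruction check itself is small. The C sketch below (insn_uses_gs is a hypothetical name) skips legacy prefixes and an optional REX prefix, then tests for the forms in the list, plus lgs (0F B5 /r), which also loads %gs. It deliberately does not compute instruction lengths, and it treats every 65 prefix as a %gs access rather than modeling the branch-instruction exception.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classifier: nonzero if the instruction starting at code[0]
 * reads or writes %gs in one of the listed forms. Assumes code points at
 * an instruction boundary from a real decoder; it does NOT work out the
 * instruction's length. mode64 selects 64-bit decoding (REX prefixes). */
int insn_uses_gs(const uint8_t *code, size_t len, int mode64)
{
    size_t i = 0;
    int gs_prefix = 0;

    /* Walk the legacy prefixes (an instruction is at most 15 bytes). */
    for (; i < len && i < 15; i++) {
        uint8_t b = code[i];
        if (b == 0x65)
            gs_prefix = 1;
        else if (b != 0xF0 && b != 0xF2 && b != 0xF3 && b != 0x26 &&
                 b != 0x2E && b != 0x36 && b != 0x3E && b != 0x64 &&
                 b != 0x66 && b != 0x67)
            break;
    }

    /* Any GS override counts (the branch-instruction case is not modeled). */
    if (gs_prefix)
        return 1;

    /* In 64-bit mode a REX prefix (40..4F) may precede the opcode;
     * in 32-bit mode those bytes are inc/dec instructions instead. */
    if (mode64 && i < len && (code[i] & 0xF0) == 0x40)
        i++;
    if (i + 1 >= len)
        return 0;

    if (code[i] == 0x0F) {
        uint8_t op2 = code[i + 1];
        /* push %gs (0F A8), pop %gs (0F A9), lgs (0F B5 /r) */
        return op2 == 0xA8 || op2 == 0xA9 || op2 == 0xB5;
    }
    /* mov r/m, Sreg (8C /r) and mov Sreg, r/m (8E /r) with ModRM.reg == 5 (GS) */
    if (code[i] == 0x8C || code[i] == 0x8E)
        return ((code[i + 1] >> 3) & 7) == 5;
    return 0;
}
```

In a patcher this would run once per decoded instruction, and anything it flags becomes a candidate for one of the rewrites discussed next.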
Some of the above instructions should be relatively straightforward to turn into call my_patch (E8 plus a 32-bit displacement, so 5 bytes) or similar when the original instruction is long enough, but you're probably going to have trouble finding something that fits in two bytes and works in general. int XX (CD XX) might be a good candidate if you can set up an interrupt vector, but I'm not sure it's going to be faster than the method you're currently using. You will of course need to record which instruction was patched out and have the interrupt handler (or whatever) react differently depending on the return address it receives.
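A minimal sketch of that bookkeeping in C, assuming a 32-bit target, a free interrupt vector (GS_TRAP_VECTOR = 0xF0 is a placeholder, not a recommendation), and instruction lengths supplied by a decoder. All names here are hypothetical. Because int imm8 saves the address of the byte right after CD XX, the handler sees base + off + 2 and must resume at the end of the original instruction, skipping its leftover bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder vector; it must be unused on your platform (not 0x80, which
 * i386 Linux uses for system calls). */
#define GS_TRAP_VECTOR 0xF0
#define MAX_PATCHES 1024

/* One patched-out instruction, keyed by the return address the handler sees. */
struct patch_record {
    uintptr_t ret_addr;     /* address of the byte after CD XX */
    uintptr_t resume_addr;  /* address just past the original instruction */
    uint8_t orig[15];       /* original bytes, for emulation or logging */
    uint8_t orig_len;
};

struct patch_table {
    size_t count;
    struct patch_record rec[MAX_PATCHES];
};

/* Replace the insn_len-byte instruction at code[off] (mapped at base + off)
 * with "int GS_TRAP_VECTOR" and record it. Returns 0 on success, -1 if the
 * instruction is too short/long or the table is full. */
int patch_with_int(uint8_t *code, uintptr_t base, size_t off,
                   size_t insn_len, struct patch_table *t)
{
    if (insn_len < 2 || insn_len > 15 || t->count == MAX_PATCHES)
        return -1;

    struct patch_record *r = &t->rec[t->count++];
    r->ret_addr = base + off + 2;
    r->resume_addr = base + off + insn_len;
    r->orig_len = (uint8_t)insn_len;
    memcpy(r->orig, code + off, insn_len);

    code[off] = 0xCD;               /* int imm8 */
    code[off + 1] = GS_TRAP_VECTOR;
    return 0;                       /* bytes after CD XX are never executed */
}

/* First step of the handler: map the return address it received back to the
 * instruction that was patched out. Linear search keeps the sketch short. */
const struct patch_record *lookup_patch(const struct patch_table *t,
                                        uintptr_t ret_addr)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->rec[i].ret_addr == ret_addr)
            return &t->rec[i];
    return NULL;
}
```

A real handler would then emulate r->orig against your %gs state and return to r->resume_addr instead of the saved address; a sorted table or hash would replace the linear search once the patch count grows.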
You might be able to set up a trampoline if you can find room within -128..127 bytes and use JMP rel8 (EB cb) to jump to the trampoline (usually another JMP, but this time with more room for the target address), which then handles the instruction emulation and jumps back to the instruction following the patched-out %gs usage.
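The displacement arithmetic is the part that is easy to get wrong: rel8 and rel32 are measured from the end of the jump instruction, not its start. A minimal C sketch, assuming 32-bit addresses; encode_jmp_rel8 and encode_jmp_rel32 are hypothetical helper names, and the emulation code the trampoline leads to is left to you.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode "jmp rel8" (EB cb) placed at address `from`, targeting `to`.
 * The displacement is relative to from + 2, so it must lie in -128..127.
 * Returns 0 on success, -1 if the target is out of range. */
int encode_jmp_rel8(uint8_t out[2], uint32_t from, uint32_t to)
{
    int64_t rel = (int64_t)to - ((int64_t)from + 2);
    if (rel < -128 || rel > 127)
        return -1;
    out[0] = 0xEB;
    out[1] = (uint8_t)(int8_t)rel;
    return 0;
}

/* Encode "jmp rel32" (E9 cd) placed at `from`, targeting `to`.
 * The displacement is relative to from + 5; unsigned arithmetic wraps
 * modulo 2^32 just like the 32-bit instruction pointer does. */
void encode_jmp_rel32(uint8_t out[5], uint32_t from, uint32_t to)
{
    uint32_t rel = to - (from + 5);
    out[0] = 0xE9;
    out[1] = (uint8_t)rel;          /* little-endian displacement */
    out[2] = (uint8_t)(rel >> 8);
    out[3] = (uint8_t)(rel >> 16);
    out[4] = (uint8_t)(rel >> 24);
}
```

The 2-byte EB cb goes over the start of the %gs instruction; encode_jmp_rel32 covers both the longer jump out of the trampoline and the jump back to the instruction after the patched one.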
Lastly, I'd recommend keeping the trap-and-emulate code running to catch any cases you might not have thought of (self-modifying or injected code, for instance). This way you can also log any unhandled cases and add them to your solution.