Translation of machinecode into LLVM IR (disassembly / reassembly of X86_64. X86. ARM into LLVM bitcode)
I would like to translate X86_64, x86, ARM executables into LLVM IR (disassem开发者_JAVA技巧bly).
What solution do you suggest ?
mcsema is a production-quality binary lifter. It takes x86 and x86-64 and statically "lifts" it to LLVM IR. It's actively maintained, BSD licensed, and has extensive tests and documentation.
https://github.com/trailofbits/mcsema
Consider using RevGen tool developed within the S2E project. It allows converting x86 binaries to LLVM IR. The source code could be checked out from Revgen branch of GIT repository available by url https://dslabgit.epfl.ch/git/s2e/s2e.git.
As regards to RevGen tool mentioned by @bsa2000, this latest paper "A compiler level intermediate representation based binary analysis and rewriting system" has pointed out some limitations in S2E and Revinc.
I pull them out here.
shortcoming of dynamic translation:
S2E [16] and Revnic [14] present a method for dynamically translating x86 to LLVM using QEMU. Unlike our approach, these methods convert blocks of code to LLVM on the fly which limits the application of LLVM analyses to only one block at a time.
IR incomplete:
Revnic [14] and RevGen [15] recover an IR by merging the translated blocks, but the recovered IR is incomplete and is only valid for current execution; consequently, various whole program analyses will provide incomplete information.
no abstract stack or promoting information
Further, the translated code retains all the assumptions of the original bi- nary about the stack layout. They do not provide any methods for obtaining an abstract stack or promoting memory locations to symbols, which are essential for the application of several source-level analyses.
I doubt there will be universal solution (think about indirect branches, etc.), LLVM IR is much "higher level" than any assembler. Though it's possible to translate on per-BB basis. You might want to check llvm-qemu and libcpu projects among others.
Just post some references on translating ARM binary to LLVM IR:
disarm - arm binary to llvm ir disassembler
https://code.google.com/p/disarm/
However, I have not tried it, thus not sure about its quality and stability. Anyone else may post additional information about this project?
There is new project, being in some early phases, The libbeauty
:
https://github.com/jcdutton/libbeauty
Article about project: Libbeauty: Another Reverse-Engineering Tool, 24 December 2013, Michael Larabel - http://www.phoronix.com/scan.php?page=news_item&px=MTU1MTU
It only supports subset of x86_64
as input now. One of the project goals - is to be able to compile the generated LLVM IR back to assembly to get the binary with same functionality.
精彩评论