What stages would be involved in compiling assembly language to machine code?

I'm trying to write a compiler that takes an assembler file and outputs raw machine code instructions.

I've found lots of tutorials on how to write a compiler, but I'm wondering if all the stages are relevant to assembler mnemonics. For instance, is lexical analysis necessary at all given the simplified stage-by-stage format of assembler, or will it still be necessary but in a simpler format?


A lexical analyzer is still required: you must have something that will break the text into individual tokens (words, numbers, punctuation, etc.). You still need a parser, too, although a much simplified one. There is a grammar, after all.
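To make the tokenizing step concrete, here is a minimal sketch of a regex-based lexer for a single line of assembly. The token classes (`COMMENT`, `NUMBER`, `IDENT`, `PUNCT`), the `;` comment syntax, and the `tokenize` name are assumptions chosen for illustration; they are not taken from any particular assembler.

```python
import re

# Hypothetical token classes for a toy assembly syntax (illustrative only).
TOKEN_SPEC = [
    ("COMMENT", r";[^\n]*"),                    # everything after ';'
    ("NUMBER",  r"0[xX][0-9a-fA-F]+|\d+"),      # hex or decimal literal
    ("IDENT",   r"[A-Za-z_.][A-Za-z0-9_.]*"),   # mnemonics, registers, labels
    ("PUNCT",   r"[,:\[\]+\-]"),                # separators and brackets
    ("SPACE",   r"[ \t]+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(line):
    """Split one source line into (kind, text) pairs, dropping spaces and comments."""
    tokens = []
    pos = 0
    while pos < len(line):
        match = TOKEN_RE.match(line, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {line[pos]!r} at column {pos}")
        if match.lastgroup not in ("SPACE", "COMMENT"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("loop: mov ax, 0x10 ; set up"))
```

A parser for such a flat grammar can then be little more than "optional label, mnemonic, comma-separated operands" applied to each token list.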


As I see it, lexical analysis is all that is needed; the need for a parser is lessened by the flat structure of assembly.


First I would check for invalid instructions and operands, then check that every variable used is declared. Once you are sure the file is a valid program, delete the comments and replace variables and procedures with addresses (you have to assign addresses to labels "on the fly" during the translation, because you can't know them in advance). Finally, do the actual conversion to binary code.

If you assume that every instruction is on its own line, it becomes much easier: if the current line is a label, replace all further references to it with the current address; otherwise delete the extra spaces, leaving one between the two "words" (instruction and operands). After that, processing the instruction is a joke. ;)
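The one-instruction-per-line approach could look roughly like the sketch below. Note that it uses a two-pass variant (collect label addresses first, emit bytes second) rather than replacing references as they are met, so that jumps to labels defined later in the file also resolve. The instruction set in `OPCODES`, the one-byte operands, and the `assemble` name are all made up for illustration.

```python
# Hypothetical toy instruction set: mnemonic -> (opcode byte, number of one-byte operands).
OPCODES = {"NOP": (0x00, 0), "LOAD": (0x01, 1), "ADD": (0x02, 1),
           "JMP": (0x03, 1), "HALT": (0xFF, 0)}

def assemble(source):
    """Translate toy assembly text into raw bytes using two passes."""
    # Pass 1: drop comments, record each label's address, validate instructions.
    labels = {}
    instructions = []
    address = 0
    for raw in source.splitlines():
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = address
            continue
        parts = line.split()
        mnemonic, operands = parts[0].upper(), parts[1:]
        if mnemonic not in OPCODES:
            raise ValueError(f"unknown instruction: {mnemonic}")
        opcode, arity = OPCODES[mnemonic]
        if len(operands) != arity:
            raise ValueError(f"{mnemonic} expects {arity} operand(s)")
        instructions.append((opcode, operands))
        address += 1 + arity

    # Pass 2: emit opcode bytes, replacing label names with their addresses.
    out = bytearray()
    for opcode, operands in instructions:
        out.append(opcode)
        for operand in operands:
            if operand in labels:
                value = labels[operand]
            else:
                try:
                    value = int(operand, 0)  # accepts decimal and 0x.. literals
                except ValueError:
                    raise ValueError(f"undefined label: {operand}") from None
            out.append(value & 0xFF)
    return bytes(out)

program = """
start:
    LOAD 5      ; load a constant
    ADD 0x02
    JMP end     ; forward reference, resolved in pass 2
    NOP
end:
    HALT
"""
print(assemble(program).hex())
```

A single pass is enough if labels are only ever referenced after they are defined; otherwise you need either a second pass as here, or a list of "fix-ups" to patch once the address becomes known.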


I'd say you could apply almost every stage of a compiler to an assembler; of course, what applies to you depends on what you're going to do. If you're making a 1-to-1 mapping, you need syntactic analysis to check for errors, and a lexer and/or parser to process the text for specifiers to the assembler, such as sectioning and memory protection on .data (or even macros!). There is also size 'optimization' that can be applied by funneling immediate constants into the smallest size possible. Of course, you can go all out and perform deep analysis to do instruction reordering and fusing. You might also want a static analysis stage to check for invalid (illegal) sequences (LOCK CMPXCHG EDX, EDX would be an example of syntactically correct but invalid assembly, IIRC).
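As a rough illustration of the immediate-size idea, the sketch below picks the narrowest signed little-endian encoding that can hold a constant. The allowed widths (1, 2, or 4 bytes) and the function names are assumptions; a real instruction set only permits certain immediate widths per instruction, so an actual assembler would also have to consult its encoding tables.

```python
def smallest_imm_size(value):
    """Return the smallest width in bytes (1, 2, or 4) that holds value as a signed integer."""
    for size in (1, 2, 4):
        limit = 1 << (size * 8 - 1)
        if -limit <= value < limit:
            return size
    raise OverflowError(f"immediate {value} does not fit in 32 bits")

def encode_imm(value):
    """Encode value as little-endian two's complement in the smallest fitting width."""
    return value.to_bytes(smallest_imm_size(value), "little", signed=True)

for constant in (5, -1, 300, 70000):
    print(constant, encode_imm(constant).hex())
```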
