How is a relative JMP (x86) implemented in an Assembler?

2022-12-29 19:53 问答作者：

While building my assembler for the x86 platform I encountered some problems with encoding the JMP instruction:

OPCODE   INSTRUCTION   SIZE
 EB cb     JMP rel8     2
 E9 cw     JMP rel16    4 (because of 0x66 16-bit prefix)
 E9 cd     JMP rel32    5
 ...

(from my favourite x86 instruction website, http://siyobik.info/index.php?module=x86&id=147)

All are relative jumps, where the size of each encoding (operation + operand) is in the third column.

Now my original (and thus fault because of this) design reserved the maximum (5 bytes) space for each instruction. The operand is not yet known, because it's a jump to a yet unknown location. So I've implemented a "rewrite" mechanism, that rewrites the operands in the correct location in memory, if the location of the jump is known, and fills the rest with NOPs. This is a somewhat serious concern in tight-loops.

Now my problem is with the following situation:

b: XXX
c: JMP a
e: XXX
   ...
   XXX
d: JMP b
a: XXX      (where XXX is any instruction, depending
             on the to-be assembled program)

The problem is that I want the smallest possible encoding for a JMP instruction (and no NOP filling).

I have to know the size of the instruction at c before I can calculate the relative distance between a and b for the operand at d. The same applies for the JMP at c: it needs to know the size of d before it can calculate the relative distance between e and a.

How do existing assemblers solve this problem, or how would you do this?

This is what I am thinking which solves the problem:

First encode all the instructions to opcodes between the JMP and it's target, if this region contains a variable-sized opcode, use the maximum size, e.g. 5 for a JMP. Then encode the relative JMP to it's target, by choosing the smallest possible encoding size (3, 4 or 5) and calculate the distance. If any variable-sized opcode is encoded, change all absolute operands before, and all relative instructions that skips over this encoded instruction: they are开发者_如何学Python re-encoded when their operand changes to choose the smallest possible size. This method is guaranteed to end, as variable-sized opcodes only may shrink (because it uses the maximum size of them).

I wonder, perhaps this is an over-engineered solution, that's why I ask this question.

In the first pass you will have a very good approximation to which jmp code to use using a pessimistic byte counting for all jump instructions.

On the second pass you can fill in the jumps with the pessimistic opcode chosen. Very few jumps could then be rewritten to use a byte or two less, only those that were very close to the 8/16 bit or 16/32 byte jump threshold originally. As the candidates are all jumps of many bytes, they are less likely to be in critical loop situations so you are likely to find that further passes offer little or no benefit over a two pass solution.

Here's one approach I've used that may seem inefficient but turns out not to be for most real-life code (pseudo-code):

IP := 0;
do
{
  done = true;
  while (IP < length)
  {
    if Instr[IP] is jump
      if backwards
      { Target known
          Encode short/long as needed }
      else
      {  Target unknown
          if (!marked as needing long encoding) // see below
            Encode short
          Record location for fixup }
    IP++;
  }
  foreach Fixup do
    if Jump > short
      Mark Jump location as requiring long encoding
      PC := FixupLocation; // restart at instruction that needs size change
      done = false; 
      break; // out of foreach fixup
    else
      encode jump
} while (!done);

继续阅读：encoding instruction-set x86

How is a relative JMP (x86) implemented in an Assembler?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？