
Algorithm for rewriting modified goto semantics

I've got a large body of legacy code written in an old, self-designed scripting language that we compile (translate) into JavaScript.

That language has a conditional jump to a label. The difference from a common goto statement is that backward jumps are not possible. The language has neither nested if statements nor loops.

As goto does not exist in JavaScript, I'm looking for an algorithm that transforms goto mylabel and mylabel: into a semantically equivalent structure.

I thought of using ifs, but found it non-trivial because of the arbitrary nesting of the goto labels.

Example:

if cond1 goto a
do something1
if cond2 goto b
do something2
a:
do something3
if cond3 goto c
do something4
c:
do something5
b:

Could be rewritten as:

lbl_b=false;
lbl_c=false;

lbl_a = cond1;
if (!cond1) {
  do something1;
  lbl_b = cond2;
  if (!lbl_b) {
    do something2;
  }
}
if (!lbl_b) {
  do something3;
  lbl_c = cond3;
  if (!lbl_c) {
    do something4;
  }
  do something5;
}

However, I was not able to derive a general algorithm from that.


This is usually called goto removal. We once had a student project whose task was to implement it for C (sadly, we never published that work). In general you need loops for it, but since your language only allows forward jumps, it is relatively easy:

Parse all lines once and collect all labels. Create a flag "skip_to_X" for every label X and initialize all flags to false at the beginning. When you meet a conditional goto to label X, replace it with "if cond then skip_to_X = true", and prefix every following line, up to the line with label X, with "if not skip_to_X".
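A minimal JavaScript sketch of this naive rewrite, assuming the script has already been parsed into one object per line. The line format ({ kind: 'stmt' | 'goto' | 'label', ... }), the name removeGotosNaive and the trace.push(n) stand-ins for "do somethingN" are all hypothetical. It emits one guarded line per source line:

```javascript
// Hypothetical parsed form: one object per script line.
//   { kind: 'stmt', code }   { kind: 'goto', cond, label }   { kind: 'label', name }
function removeGotosNaive(lines) {
  const labels = lines.filter(line => line.kind === 'label').map(line => line.name);
  // One skip_to_<label> flag per label, all initialised to false.
  const out = labels.map(name => `let skip_to_${name} = false;`);
  const pending = new Set(); // labels jumped towards whose label line is not reached yet
  const passed = new Set();  // labels already behind us (a goto to them would jump backward)

  for (const line of lines) {
    // Every pending flag guards this line.
    const guard = [...pending].map(name => `!skip_to_${name}`).join(' && ');
    if (line.kind === 'label') {
      pending.delete(line.name); // execution resumes normally from here
      passed.add(line.name);
    } else if (line.kind === 'goto') {
      if (passed.has(line.label) || !labels.includes(line.label)) {
        throw new Error(`goto ${line.label}: backward jump or unknown label`);
      }
      const test = guard ? `${guard} && (${line.cond})` : line.cond;
      out.push(`if (${test}) skip_to_${line.label} = true;`);
      pending.add(line.label); // guard every following line up to the label
    } else {
      out.push(guard ? `if (${guard}) { ${line.code} }` : line.code);
    }
  }
  return out.join('\n');
}

// The question's example, with trace.push(n) standing in for "do somethingN".
const script = [
  { kind: 'goto', cond: 'cond1', label: 'a' },
  { kind: 'stmt', code: 'trace.push(1);' },
  { kind: 'goto', cond: 'cond2', label: 'b' },
  { kind: 'stmt', code: 'trace.push(2);' },
  { kind: 'label', name: 'a' },
  { kind: 'stmt', code: 'trace.push(3);' },
  { kind: 'goto', cond: 'cond3', label: 'c' },
  { kind: 'stmt', code: 'trace.push(4);' },
  { kind: 'label', name: 'c' },
  { kind: 'stmt', code: 'trace.push(5);' },
  { kind: 'label', name: 'b' },
];

console.log(removeGotosNaive(script));
```

To try the output, wrap it with new Function('cond1', 'cond2', 'cond3', 'trace', source) and call it with different condition values.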

This alone is enough to work, but the output is of course far from optimal.

How you can optimize it: instead of prepending the ifs directly, maintain a set of flags for every line. Wherever you would have prepended "if not skip_to_X" to a line, add skip_to_X to that line's set instead.

Then you can emit a single if for each group of consecutive lines whose set does not change, with the negated flags of the set as the condition.

Example with your given code:

set                    your code
empty                  if cond1 goto a 
skip_to_a,             do something1
skip_to_a,             if cond2 goto b
skip_to_a, skip_to_b   do something2
skip_to_a, skip_to_b   a:
skip_to_b              do something3
skip_to_b              if cond3 goto c
skip_to_b, skip_to_c   do something4
skip_to_b, skip_to_c   c:
skip_to_b              do something5
skip_to_b              b:
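The set column can be computed mechanically in a single pass. A JavaScript sketch, using a hypothetical one-object-per-line representation of the parsed script (skipSets and the field names are made up): a goto line is guarded only by earlier jumps, not by its own flag, while a label line still carries its own flag.

```javascript
// Hypothetical parsed form; `text` is only kept for printing.
//   { kind: 'stmt' | 'goto' | 'label', label?, name?, text }
function skipSets(lines) {
  const pending = new Set(); // labels jumped towards whose label line is still ahead
  return lines.map(line => {
    const current = new Set(pending); // flags guarding this line
    if (line.kind === 'goto') pending.add(line.label);    // guards the lines *after* the goto
    if (line.kind === 'label') pending.delete(line.name); // the label line itself keeps its flag
    return current;
  });
}

const script = [
  { kind: 'goto', label: 'a', text: 'if cond1 goto a' },
  { kind: 'stmt', text: 'do something1' },
  { kind: 'goto', label: 'b', text: 'if cond2 goto b' },
  { kind: 'stmt', text: 'do something2' },
  { kind: 'label', name: 'a', text: 'a:' },
  { kind: 'stmt', text: 'do something3' },
  { kind: 'goto', label: 'c', text: 'if cond3 goto c' },
  { kind: 'stmt', text: 'do something4' },
  { kind: 'label', name: 'c', text: 'c:' },
  { kind: 'stmt', text: 'do something5' },
  { kind: 'label', name: 'b', text: 'b:' },
];

// Print the table: flags on the left, source line on the right.
skipSets(script).forEach((set, i) => {
  const flags = [...set].map(name => `skip_to_${name}`).join(', ') || 'empty';
  console.log(flags.padEnd(23) + script[i].text);
});
```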

Now you either write the if(s) in front of each line, or you start at the top and open one if block that lasts as long as the set stays the same.

Starting at the top, the first line has an empty set. It is a conditional goto, so instead of jumping you set its flag:

if cond1 skip_to_a=true;

Now the set changes, so you open a block guarded by the flags in the set:

if (!skip_to_a) BEGIN
   do something1
   if cond2 skip_to_b=true;
   END

The set changes again, so you open a new if block:

if (!skip_to_a and !skip_to_b) BEGIN
   do something2
   END

And so on; I guess you get the idea now.
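Carried through to the end, the grouping can be automated. A JavaScript sketch, assuming the same kind of hypothetical one-object-per-line input (the line format, removeGotosGrouped and the trace.push(n) stand-ins are made up). Besides starting a new block whenever the set changes, it also closes the block right after every goto: a block's guard is only tested on entry, and when two gotos target the same label the set does not change after the second one, so without this the flag it just set would never be re-tested.

```javascript
// Hypothetical parsed form: one object per script line.
//   { kind: 'stmt', code }   { kind: 'goto', cond, label }   { kind: 'label', name }
function removeGotosGrouped(lines) {
  const labels = lines.filter(line => line.kind === 'label').map(line => line.name);
  const out = labels.map(name => `let skip_to_${name} = false;`);
  const pending = new Set(); // flags in force for the current line
  let openGuard = null;      // guard of the open block; '' = unguarded code, null = none open

  const closeBlock = () => {
    if (openGuard) out.push('}'); // only a non-empty guard opened an `if (...) {`
    openGuard = null;
  };

  for (const line of lines) {
    if (line.kind === 'label') {
      pending.delete(line.name); // labels emit no code, they only shrink the set
      continue;
    }
    const guard = [...pending].map(name => `!skip_to_${name}`).join(' && ');
    if (guard !== openGuard) { // the set changed: start a new block
      closeBlock();
      if (guard) out.push(`if (${guard}) {`);
      openGuard = guard;
    }
    const indent = guard ? '  ' : '';
    if (line.kind === 'goto') {
      out.push(`${indent}if (${line.cond}) skip_to_${line.label} = true;`);
      pending.add(line.label);
      closeBlock(); // the flag just set must be tested before the next line runs
    } else {
      out.push(indent + line.code);
    }
  }
  closeBlock();
  return out.join('\n');
}

// The question's example, with trace.push(n) standing in for "do somethingN".
const script = [
  { kind: 'goto', cond: 'cond1', label: 'a' },
  { kind: 'stmt', code: 'trace.push(1);' },
  { kind: 'goto', cond: 'cond2', label: 'b' },
  { kind: 'stmt', code: 'trace.push(2);' },
  { kind: 'label', name: 'a' },
  { kind: 'stmt', code: 'trace.push(3);' },
  { kind: 'goto', cond: 'cond3', label: 'c' },
  { kind: 'stmt', code: 'trace.push(4);' },
  { kind: 'label', name: 'c' },
  { kind: 'stmt', code: 'trace.push(5);' },
  { kind: 'label', name: 'b' },
];

console.log(removeGotosGrouped(script));
```

For the question's example this reproduces the blocks from the walkthrough: one unguarded flag assignment followed by five if blocks.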

EDIT: As the sets in the example nicely show, this generally cannot be modeled with nested ifs alone: for example, the lines with skip_to_a and the lines with skip_to_b overlap, but neither range completely contains the other.
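The overlap argument can be checked mechanically: treat each goto as the span from the goto line down to its label, and look for pairs of spans that cross (the later one starts inside the earlier one but ends after it). A JavaScript sketch with a hypothetical line format and made-up function names; if no pair crosses, the spans nest, and each goto could simply become an if (!cond) { ... } around the lines up to its label.

```javascript
// Hypothetical parsed form: only gotos and labels matter here.
//   { kind: 'stmt' }   { kind: 'goto', label }   { kind: 'label', name }
function jumpSpans(lines) {
  const labelAt = new Map();
  lines.forEach((line, i) => { if (line.kind === 'label') labelAt.set(line.name, i); });
  // Each goto covers the lines from itself down to its label.
  return lines.flatMap((line, i) =>
    line.kind === 'goto' ? [{ label: line.label, from: i, to: labelAt.get(line.label) }] : []);
}

// Two spans cross when the later one starts inside the earlier one but ends after it.
function crossingPairs(lines) {
  const spans = jumpSpans(lines); // already ordered by `from`
  const pairs = [];
  for (let i = 0; i < spans.length; i++) {
    for (let j = i + 1; j < spans.length; j++) {
      const [p, q] = [spans[i], spans[j]];
      if (q.from < p.to && p.to < q.to) pairs.push([p.label, q.label]);
    }
  }
  return pairs;
}

// The question's example.
const script = [
  { kind: 'goto', label: 'a' },
  { kind: 'stmt' },
  { kind: 'goto', label: 'b' },
  { kind: 'stmt' },
  { kind: 'label', name: 'a' },
  { kind: 'stmt' },
  { kind: 'goto', label: 'c' },
  { kind: 'stmt' },
  { kind: 'label', name: 'c' },
  { kind: 'stmt' },
  { kind: 'label', name: 'b' },
];

// Returns [['a', 'b']]: a covers lines 0..4 and b covers 2..10; c (6..8) sits inside b.
const crossing = crossingPairs(script);
console.log(crossing);
```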


You could do something like tracking the goto state in a while loop, but it wouldn't look too pretty:

let target = null; // label currently being jumped to, or null

do {
  if (target === null && cond1) target = 'a';
  if (target === null) do_something(1);
  if (target === null && cond2) target = 'b';
  if (target === null) do_something(2);
  if (target === 'a') target = null;      // label a:
  if (target === null) do_something(3);
  if (target === null && cond3) target = 'c';
  if (target === null) do_something(4);
  if (target === 'c') target = null;      // label c:
  if (target === null) do_something(5);
  if (target === 'b') target = null;      // label b:
} while (target !== null);
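Because jumps only go forward, at most one jump can be in progress at a time, so a single variable is enough; as long as every goto targets a label further down, it is back to null at the end and the do/while body runs exactly once. Generating this pattern mechanically (without the loop) could look like the following JavaScript sketch. The line format and removeGotosWithTarget are hypothetical, and label names are assumed to be plain identifiers, since they are pasted into string literals.

```javascript
// Hypothetical parsed form: one object per script line.
//   { kind: 'stmt', code }   { kind: 'goto', cond, label }   { kind: 'label', name }
function removeGotosWithTarget(lines) {
  // `target` holds the label currently being jumped to, or null when running normally.
  const out = ['let target = null;'];
  for (const line of lines) {
    if (line.kind === 'goto') {
      // A goto only fires while no other jump is in progress.
      out.push(`if (target === null && (${line.cond})) target = '${line.label}';`);
    } else if (line.kind === 'label') {
      // Reaching the jump target resumes normal execution.
      out.push(`if (target === '${line.name}') target = null;`);
    } else {
      out.push(`if (target === null) { ${line.code} }`);
    }
  }
  return out.join('\n');
}

// The question's example, with trace.push(n) standing in for "do somethingN".
const script = [
  { kind: 'goto', cond: 'cond1', label: 'a' },
  { kind: 'stmt', code: 'trace.push(1);' },
  { kind: 'goto', cond: 'cond2', label: 'b' },
  { kind: 'stmt', code: 'trace.push(2);' },
  { kind: 'label', name: 'a' },
  { kind: 'stmt', code: 'trace.push(3);' },
  { kind: 'goto', cond: 'cond3', label: 'c' },
  { kind: 'stmt', code: 'trace.push(4);' },
  { kind: 'label', name: 'c' },
  { kind: 'stmt', code: 'trace.push(5);' },
  { kind: 'label', name: 'b' },
];

console.log(removeGotosWithTarget(script));
```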


Compiling to another language is often harder than necessary. A simpler method would be not to compile to another language at all, but to interpret the code in JavaScript. That way you could easily give your goto statement whatever semantics you like.

However, if you do it this way, you need to move all the parsing logic into your JavaScript code, which might be ugly. Another option is to compile to a format that is easier to interpret, e.g. bytecode, so that everything you need from the parser (all label positions etc.) is precomputed.
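A minimal sketch of that idea in JavaScript: a hypothetical "compile" step resolves every label to its line index once, and a small loop then runs the lines with a program counter, so a goto is just an assignment to it. Statements and conditions are JavaScript callbacks standing in for whatever the real bytecode would contain; resolveLabels, run and the op format are made up for illustration.

```javascript
// Hypothetical "bytecode": one op per script line, with callbacks standing in for
// compiled statements and conditions.
//   { kind: 'stmt', action(env) }   { kind: 'goto', cond(env), label }   { kind: 'label', name }
function resolveLabels(ops) {
  // Done once, ahead of time: map every label to its position.
  const labelAt = new Map();
  ops.forEach((op, i) => { if (op.kind === 'label') labelAt.set(op.name, i); });
  return ops.map((op, i) => {
    if (op.kind !== 'goto') return op;
    const target = labelAt.get(op.label);
    // The script language only jumps forward; an interpreter could allow more.
    if (target === undefined || target <= i) throw new Error(`invalid goto ${op.label}`);
    return { ...op, target };
  });
}

function run(ops, env) {
  let pc = 0; // index of the op to execute next
  while (pc < ops.length) {
    const op = ops[pc];
    if (op.kind === 'goto' && op.cond(env)) {
      pc = op.target; // a taken goto is just an assignment to the program counter
      continue;
    }
    if (op.kind === 'stmt') op.action(env);
    pc += 1; // labels and untaken gotos fall through
  }
}

// The question's example; trace.push(n) stands in for "do somethingN".
const program = resolveLabels([
  { kind: 'goto', cond: env => env.cond1, label: 'a' },
  { kind: 'stmt', action: env => env.trace.push(1) },
  { kind: 'goto', cond: env => env.cond2, label: 'b' },
  { kind: 'stmt', action: env => env.trace.push(2) },
  { kind: 'label', name: 'a' },
  { kind: 'stmt', action: env => env.trace.push(3) },
  { kind: 'goto', cond: env => env.cond3, label: 'c' },
  { kind: 'stmt', action: env => env.trace.push(4) },
  { kind: 'label', name: 'c' },
  { kind: 'stmt', action: env => env.trace.push(5) },
  { kind: 'label', name: 'b' },
]);

const env = { cond1: false, cond2: false, cond3: true, trace: [] };
run(program, env);
console.log(env.trace.join(', ')); // 1, 2, 3, 5
```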


One alternative solution is to turn each label into a function containing the code from that label up to the next label, followed by a call to the function generated for the next label.

The advantage is that a goto can be replaced by a simple function call, returned immediately so that the rest of the current function is skipped. The drawback is that for long scripts or loops you may end up with rather deep call stacks.

Using this method, a simple algorithm would be:

label_count := 0 // First pass: give every label its own function name
For each line:
    if line is label X
        label_count := label_count + 1
        function_name[X] := "label" + label_count

Write "function label0() {" // code before the first label

For each line: // Second pass: generate code
    if line is label X
        write "return " + function_name[X] + "();" // fall through into the code after X
        write "}"
        write "function " + function_name[X] + "() {"
    else if line is "if cond goto X"
        write "if (" + cond + ") return " + function_name[X] + "();" // skip the rest of this function
    else
        translate normally
Write "}" // close the last function
Write "label0();" // start execution
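A JavaScript sketch of this function-per-label generation, assuming a hypothetical one-object-per-line input: code before the first label goes into label0, every label closes the current function with a call into the next one and opens a new function, and a conditional goto becomes if (cond) return labelN(); so the rest of the current function is skipped. The names removeGotosWithFunctions, label0, label1, ... are made up.

```javascript
// Hypothetical parsed form: one object per script line.
//   { kind: 'stmt', code }   { kind: 'goto', cond, label }   { kind: 'label', name }
function removeGotosWithFunctions(lines) {
  // First pass: give every label its own function name (label1, label2, ...).
  const fnName = new Map();
  for (const line of lines) {
    if (line.kind === 'label') fnName.set(line.name, `label${fnName.size + 1}`);
  }
  // Second pass: code before the first label goes into label0.
  const out = ['function label0() {'];
  for (const line of lines) {
    if (line.kind === 'label') {
      const next = fnName.get(line.name);
      out.push(`  return ${next}();`, '}', `function ${next}() {`); // fall through into the label
    } else if (line.kind === 'goto') {
      if (!fnName.has(line.label)) throw new Error(`unknown label ${line.label}`);
      out.push(`  if (${line.cond}) return ${fnName.get(line.label)}();`); // skip the rest
    } else {
      out.push(`  ${line.code}`);
    }
  }
  out.push('}', 'label0();'); // close the last function and start execution
  return out.join('\n');
}

// The question's example, with trace.push(n) standing in for "do somethingN".
const script = [
  { kind: 'goto', cond: 'cond1', label: 'a' },
  { kind: 'stmt', code: 'trace.push(1);' },
  { kind: 'goto', cond: 'cond2', label: 'b' },
  { kind: 'stmt', code: 'trace.push(2);' },
  { kind: 'label', name: 'a' },
  { kind: 'stmt', code: 'trace.push(3);' },
  { kind: 'goto', cond: 'cond3', label: 'c' },
  { kind: 'stmt', code: 'trace.push(4);' },
  { kind: 'label', name: 'c' },
  { kind: 'stmt', code: 'trace.push(5);' },
  { kind: 'label', name: 'b' },
];

console.log(removeGotosWithFunctions(script));
```

Function declarations are hoisted, so the generated code can call label functions that appear further down, including the final label0(); call.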

This may cause additional problems; for example, what about the scope of variables? Still, it is an alternative approach that will hopefully get you thinking along other lines. ;)
