开发者

Python tracing and conditional jumps

I'm writing a concolic engine for Python using the sys.settrace() functionality.

The main task during such kind of execution is to record the constraints on the input variables. The constraints are nothing else than the conditions of the if statements, that create two branches (the 'then' and the 'else' branch).

When an execution is complete, the engine chooses a constraint and finds appropriate values for the inputs so that the execution will go the down along the other branch (at execution x it goes the 'then' branch, at execution x+1 it goes along the 'else' branch).

This is to have a bit of context on why I doing what I'm trying to do...

By combining settrace() and the dis module, I get to see the bytecode of each source line, just before it is executed. This way I can easily record the if conditions as they appear during execution.

But then I have the big problem. I need to know which way the if went开发者_如何学运维, which branch the execution took. So if my code is something like:

if x > a:
  print x
else:
  print a

at a certain point my tracing thing will see:

t: if x > 0: 

then the python interpreter will execute the if and jump (or not) somewhere. And I will see:

t + 1: print x

So is the instruction t + 1 in the "then" branch or in the "else" one? Keep in mind the trace function sees only some bytecode in the current block.

I know of two way to do this. One is to evaluate the condition to see exactly whether it is true or false. This works only if there are no side-effects.

The other way is to try to look and the instruction pointer at t + 1 and try to understand where we are in the code. This is the way I am using right now, but it very delicate because at t + 1 I could find myself somewhere completely different (another module, a builtin function, etc).

So finally, the question I have is this: is there a way to get from Python itself, or from a C module/extension/whatever, the result of the last conditional jump?

In alternative, are there more fine-grained tracing options? Something like execute bytecode one opcode at a time. With the settrace() functionality the maximum resolution I get is whole source code lines.

In the worst case, I think I can modify the Python interpreter to expose such information, but I would leave that as last resort, for obvious reasons.


There is no information in the trace facility about the last branch taken.

What I did to implement branch coverage measurement in coverage.py is to keep a record for each stack frame of the last line executed, then the next time the trace function is called, I can record a pair of line numbers which form a from-to arc of execution.

About finer-grained tracing: you can trick the Python interpreter into giving you byte code information. My experiment in this is described here: Wicked hack: Python bytecode tracing

I'd be very interested to see how this work progresses!


In the end this is what I did. I implemented the AST instrumentation and it works pretty well.

By playing with the AST, you need to move all function calls (also attributes and subscriptions, due to getattr() and friends, out of the if conditions by creating temporary variables. Also you need to split the and and or operators.

Then add a call to your own function at the beginning of each branch, with a boolean parameter, True for the then branch and False for the else branch.

After that I wrote an AST to source converter (there is one somewhere on the net, but does not work on current Python versions).

Working with the AST is very easy and quite simple, I ended up doing three transformation passes, adding also some import statements.

This is the first pass, as an example. It splits if conditions if they contain or or and operators:

class SplitBoolOpPass1(ast.NodeTransformer):
  def visit_If(self, node):
      while isinstance(node.test, ast.BoolOp):
        new_node = ast.If(test=node.test.values.pop(), body=node.body, orelse=node.orelse)
        if isinstance(node.test.op, ast.And):
          if len(node.test.values) == 1:
            node.test = node.test.values[0]
          node.body = [new_node]
        else:
          if len(node.test.values) == 1:
            node.test = node.test.values[0]
          node.orelse = [new_node]
      node = self.generic_visit(node) # recusion
      return node

Probably it is not very useful for code coverage applications because it messes up with the code quite a lot.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜