
How to detect I/O in Python source code (standard-library I/O)

I'm building an optimizing compiler for a small subset of Python code for my final-year project. The first thing I'm doing is testing whether a variable is involved in, or leads to, I/O. If I were to statically trace a function call down the proverbial rabbit hole, how exactly would I know that it involves I/O? Would there be a call to a built-in Python function such as print or input, or to the built-in file object's read and write methods?

I don't have a lot of time for this project (only 6 months), so I'm completely ignoring people who write the I/O in C, wrap it in some sort of Python object and call it from Python.

Is the generated bytecode indicative of whether there's I/O? Or is it as unhelpful as the AST?

No biggie if it's not doable; I'll just limit the I/O subset for my project to print, input, read and write. That, or do liveness analysis.

Thanks.
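The fallback mentioned in the question (limiting I/O to print, input, read and write) can be checked purely syntactically. A minimal sketch using the standard ast module follows; IO_FUNCTIONS, IO_METHODS and io_calls are hypothetical names, and open is deliberately left out to match that subset:

```python
import ast

# Hypothetical I/O subset from the question: print/input calls and .read()/.write() calls.
IO_FUNCTIONS = {"print", "input"}
IO_METHODS = {"read", "write"}

def io_calls(source):
    """Return sorted (line, name) pairs for calls that look like subset I/O."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in IO_FUNCTIONS:
            # Bare-name call such as print(...) or input(...).
            hits.append((node.lineno, func.id))
        elif isinstance(func, ast.Attribute) and func.attr in IO_METHODS:
            # Any x.read()/x.write(), whatever type x turns out to have.
            hits.append((node.lineno, func.attr))
    return sorted(hits)

SOURCE = '''
name = input("name? ")
greeting = "hi " + name
f = open("out.txt", "w")
f.write(greeting)
print(greeting.upper())
'''

print(io_calls(SOURCE))  # [(2, 'input'), (5, 'write'), (6, 'print')]
```

Because it matches names only, a user-defined function called print, or any object that happens to have a write method, would be flagged as well.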


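It's not as simple as just looking at the bytecode, because a call is just a symbol lookup followed by a generic call instruction (the listings below are CPython 2 output; newer versions use different opcodes and offsets, but the point is the same):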

>>> def write_to_a_file(s):
    f = open('foo.txt', 'w')
    f.write(s)
    f.close()


>>> import dis
>>> dis.dis(write_to_a_file)
  2           0 LOAD_GLOBAL              0 (open)
              3 LOAD_CONST               1 ('foo.txt')
              6 LOAD_CONST               2 ('w')
              9 CALL_FUNCTION            2
             12 STORE_FAST               1 (f)

  3          15 LOAD_FAST                1 (f)
             18 LOAD_ATTR                1 (write)
             21 LOAD_FAST                0 (s)
             24 CALL_FUNCTION            1
             27 POP_TOP             

  4          28 LOAD_FAST                1 (f)
             31 LOAD_ATTR                2 (close)
             34 CALL_FUNCTION            0
             37 POP_TOP             
             38 LOAD_CONST               0 (None)
             41 RETURN_VALUE      

The bytecodes themselves just load things, call things and store things. If you're operating at the bytecode level, you'll actually have to look at the payload, i.e. the names being loaded.
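Reading that payload can be done with dis.get_instructions (available since Python 3.4). The sketch below collects the global, attribute and module names a function's bytecode loads; referenced_names and NAME_OPS are illustrative names, and the opcode set assumes current CPython, where LOAD_METHOD exists only in 3.7 to 3.11:

```python
import dis

# Opcodes whose argval is a plain name string. LOAD_METHOD exists only in
# CPython 3.7-3.11; newer versions fold method loads into LOAD_ATTR.
NAME_OPS = {"LOAD_GLOBAL", "LOAD_NAME", "LOAD_ATTR", "LOAD_METHOD", "IMPORT_NAME"}

def referenced_names(func):
    """Return the global, attribute and module names loaded by func's bytecode."""
    return {
        ins.argval
        for ins in dis.get_instructions(func)
        if ins.opname in NAME_OPS
    }

def write_to_a_file(s):
    f = open('foo.txt', 'w')
    f.write(s)
    f.close()

print(sorted(referenced_names(write_to_a_file)))  # ['close', 'open', 'write']
```

This only yields strings such as 'open' and 'write'; nothing in the bytecode says what those names will be bound to when the code runs.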

Check out the current list of Python bytecodes in the dis module documentation and you'll see that there's really nothing there that distinguishes I/O calls.

Even if you were to inspect every LOAD_GLOBAL or LOAD_FAST instruction and apply a whitelist, that wouldn't necessarily work, because there are modules that provide I/O, and the bytecode doesn't really help you there either:

>>> def uses_a_module_for_io(s):
    import shutil
    shutil.copy(s, 'foo.txt')


>>> dis.dis(uses_a_module_for_io)
  2           0 LOAD_CONST               1 (-1)
              3 LOAD_CONST               0 (None)
              6 IMPORT_NAME              0 (shutil)
              9 STORE_FAST               1 (shutil)

  3          12 LOAD_FAST                1 (shutil)
             15 LOAD_ATTR                1 (copy)
             18 LOAD_FAST                0 (s)
             21 LOAD_CONST               2 ('foo.txt')
             24 CALL_FUNCTION            2
             27 POP_TOP             
             28 LOAD_CONST               0 (None)
             31 RETURN_VALUE  

>>> def doesnt_use_shutil_really(s):
    shutil = object()
    shutil.copy = lambda x,y: None
    shutil.copy(s, 'foo.txt')


>>> dis.dis(doesnt_use_shutil_really)
  2           0 LOAD_GLOBAL              0 (object)
              3 CALL_FUNCTION            0
              6 STORE_FAST               1 (shutil)

  3           9 LOAD_CONST               1 (<code object <lambda> at 011D8AD0, file "<pyshell#29>", line 3>)
             12 MAKE_FUNCTION            0
             15 LOAD_FAST                1 (shutil)
             18 STORE_ATTR               1 (copy)

  4          21 LOAD_FAST                1 (shutil)
             24 LOAD_ATTR                1 (copy)
             27 LOAD_FAST                0 (s)
             30 LOAD_CONST               2 ('foo.txt')
             33 CALL_FUNCTION            2
             36 POP_TOP             
             37 LOAD_CONST               0 (None)
             40 RETURN_VALUE        

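Note that the shutil loaded by LOAD_FAST can be something the user just made up. In my case I made it a generic object (strictly speaking, a bare object() instance rejects attribute assignment, so actually calling this function raises AttributeError; an instance of an empty class would work and compiles to the same kind of bytecode), but the user could just as well have a different shutil module on their path.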
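The same limitation shows up at the source level. A minimal sketch, assuming a hypothetical blocklist IO_CALLS of dotted call targets and hypothetical helpers dotted_name and flagged_functions, matches calls syntactically with the ast module. It flags both shutil examples, even though only the first one copies a file:

```python
import ast

# Hypothetical blocklist of dotted call targets treated as I/O.
IO_CALLS = {"shutil.copy"}

def dotted_name(node):
    """Turn Name/Attribute chains like shutil.copy into 'shutil.copy'; else None."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return None if base is None else base + "." + node.attr
    return None

def flagged_functions(source):
    """Names of top-level functions containing a call whose target is in IO_CALLS."""
    flagged = []
    for fn in ast.parse(source).body:
        if isinstance(fn, ast.FunctionDef):
            for node in ast.walk(fn):
                if isinstance(node, ast.Call) and dotted_name(node.func) in IO_CALLS:
                    flagged.append(fn.name)
                    break
    return flagged

# The two functions from above, parsed but never executed.
SOURCE = '''
def uses_a_module_for_io(s):
    import shutil
    shutil.copy(s, 'foo.txt')

def doesnt_use_shutil_really(s):
    shutil = object()
    shutil.copy = lambda x, y: None
    shutil.copy(s, 'foo.txt')
'''

print(flagged_functions(SOURCE))
# ['uses_a_module_for_io', 'doesnt_use_shutil_really']
```

Telling the two apart requires knowing what shutil is bound to at that call, which is exactly the information neither the AST nor the bytecode records.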
