How to detect I/O in Python source code (standard-library I/O)
I'm building an optimizing compiler for a small subset of Python for my final-year project. The first thing I'm doing is testing whether a variable is involved in or leads to I/O. If I were to statically trace a function call down the proverbial rabbit hole, how exactly would I know that it involves I/O? Would there be a call to a built-in Python function such as print or input, or to built-in 'file' object methods such as read and write?
I don't have a lot of time for this project (only 6 months), so I'm completely ignoring people writing the I/O in C, wrapping it in some sort of Python object and calling it from Python.
Is the byte code generated indicative of whether there's I/O? Or is it as unhelpful as the AST?
No biggie if it's undoable; I'll just restrict the I/O subset for my project to print, input, read and write. That, or do liveness analysis.
Thanks.
It's not as simple as just looking at the bytecode, because calls are just name lookups followed by a generic call instruction:
>>> def write_to_a_file(s):
...     f = open('foo.txt', 'w')
...     f.write(s)
...     f.close()
...
>>> import dis
>>> dis.dis(write_to_a_file)
  2           0 LOAD_GLOBAL              0 (open)
              3 LOAD_CONST               1 ('foo.txt')
              6 LOAD_CONST               2 ('w')
              9 CALL_FUNCTION            2
             12 STORE_FAST               1 (f)

  3          15 LOAD_FAST                1 (f)
             18 LOAD_ATTR                1 (write)
             21 LOAD_FAST                0 (s)
             24 CALL_FUNCTION            1
             27 POP_TOP

  4          28 LOAD_FAST                1 (f)
             31 LOAD_ATTR                2 (close)
             34 CALL_FUNCTION            0
             37 POP_TOP
             38 LOAD_CONST               0 (None)
             41 RETURN_VALUE
The bytecodes themselves are just loading things, calling things, and storing things. If you're operating at the bytecode level, you'll actually have to look at the operands (the names and constants being loaded).
Check out the current list of Python bytecodes (in the documentation for the dis module) and you can see that there's really nothing there that distinguishes I/O calls.
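As a rough sketch of what looking at the operands could mean, the snippet below uses `dis.get_instructions` (available since Python 3.4) to collect every name a function's bytecode looks up. The helper name `referenced_names` and the `NAME_OPS` set are assumptions made for illustration, and opcode names differ between CPython versions, so treat the set as a starting point rather than a complete list.

```python
import dis

# Opcodes whose operand is a name rather than a value. This set is an
# assumption for illustration and may need adjusting per CPython version.
NAME_OPS = {"LOAD_GLOBAL", "LOAD_NAME", "LOAD_ATTR", "LOAD_METHOD", "IMPORT_NAME"}


def referenced_names(func):
    """Return the set of names that func's bytecode looks up by name."""
    return {
        ins.argval
        for ins in dis.get_instructions(func)
        if ins.opname in NAME_OPS
    }


def write_to_a_file(s):
    f = open('foo.txt', 'w')
    f.write(s)
    f.close()


# The result contains 'open', 'write' and 'close', but only as strings.
print(sorted(referenced_names(write_to_a_file)))
```

All this gives you is a set of strings. Nothing in it says that `open` still refers to the built-in, or that `write` is being called on a file object.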
Even if you were to inspect every LOAD_GLOBAL or LOAD_FAST instruction and apply a whitelist, that wouldn't necessarily work, because there are modules that provide I/O and the bytecode doesn't really help you there either:
>>> def uses_a_module_for_io(s):
...     import shutil
...     shutil.copy(s, 'foo.txt')
...
>>> dis.dis(uses_a_module_for_io)
  2           0 LOAD_CONST               1 (-1)
              3 LOAD_CONST               0 (None)
              6 IMPORT_NAME              0 (shutil)
              9 STORE_FAST               1 (shutil)

  3          12 LOAD_FAST                1 (shutil)
             15 LOAD_ATTR                1 (copy)
             18 LOAD_FAST                0 (s)
             21 LOAD_CONST               2 ('foo.txt')
             24 CALL_FUNCTION            2
             27 POP_TOP
             28 LOAD_CONST               0 (None)
             31 RETURN_VALUE
>>> def doesnt_use_shutil_really(s):
...     shutil = object()
...     shutil.copy = lambda x,y: None
...     shutil.copy(s, 'foo.txt')
...
>>> dis.dis(doesnt_use_shutil_really)
  2           0 LOAD_GLOBAL              0 (object)
              3 CALL_FUNCTION            0
              6 STORE_FAST               1 (shutil)

  3           9 LOAD_CONST               1 (<code object <lambda> at 011D8AD0, file "<pyshell#29>", line 3>)
             12 MAKE_FUNCTION            0
             15 LOAD_FAST                1 (shutil)
             18 STORE_ATTR               1 (copy)

  4          21 LOAD_FAST                1 (shutil)
             24 LOAD_ATTR                1 (copy)
             27 LOAD_FAST                0 (s)
             30 LOAD_CONST               2 ('foo.txt')
             33 CALL_FUNCTION            2
             36 POP_TOP
             37 LOAD_CONST               0 (None)
             40 RETURN_VALUE
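Note that the name loaded by LOAD_FAST for shutil can be something the user just makes up. In my case I just bound it to a generic object, but the user can have a different shutil on their path as well. (Calling this particular function would raise an AttributeError, since a plain object() instance doesn't accept new attributes, but it compiles fine, and the bytecode is all a static analysis gets to see.)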
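If you do fall back to the restricted subset from the question (print, input, read and write), a purely name-based check over the AST might look like the sketch below. It assumes Python 3, where print is an ordinary call, and it assumes that code in your subset never rebinds these names. The `IO_FUNCTIONS` and `IO_METHODS` whitelists and the function name `may_do_io` are made up for illustration.

```python
import ast

# Hypothetical whitelists for the restricted subset from the question.
IO_FUNCTIONS = {"print", "input", "open"}
IO_METHODS = {"read", "write"}


def may_do_io(source):
    """Report whether source contains a call whose name looks like I/O.

    This is name matching only: it cannot tell whether the name still
    refers to the built-in, which is exactly the shadowing problem above.
    """
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Plain calls such as print(x) or open(path).
        if isinstance(func, ast.Name) and func.id in IO_FUNCTIONS:
            return True
        # Method calls such as f.write(s), whatever f happens to be.
        if isinstance(func, ast.Attribute) and func.attr in IO_METHODS:
            return True
    return False
```

For example, `may_do_io("f = open('foo.txt', 'w')")` returns True, while `may_do_io("import shutil\nshutil.copy('a', 'b')")` returns False even though it copies a file. Any module-level I/O outside the whitelist slips through, and any object that happens to have a `write` method is flagged.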