Alternatives to Python Popen.communicate() memory limitations?
I have the following chunk of Python code (running v2.7) that results in MemoryError
exceptions being thrown when I work with large (several GB) files:
from subprocess import Popen, PIPE
import sys
myProcess = Popen(myCmd, shell=True, stdout=PIPE, stderr=PIPE)
myStdout, myStderr = myProcess.communicate()
sys.stdout.write(myStdout)
if myStderr:
    sys.stderr.write(myStderr)
In reading the documentation for Popen.communicate(), there appears to be some buffering going on:
Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited.
Is there a way to disable this buffering, or force the cache to be cleared periodically while the process runs?
What alternative approach should I use in Python for running a command that streams gigabytes of data to stdout?
I should note that I need to handle output and error streams.
I think I found a solution:
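myProcess = Popen(myCmd, shell=True, stdout=PIPE, stderr=PIPE)
# Caveat: stdout is read to EOF before stderr is touched, so this can
# deadlock if the command fills the stderr pipe buffer in the meantime.
for ln in myProcess.stdout:
    sys.stdout.write(ln)
for ln in myProcess.stderr:
    sys.stderr.write(ln)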
This seems to get my memory usage down enough to get through the task.
Update
I have recently found a more flexible way of handling data streams in Python, using threads. It's interesting that Python is so poor at something that shell scripts can do so easily!
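The thread-based approach mentioned in the update is not shown in the post, so the following is only a minimal sketch of how it might look, assuming Python 3 (it should also run on 2.7). The names pump and run_streaming are hypothetical. One thread per pipe copies data in fixed-size chunks, so neither pipe can fill up while the other is being read, and memory use stays bounded by the chunk size.

```python
import subprocess
import sys
import threading


def pump(src, dst, chunk_size=64 * 1024):
    """Copy src to dst in fixed-size chunks until EOF, then close src."""
    for chunk in iter(lambda: src.read(chunk_size), b""):
        dst.write(chunk)
    src.close()


def run_streaming(cmd, out=None, err=None):
    """Run cmd, forwarding stdout and stderr concurrently; return the exit code."""
    # Default to the byte-oriented streams (sys.stdout.buffer on Python 3).
    if out is None:
        out = getattr(sys.stdout, "buffer", sys.stdout)
    if err is None:
        err = getattr(sys.stderr, "buffer", sys.stderr)
    proc = subprocess.Popen(cmd, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    threads = [
        threading.Thread(target=pump, args=(proc.stdout, out)),
        threading.Thread(target=pump, args=(proc.stderr, err)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return proc.wait()
```

Calling run_streaming(myCmd) would forward both streams to the parent's own stdout and stderr; passing other writable binary objects as out and err redirects them.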
What I would probably do instead, if I needed to read the stdout for something that large, is send it to a file on creation of the process.
with open(my_large_output_path, 'w') as fo:
    with open(my_large_error_path, 'w') as fe:
        myProcess = Popen(myCmd, shell=True, stdout=fo, stderr=fe)
        myProcess.wait()  # wait for the command to finish writing both files
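Edit: If you need to stream, you could try making a file-like object and passing it to stdout and stderr. (I haven't tried this, though.) Note that Popen needs an object backed by a real file descriptor (a working fileno() method), so a purely in-memory file-like object will not work. You could then read (query) from the object as it's being written.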
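A simpler variant of this idea can be sketched with a temporary file, which has a real file descriptor and so can be handed to the child directly. This sketch assumes it is acceptable to read the output only after the command has finished, rather than while it is being written. The name run_spooled is hypothetical, and stderr is left attached to the parent's stderr for brevity.

```python
import subprocess
import tempfile


def run_spooled(cmd, chunk_size=64 * 1024):
    """Run cmd with stdout sent to a temp file, then yield that output in chunks."""
    with tempfile.TemporaryFile() as spool:
        # The child writes straight to the file; nothing is buffered in Python.
        subprocess.call(cmd, shell=True, stdout=spool)
        spool.seek(0)
        for chunk in iter(lambda: spool.read(chunk_size), b""):
            yield chunk
```

The caller can then process the output piece by piece, for example with `for chunk in run_spooled(myCmd): ...`, without ever holding the whole output in memory.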
For those whose application hangs after a certain amount of time when using Popen, here is what happened in my case:
A rule of thumb: if you're not going to read the stderr and stdout streams, don't pass PIPE for them in the Popen parameters, because the pipe buffers will fill up, the child process will block on its next write, and that will cause you a lot of problems.
If you need them for a certain amount of time and you need to keep the process running, then you can close those streams at any time.
try:
    p = Popen(COMMAND, stdout=PIPE, stderr=PIPE)
    # After using stdout and stderr
    p.stdout.close()
    p.stderr.close()
except Exception as e:
    pass
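When the output is never needed at all, one way to follow the rule of thumb above is to send both streams to the null device instead of creating pipes. This is a minimal sketch, not the answerer's code. The name run_quietly is hypothetical, and os.devnull is used so that it works on both Python 2.7 and 3.

```python
import os
import subprocess


def run_quietly(cmd):
    """Run cmd, discarding its stdout and stderr; return the exit code."""
    with open(os.devnull, "wb") as devnull:
        # No pipes are created, so there is no buffer that can fill up.
        return subprocess.call(cmd, shell=True, stdout=devnull, stderr=devnull)
```

Because nothing is piped back to the parent, the command can produce any amount of output without blocking.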