How can I multiplex output to an OS file descriptor in Python?
The subprocess.Popen mechanism uses an underlying file descriptor, instead of a file-like object, to write its stdout/stderr. I need to capture both stdout and stderr while still displaying them to the console.
How can I create a file descriptor that Popen can use that will allow me to do this?
Just a bit of context: subprocess uses the raw file descriptors of the stdin, stdout and stderr objects you specify, because it passes them down to POSIX. If you use subprocess.PIPE, it will create a new pipe with os.pipe(). Also, Popen.communicate reads until the end of the stream, which may not be desirable if you want to pipe the data somewhere else.
Since you want to print the output to stdout, I assume it is text output. You will need to pass encoding, errors or universal_newlines to Popen for subprocess to treat the file as text (see the docs).
import subprocess

p = subprocess.Popen(
    '/usr/bin/whoami',
    stdout=subprocess.PIPE,   # capture stdout
    universal_newlines=True   # open the pipe in text mode
)

# Pipe the data somewhere else too, e.g. a log file
with open('subprocess.log', 'w') as logfile:
    # Iterating over the pipe yields lines until EOF; unlike polling
    # p.poll() in a loop, this also drains any output that is still
    # buffered when the process exits
    for line in p.stdout:
        # one copy to our stdout (the line includes the \n)
        print(line, end='')
        # one copy to the logfile
        logfile.write(line)
p.wait()
The same technique can be used for manipulating stderr, for example by passing file=sys.stderr to print. Note that you can also pipe from your own stdin just by passing it directly:
subprocess.Popen('/usr/bin/whoami', stdin=sys.stdin, stdout=subprocess.PIPE, ...)
After all, the standard streams just wrap file descriptors. If reading until the end of a line is unsuitable for the type of output you are expecting, you can instead read a small fixed-size buffer, e.g. p.stdout.read(64).
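For instance, a minimal sketch of chunked reading (binary mode here, since no text-mode options are passed to Popen):

import subprocess

p = subprocess.Popen('/usr/bin/whoami', stdout=subprocess.PIPE)
while True:
    chunk = p.stdout.read(64)   # at most 64 bytes per read
    if not chunk:               # an empty bytes object means EOF
        break
    print(chunk.decode('utf-8'), end='')
p.wait()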
Working simultaneously with stderr and stdout
If you need both stdout and stderr, you run into the problem that a blocking read can only wait on one stream at a time.
One possibility is to use os.set_blocking to make the pipes non-blocking, so that any read method returns immediately when there is no data. This allows you to alternate between the streams.
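A minimal sketch of that approach, assuming a hypothetical /bin/mixed_output command that writes to both streams:

import os
import subprocess
import sys
import time

p = subprocess.Popen(
    '/bin/mixed_output',
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

# Make both pipe file descriptors non-blocking
for pipe in (p.stdout, p.stderr):
    os.set_blocking(pipe.fileno(), False)

def drain(pipe, sink):
    # Copy whatever is currently available in `pipe` to `sink`
    try:
        data = os.read(pipe.fileno(), 4096)
    except BlockingIOError:
        return  # no data available right now
    if data:
        sink.write(data.decode('utf-8', errors='replace'))
        sink.flush()

while p.poll() is None:
    drain(p.stdout, sys.stdout)
    drain(p.stderr, sys.stderr)
    time.sleep(0.05)  # avoid a busy loop

# Drain anything left in the pipes after the process exited
drain(p.stdout, sys.stdout)
drain(p.stderr, sys.stderr)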
Another possibility is to have two separate threads process stdout and stderr (see the sketch at the end of this answer); but there is a simpler way to achieve this by means of the asyncio module:
import asyncio
import sys

PROCESS_PATH = '/bin/mixed_output'

class MultiplexProtocol(asyncio.SubprocessProtocol):
    def __init__(self, exit_future):
        self.exit_future = exit_future

    def pipe_data_received(self, fd, data):
        # fd is the child's descriptor number: 1 = stdout, 2 = stderr
        if fd == sys.stdout.fileno():
            print(data.decode('utf-8'), file=sys.stdout, end='')
        elif fd == sys.stderr.fileno():
            print(data.decode('utf-8'), file=sys.stderr, end='')

    def process_exited(self):
        self.exit_future.set_result(True)

async def launch_subprocess(loop):
    # Future marking the end of the process
    exit_future = asyncio.Future(loop=loop)
    # Use asyncio's subprocess support
    create_subp = loop.subprocess_exec(
        lambda: MultiplexProtocol(exit_future),
        PROCESS_PATH,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        stdin=None
    )
    transport, protocol = await create_subp
    await exit_future
    # Close the pipes
    transport.close()

loop = asyncio.get_event_loop()
loop.run_until_complete(launch_subprocess(loop))
This consumes much less CPU than constantly looping in the host process to pipe data to other streams, since MultiplexProtocol.pipe_data_received is called only when data arrives.
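For completeness, the thread-based alternative mentioned above could look like this (a sketch, again assuming the hypothetical /bin/mixed_output command):

import subprocess
import sys
import threading

def forward(pipe, sink):
    # Copy lines from the child's pipe to `sink` until EOF
    for line in pipe:
        sink.write(line)
        sink.flush()

p = subprocess.Popen(
    '/bin/mixed_output',
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)

threads = [
    threading.Thread(target=forward, args=(p.stdout, sys.stdout)),
    threading.Thread(target=forward, args=(p.stderr, sys.stderr)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
p.wait()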