How can I multiplex output to an OS file descriptor in Python?

The subprocess.Popen mechanism uses an underlying file descriptor, instead of a file-like object, to write its stdout/stderr. I need to capture both the stdout and stderr while still displaying them to the console.

How can I create a file descriptor that Popen can use that will allow me to do this?


Just a bit of context: subprocess uses the raw file descriptors of the stdin, stdout and stderr objects you specify, because it passes them straight down to the underlying POSIX calls. If you pass subprocess.PIPE, it creates a new pipe with os.pipe(). Also note that Popen.communicate reads until the end of the stream, which is not what you want if you need to forward the data somewhere else while the process is still running.
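For contrast, a minimal sketch of the communicate pattern: it blocks until the child exits and only then hands over the complete output, so nothing can be echoed to the console while the process runs.

import subprocess

p = subprocess.Popen(
    ['/usr/bin/whoami'],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)
# communicate() reads both pipes to EOF and waits for the child.
out, err = p.communicate()
print(out, end='')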

Since you want to print the output to stdout, I assume it's text output. You will need to pass encoding, errors or universal_newlines to Popen for subprocess to treat the pipes as text (see the docs).

import subprocess

p = subprocess.Popen(
    '/usr/bin/whoami',
    stdout=subprocess.PIPE,  # capture stdout through a new pipe
    universal_newlines=True  # the pipe is opened in text mode
)

# Pipe the data somewhere else too, e.g. a log file
with open('subprocess.log', 'w') as logfile:
    # Iterating over the pipe yields lines until EOF; looping on p.poll()
    # instead can drop output still buffered when the process exits.
    for line in p.stdout:
        # one copy to our stdout (the line already includes the '\n')
        print(line, end='')
        # one copy to the logfile
        logfile.write(line)
p.wait()  # reap the child; returns its exit code

The same technique can be used to handle stderr, for example by passing file=sys.stderr to print. Note that you can also pass your own stdin straight through to the child:

subprocess.Popen('/usr/bin/whoami', stdin=sys.stdin, stdout=subprocess.PIPE, ...)
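As for the stderr variant mentioned above, a minimal sketch (whoami will rarely write to stderr, so treat this purely as an illustration of the pattern):

import subprocess
import sys

p = subprocess.Popen(
    '/usr/bin/whoami',
    stderr=subprocess.PIPE,  # capture stderr this time
    universal_newlines=True,
)
with open('subprocess.err.log', 'w') as logfile:
    for line in p.stderr:
        print(line, end='', file=sys.stderr)  # echo to our own stderr
        logfile.write(line)
p.wait()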

After all, the standard streams just wrap file descriptors. If line-based reading is unsuitable for the kind of output you expect (progress bars, for instance, rarely end in a newline), you can read small fixed-size chunks instead.
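A minimal sketch of chunk-based reading; the pipe is kept in binary mode here, because mixing os.read() with a text-mode wrapper could lose data buffered inside the wrapper:

import os
import subprocess

p = subprocess.Popen('/usr/bin/whoami', stdout=subprocess.PIPE)

with open('subprocess.log', 'w') as logfile:
    while True:
        chunk = os.read(p.stdout.fileno(), 64)  # at most 64 bytes per call
        if not chunk:
            break  # an empty read means the child closed the pipe
        # note: decoding per chunk assumes no multi-byte character is
        # split across a chunk boundary
        text = chunk.decode('utf-8')
        print(text, end='')
        logfile.write(text)
p.wait()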

Working simultaneously with stderr and stdout

If you need both stdout and stderr, you run into the problem that a blocking read can only wait on one pipe at a time.
One possibility is to use os.set_blocking to make the pipes non-blocking, so that any read method returns immediately when there is no data; this lets you alternate between the two streams, as in the sketch below.
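A minimal sketch of that approach, assuming the same placeholder /bin/mixed_output program used further down (any program that writes to both streams will do):

import os
import subprocess
import sys
import time

p = subprocess.Popen(
    ['/bin/mixed_output'],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

# Make both pipes non-blocking: os.read() then raises BlockingIOError
# instead of waiting whenever no data is available.
for pipe in (p.stdout, p.stderr):
    os.set_blocking(pipe.fileno(), False)

while True:
    got_data = False
    for pipe, target in ((p.stdout, sys.stdout), (p.stderr, sys.stderr)):
        try:
            chunk = os.read(pipe.fileno(), 1024)
        except BlockingIOError:
            continue  # nothing available on this pipe right now
        if chunk:
            got_data = True
            target.write(chunk.decode('utf-8'))
            target.flush()
    # Stop once the process has exited and both pipes are drained.
    if p.poll() is not None and not got_data:
        break
    if not got_data:
        time.sleep(0.05)  # avoid a busy loop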
Another possibility is to run two separate threads, one draining stdout and one draining stderr; a minimal sketch follows.
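This version again assumes the placeholder /bin/mixed_output program; each pump thread blocks on its own pipe, so neither stream can starve the other:

import subprocess
import sys
import threading

def pump(pipe, target):
    # Copy lines from a child pipe to a target stream until EOF.
    for line in pipe:
        target.write(line)
        target.flush()

p = subprocess.Popen(
    ['/bin/mixed_output'],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)

threads = [
    threading.Thread(target=pump, args=(p.stdout, sys.stdout)),
    threading.Thread(target=pump, args=(p.stderr, sys.stderr)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
p.wait()

There is, however, a simpler way to achieve this by means of the asyncio module: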

import asyncio
import sys

PROCESS_PATH = '/bin/mixed_output'

class MultiplexProtocol(asyncio.SubprocessProtocol):
    def __init__(self, exit_future):
        self.exit_future = exit_future

    def pipe_data_received(self, fd, data):
        # fd is the child's descriptor number: 1 for stdout, 2 for stderr
        if fd == 1:
            print(data.decode('utf-8'), file=sys.stdout, end='')
        elif fd == 2:
            print(data.decode('utf-8'), file=sys.stderr, end='')

    def process_exited(self):
        self.exit_future.set_result(True)


async def launch_subprocess():
    loop = asyncio.get_running_loop()
    # Future marking the end of the process
    exit_future = loop.create_future()
    # Use asyncio's subprocess support
    transport, protocol = await loop.subprocess_exec(
        lambda: MultiplexProtocol(exit_future),
        PROCESS_PATH,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        stdin=None,
    )
    await exit_future
    # Close the pipes
    transport.close()


asyncio.run(launch_subprocess())

This consumes far less CPU than constantly looping in the host process to shuttle data between streams, because MultiplexProtocol.pipe_data_received is only invoked when data actually arrives on one of the pipes.
