Reading from open file handle that is being written to in Python

I know this is a classic problem in stream processing, but I don't know how to handle it in Python. I have a file handle that is being written to by an active process. I want to consume content from that file handle on a line-by-line basis, but I don't want to deadlock waiting to read. I will keep reading until EOF or 60 seconds of looped reading, whichever comes first. Advice on how to do this would be appreciated. My pseudo code description of this problem is below.

proc = genprocess("command")
found_a = False
found_b = False
start = time.time()
while True:
    line = proc.readline()
    while line:
        if not found_a and grep(pattern_a, line):
            found_a = True
            print "Found A, now looking for B"
        elif not found_b and grep(pattern_b, line):
            found_b = True
            print "Found B, all done"
            break
    if time.time() - start > 60:
        break
    else:
        time.sleep(5)

proc.kill()

The problem is that this only reads one line on each interval. Instead I want the inside of the loop to iterate as many times as possible, but not to block waiting for new content to be written to the file. Once it has read as much as is available, it should sleep for 5 seconds to allow more content to accumulate.


If you're running in a Unix environment, you could use Python's select module to wait for data on the file handle. You can also use Python's fcntl module (Unix only as well) to switch a file handle to non-blocking mode, as described in this question.

For example, assuming your proc variable is a regular file handle that supports fileno():

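import fcntl
import os
# Add O_NONBLOCK to the descriptor's existing flags so reads return immediately.
file_num = proc.fileno()
old_flags = fcntl.fcntl(file_num, fcntl.F_GETFL)
fcntl.fcntl(file_num, fcntl.F_SETFL, old_flags | os.O_NONBLOCK)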
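
To get the behaviour the question asks for (read everything that is available, then sleep), the non-blocking descriptor can be drained in a loop. This is only a sketch, assuming Python 3 on Unix; `set_nonblocking` and `drain_lines` are hypothetical helper names, not part of any library. It reads with `os.read` on the raw descriptor and keeps any trailing partial line in a buffer until the rest of that line arrives.

```python
import fcntl
import os


def set_nonblocking(fd):
    """Switch a file descriptor to non-blocking mode (Unix only)."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)


def drain_lines(fd, pending=b""):
    """Read whatever is available right now without blocking.

    Returns (complete_lines, leftover_bytes, eof_reached).
    """
    eof = False
    while True:
        try:
            chunk = os.read(fd, 4096)
        except BlockingIOError:
            break  # nothing more available at the moment
        if not chunk:
            eof = True  # the writer closed its end
            break
        pending += chunk
    # Everything before the last newline is complete; the tail is not.
    *lines, pending = pending.split(b"\n")
    return lines, pending, eof
```

In the question's loop, something like `lines, pending, eof = drain_lines(fd, pending)` followed by `time.sleep(5)` would handle one batch per interval. Here `fd` would be the pipe's descriptor, for example `proc.stdout.fileno()` if `proc` was created with `subprocess.Popen` (an assumption, since the question does not say how `proc` is created).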


The fcntl example linked to above is OK, except that it leaves the process polling in a busy loop. I ended up using select instead, which achieves more or less the desired functionality.

    import re, sys, time
    from select import select

    # Assumed to be defined earlier: proc (a subprocess.Popen with
    # stdout=PIPE), start, wait_for and wait_interval (seconds).
    started = False
    while True:
        if (time.time() - start > wait_for) or started:
            break
        (rlist, wlist, xlist) = select([proc.stdout], [], [], wait_interval)
        if len(rlist) > 0:
            line = rlist[0].readline() # read one line (this blocks until '\n' is read)
        else: # nothing available to read from proc.stdout
            print ".",
            sys.stdout.flush()
            time.sleep(1)
            continue
        if re.search("daemon started", line):
            started = True

    if not started:
        proc.kill() # don't leave the process running if it didn't start properly

And if this is the sort of thing a user might CTRL-C, then putting the whole thing in a try/except block and looking for KeyboardInterrupt allows proc.kill() to be called instead of leaving the process running in the background.
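
A variation on the same idea, written as a sketch for Python 3 on Unix. The function name `wait_for_line` and its parameters are hypothetical, and `proc` is assumed to be a `subprocess.Popen` created with `stdout=subprocess.PIPE`. It reads with `os.read` on the raw descriptor instead of `readline()`, because a buffered `readline()` can pull several lines into Python's buffer, after which `select` reports nothing to read even though lines are still waiting. The `finally` clause also runs on KeyboardInterrupt, so the child is killed on CTRL-C too.

```python
import os
import re
import select
import time


def wait_for_line(proc, pattern, timeout=60.0, poll_interval=5.0):
    """Return True once a complete stdout line of proc matches pattern.

    pattern is a bytes regex. Returns False on EOF or timeout, and in
    that case (or on Ctrl-C) the child process is killed.
    """
    fd = proc.stdout.fileno()
    deadline = time.monotonic() + timeout
    pending = b""
    found = False
    try:
        while not found:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # overall time limit reached
            ready, _, _ = select.select(
                [fd], [], [], min(poll_interval, remaining))
            if not ready:
                continue  # nothing was written during this interval
            chunk = os.read(fd, 4096)  # select reported data or EOF
            if not chunk:
                break  # EOF: the child closed its stdout
            pending += chunk
            # Only complete lines are checked; the tail waits for more data.
            *lines, pending = pending.split(b"\n")
            found = any(re.search(pattern, line) for line in lines)
    finally:
        if not found:
            proc.kill()  # also reached when KeyboardInterrupt propagates
            proc.wait()
    return found
```

A call might look like `wait_for_line(proc, rb"daemon started", timeout=60, poll_interval=5)`. Note that a final line with no trailing newline is never matched in this sketch.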
