
Reading a file in Python while logging data in screen

Background

To capture data from a logic controller, I'm using screen as a terminal emulator, connecting my MacBook via the KeySpan USA-19HS USB Serial Adapter. I've written the following bash script so that I can type talk2controller <filename>, where filename is the name of the data file.

#!/bin/bash
if [ -z "$1" ]; then
    echo "Please provide a filename for the logfile"
    exit 1
fi
LOGFILE=$1
echo "logfile $1" > screenrc        # Set the logfile filename
echo "logfile flush 1" >> screenrc  # Wait 1 sec before flushing buffer to filesystem
screen -L -c screenrc /dev/tty.KeySerial1 19200

I've changed the filename for the logfile and changed from the default of 10 seconds to 1 second for waiting before flushing the logfile buffer to the filesystem. I save those commands to screenrc. Then I call screen with:

  1. -L — logging enabled
  2. -c screenrc — override the default configuration file
  3. /dev/tty.KeySerial1 19200 — talk to the serial port using a baud rate of 19200

Each test that I log takes about 3–6 minutes and contains speed, acceleration, and position information. I'll know that the test was valid based on the acceleration rate. Currently, I'm waiting until after the test to then run a Python matplotlib script to plot the speed, acceleration, and position to see if the test was valid before moving on to the next test.

To save time, I would prefer to plot the data about halfway through the test, while data is still being captured.

Questions

In my mind there are two options for plotting the data while more data is still being captured:

  • Option 1: Use screen to log the data and have the Python matplotlib script read the partial logfile.
    • Question 1: What concerns are there if the Python script reads the logfile, while screen is still writing data to it?
  • Option 2: Switch from using screen to using pySerial. However, plotting the data during the test is a lower priority than simply capturing the data during the test. I can't afford for an exception in the plotting portion of the code to cause the data logging to fail. That's what's great about screen—it just dumps the data and doesn't try to do anything else.
    • Question 2: If I were to switch to pySerial, could I run two threads to reduce the chance that the plotting portion of the code impacts the data capture code? Does this buy me anything?

Question 3: Is there a better option that I haven't thought of?


Both options 1 and 2 will work, but oh boy, in the name of all things good, avoid using threads for this! You'll end up with the worst of both worlds: locking problems, and an exception in the graphing thread will kill the whole program (including the logging thread) anyway. As someone else mentioned, using two separate processes for this is fine. screen is a bit of an odd choice of tool for this purpose, as is writing code by hand in Python. I'd just rewrite the talk2controller script as this trivial one:

stty -f /dev/tty.KeySerial1 19200 raw   # -f on macOS; the flag is -F on Linux
cat </dev/tty.KeySerial1 >logfile

(You could also use >>logfile if you want each run of the script to append to the file, rather than rewriting it from scratch.)

The other question is about whether it's okay to have a program reading from the file as long as someone else is writing to it. A more specific version of this question is: what if a line of the log is half-written at the time you try to read it?

The answer is: you're allowed to do this, but you're right, you can't guarantee that a line won't be half-written at the time you read it. (If you write your own replacement for cat or screen, you could actually make this guarantee by always writing each complete line to the file with a single os.write() call instead of sys.stdout.write() or print.)

However, that guarantee isn't needed anyway. As long as you're careful when reading the file, you'll never have a problem. An incomplete line is simply one that doesn't end with a \n newline character. Thus:

with open('logfile') as f:
    for line in f:
        if not line.endswith('\n'):
            break  # partial line: stop here and try again later
        ...handle valid line...

Since the \n character is the last thing written by each line of the log, you know for sure that if you read a \n character, everything before it was written correctly.
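Putting that rule to work, here is a minimal sketch (the filename and poll interval are placeholders, not from the original answer) of a reader that yields only fully written lines from a logfile that is still growing:

```python
import time

def complete_lines(path, poll=0.5):
    """Yield only complete lines from a logfile that is still being
    appended to. When a partial line (no trailing newline) is found,
    rewind past it and retry on the next poll instead of handing a
    half-written line to the plotting code."""
    with open(path) as f:
        while True:
            pos = f.tell()
            line = f.readline()
            if line.endswith('\n'):
                yield line
            else:
                f.seek(pos)       # un-read the partial line
                time.sleep(poll)  # wait for the writer to finish it
```

The matplotlib script can consume this generator mid-test; it simply stalls at the end of the file until screen flushes more data.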


I think Option 1 is totally feasible because you can easily have Python "tail" the logfile in read-only mode, so that no harm is done to it while screen is still writing to it. While tailing the file, you can perform a specified action any time a new log event is detected in the log file.

If you are curious and would like to see some working code, a personal project of mine utilizes this functionality. The project is called thrasher-logdrop and the guts are logdrop.py. The basic flow is:

  • Tail a file with do_tail()
  • Watch for log events with tail_lines()
  • Perform an action on events with handle_line()


I'd say option 2 is the way to go. You have complete control over what you do with each byte of input, as you receive it. You can have a very simple Python script which simply writes the data to disk as it reads it. Your plotting code can run in an entirely separate process created by fork()ing the first. To get the data from one to the other, you can either (a) have the first process also write to a socketpair() or other IPC mechanism; or (b) configure the output file object to be line-buffered -- causing it to flush after every full line is written -- and monitor it for new content in the second process.
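A hedged sketch of that split, assuming pySerial is installed and the adapter still appears as /dev/tty.KeySerial1; plot_loop is a hypothetical placeholder for your matplotlib reader, not a real function:

```python
import os

def pump(readline, log):
    """Copy lines from a readline() callable (e.g. a pySerial port's
    readline method) into a line-buffered log file until readline()
    returns b''. Nothing else happens in this loop, so nothing here
    can raise and kill the capture."""
    while True:
        raw = readline()
        if not raw:
            break
        log.write(raw.decode(errors='replace'))

def capture(port='/dev/tty.KeySerial1', baud=19200, path='logfile'):
    import serial  # pySerial; imported here so pump() works without it
    ser = serial.Serial(port, baud)
    with open(path, 'w', buffering=1) as log:  # buffering=1: flush per line
        # Fork first: the child plots, the parent only logs, so a
        # plotting exception cannot touch the capture loop.
        if os.fork() == 0:
            plot_loop(path)  # hypothetical: your matplotlib reader
            os._exit(0)
        pump(ser.readline, log)
```

Keeping the capture loop this dumb preserves the property you liked about screen: it just dumps the data and doesn't try to do anything else.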

The problem with option 1 is that you have no control over screen's buffering behavior. You can monitor its logfile for new content, but your plotting code needs to be prepared to handle both incomplete lines and large chunks of data all at once. Depending on the exact buffering behavior, you might not even see any data at all until the screen process exits!
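One way to cope with that (a sketch; the function name is mine, not from the answer) is to buffer raw chunks and only release complete lines, whatever size the chunks arrive in:

```python
def split_lines(pending, chunk):
    """Append a freshly read chunk to the leftover from last time and
    return (complete_lines, new_leftover). Works whether the chunk is
    half a line or several lines delivered at once."""
    pending += chunk
    parts = pending.split('\n')
    # Everything before the final element is a complete line; the
    # final element is the (possibly empty) unfinished tail.
    return [p + '\n' for p in parts[:-1]], parts[-1]
```

The reader keeps the leftover between polls and feeds each released line to the plotter.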
