
Python inside GNU Screen eventually becomes idle if Screen is detached

I have a Python script that uses multiprocessing and subprocess to launch multiple external commands in parallel, each with different arguments. The code can be found here.
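For readers without access to the linked code, the general pattern described above might look roughly like the sketch below. This is not the original script: the function names run_command and run_all and the job list are hypothetical, chosen only to illustrate a multiprocessing pool whose workers each block on one external command.

```python
import multiprocessing
import subprocess
import sys


def run_command(args):
    """Run one external command and return its exit code."""
    # subprocess.call blocks until the command exits, so each pool
    # worker handles exactly one external command at a time.
    return subprocess.call(args)


def run_all(commands, workers):
    """Run every command in `commands`, at most `workers` at a time."""
    with multiprocessing.Pool(processes=workers) as pool:
        # The pool hands the next queued command to a worker as soon as
        # that worker's previous command has finished.
        return pool.map(run_command, commands)


if __name__ == "__main__":
    # Hypothetical job list: the same command with a different argument each time.
    jobs = [
        [sys.executable, "-c", "import sys; print('job', sys.argv[1])", str(i)]
        for i in range(4)
    ]
    print(run_all(jobs, workers=2))
```

In a layout like this, the children inherit the parent's terminal for stdin, stdout and stderr unless told otherwise, which is relevant to the question of what changes when Screen is detached.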

For convenience I launch this script inside a GNU Screen session. The machine where this script is running has 12 processors, which are idle until processes become active.

Each of the processes takes between a few hours and a couple of days to run, so I often disconnect from the machine and detach the screen session.

However, recently I've noticed a behavior which I never experienced before. On several occasions I've returned to the machine to find it idle with a load of zero. If I get a list of active processes either via ps ux or top I can still find the script (and the subprocesses) on the list of processes. I then reattach the screen session to check the state of the program and immediately a new batch of processes is sent to the queue and the load of the system goes back to 12 in a matter of seconds. Note that I did absolutely nothing to the script other than reattaching the screen session.

I've installed a monitoring tool on the system, and what happens is that some processes finish after a certain time and no new processes are launched. So the system stays active while the running subprocesses are busy and becomes idle once they finish, because no more jobs are released from the queue.

So my question is, does anyone know of any reason that explains this behavior?

EDIT: After a year or so, this problem is no longer reproducible, possibly because of a patch to either Screen or Python itself. I'm accepting the answer as it provided good directions for testing.


I can't explain the reason for what you are seeing. However, I do have an idea of what you can try next.

  1. Try piping the output of the script through tee: | tee out.txt. If that has no effect, try the next step.
  2. Run screen on another [hop] host. From there, SSH into your worker host and run your script in the plain shell, without screen. Then feel free to disconnect from and reconnect to your hop host to check on the process. This should hide from the worker that screen is in any way involved.
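The first test can also be approximated from inside the script by sending each child's output to a file instead of the inherited terminal. The sketch below assumes the stall has something to do with output going to a detached terminal, which is unconfirmed; run_logged and the out.txt path are hypothetical names used only for illustration.

```python
import subprocess
import sys


def run_logged(args, log_path):
    """Run `args` with its output appended to `log_path` instead of the terminal."""
    with open(log_path, "ab") as log:
        return subprocess.call(
            args,
            stdin=subprocess.DEVNULL,   # the child never reads from the terminal
            stdout=log,                 # stdout goes to the log file
            stderr=subprocess.STDOUT,   # stderr is merged into the same file
        )


if __name__ == "__main__":
    # Hypothetical command; substitute the real external command and arguments.
    code = run_logged([sys.executable, "-c", "print('hello')"], "out.txt")
    print("exit code:", code)
```

If the jobs keep flowing with the terminal taken out of the picture this way, that points at terminal I/O; if they still stall, the second test is the more informative one.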

Please comment back with the results of these tests. That will give me more to go on.
