开发者

php cli script hangs with no messages

I've written a PHP script that runs via SSH and nohup, meant to process records from a database and do stuff with them (eg. process some images, update some rows).

It works fine with small loads, up to maybe 10k records. I have some larger datasets that process around 40k开发者_开发技巧 records (not a lot, I realize, but it adds up to a lot of work when each record requires the download and processing of up to 50 images).

The larger datasets can take days to process. Sometimes I'll see in my debug logs memory errors, which are clear enough-- but sometimes the script just appears to "die" or go zombie on me. My tail of the debug log just stops, with no error messages, the tail of the nohup log ends with no error, and the process is still showing in a ps list, looking like this--

26075 pts/0 S 745:01 /usr/bin/php ./import.php

but no work is getting done.

Can anyone give me some ideas on why a process would just quit? The obvious things (like a php script timeout and memory issues) are not a factor, as far as I can tell.

Thanks for any tips

PS-- this is hosted on a godaddy VDS (not my choice). I am sort of suspecting that godaddy has some kind of limits that might kick in on me despite what overrides I put in the code (such as set_time_limit(0);).


Very likely the OOM killer. If you really , really really want to stay out of its reach, as root, have your process write -17 to /proc/self/oom_adj. Caution: The kernel usually knows better. Evading the OOM killer can actually cripple the same RDBMS that you are trying to query. What a vicious cycle that would be :)

You probably (instead) want to stagger queries based on what you read from /proc/loadavg and /proc/meminfo. If you increase loads or swap exponentially, you need to back off, especially as a background process :)

Additionally, monitor IOWAIT while you run. This can be averaged from /proc/stat when compared with the time the system booted. Note it when you start and as you progress.

Unfortunately, the serial killer known as the OOM killer does not maintain a body count that is accessible beyond parsing kernel messages.

Or, your cron job keeps hitting its ulimited amount of allocated heap. Either way, your job needs to back off when appropriate, or prevent its own demise (as noted above) prior to doing any work.

As a side note, you probably should not be doing what you are doing on shared hosting. If its that big, its time to get a VPS (at least) where you have some control over what process gets to do what.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜