开发者

NGINX + PHP5-FPM segfaults under high load

I have been dealing with this problem all day and it is driving me insane. All Google results and searches here lead to dead ends. I hope someone can work with me to provide a solution for myself and future victims. Here we go.

I am running a very popular website with over 3M page views a day. On average that is 34 page views per second, but more realistically, during peak hours, it gets to over 300 page views per second. Think of these as requests.

I am running a Ubuntu 10.04 64-bit server with 2 E5620 CPUs, 12GB RAM, and a Micron P300 6Gb/s SSD. During the peak hours the CPU and memory load is average (20-30% CPU and half of memory is used).

The software that powers this site is: NGINX, MySQL, PHP5-FPM, PHP-APC, and Memcached. Ok, now finally the meat of the post, here are my error logs. There a bunch of these errors logged.

/var/log/php5-fpm

Jul 20 14:49:47.289895 [NOTICE] fpm is running, pid 29373

Jul 20 14:49:47.337092 [NOTICE] ready to handle connections

Jul 20 14:51:23.957504 [ERROR] [pool www] unable to retrieve process activity of one or more child(ren). Will try again later.

Jul 20 14:51:41.846439 [WARNING] [pool www] child 29534 exited with code 1 after 114.518174 seconds from start

Jul 20 14:51:41.846797 [NOTICE] [pool www] child 29597 started

Jul 20 14:51:41.896653 [WARNING] [pool www] child 29408 exited on signal 11 SIGSEGV after 114.596706 seconds from start

Jul 20 14:51:41.897178 [NOTICE] [pool www] child 29598 started

Jul 20 14:51:41.903286 [WARNING] [pool www] child 29398 exited with code 1 after 114.605761 seconds from start

Jul 20 14:51:41.903719 [NOTICE] [pool www] child 29600 started

Jul 20 14:51:41.907816 [WARNING] [pool www] child 2开发者_StackOverflow9437 exited with code 1 after 114.601417 seconds from start

Jul 20 14:51:41.908253 [NOTICE] [pool www] child 29601 started

Jul 20 14:51:41.916002 [WARNING] [pool www] child 29513 exited with code 1 after 114.592514 seconds from start

Jul 20 14:51:41.916501 [NOTICE] [pool www] child 29602 started

Jul 20 14:51:41.916558 [WARNING] [pool www] child 29494 exited on signal 11 SIGSEGV after 114.597355 seconds from start

Jul 20 14:51:41.916873 [NOTICE] [pool www] child 29603 started

Jul 20 14:51:41.921389 [WARNING] [pool www] child 29502 exited with code 1 after 114.600405 seconds from start

/var/log/nginx/error.log 2011/07/20 15:48:42 [error] 29583#0: *569743 readv() failed (104: Connection reset by peer) while reading upstream, client: 77.223.197.193, server: domain.com, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"

2011/07/20 15:48:42 [error] 29578#0: *571695 readv() failed (104: Connection reset by peer) while reading upstream, client: 150.70.64.196, server: domain.com, request: "GET /page HTTP/1.0", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"

2011/07/20 15:48:42 [error] 29581#0: *571050 readv() failed (104: Connection reset by peer) while reading upstream, client: 110.136.157.66, server: domain.com, request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"

2011/07/20 15:48:42 [error] 29581#0: *564892 readv() failed (104: Connection reset by peer) while reading upstream, client: 110.136.161.214, server: domain.com, request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"

2011/07/20 15:48:42 [error] 29585#0: *456171 readv() failed (104: Connection reset by peer) while reading upstream, client: 93.223.33.135, server: domain.com, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"

2011/07/20 15:48:42 [error] 29585#0: *471192 readv() failed (104: Connection reset by peer) while reading upstream, client: 74.90.33.142, server: domain.com, request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"

2011/07/20 15:48:42 [error] 29580#0: *570132 readv() failed (104: Connection reset by peer) while reading upstream, client: 180.246.182.191, server: domain.com, request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"

Finally, I want to point out that I did try to disable PHP-APC to see if it was a bug with the opt cacher, but the segfaults still persisted. I also have PHP5-SUHOSIN installed and I disabled it too, but the errors still keep happening.


This issue just happend to me.

PHP5-FPM was having segfaults on most of its children. In my case, we had 0bytes available on the harddisk. A quick log shredding stopped the segfaults.


2011/07/20 15:48:42 [error] 29583#0: *569743 readv() failed (104: Connection reset by peer) while reading upstream, client: 77.223.197.193, server: domain.com, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"

thats just some problem with your config for your upstream server / router / client reset? of nginx dropped the request but running a site at 3 times the load you described i never saw that message, the requested resource isnt even handed to a php-fpm process, its a favicon

and for the php-fpm messages the children seem to stop after the 114 sec limit, is that a limit set by your php.ini file? seg faults in php often occur when using high memory, your php scripts could leak memory and will eventually reach the memory limit, having the php-fpm processes serve less requests helps in dealing with memory leaks


See my answer here that's related to your question (about nginx + magento and high load)

NGINX-FPM configuration settings for magento

Its not a direct answer per say, but it may help you configure your nginx + php-fpm to help eliminate the faults.


You are probably using suhosin Disable ths suhosin.ini under /etc/php5/fpm/conf.d and restart the php5-fpm service

Check the suhosin version and try to install another one.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜