Posting a large body crashes our stack -- but only in production mode?

2022-12-08 00:25 问答作者：

Our Rails application has two environments that we deploy to servers a Staging environment and the default Production environment.

The staging.rb file is a copy of the production.rb form the config/environments folder. The difference between the two is whiny nils is set to true:

config.whiny_nils = true

Since the rails application mostly used for it's API, we ran it on one of our internal staging servers for the developers to work with. This worked, without a hitch for almost 4 months. When the time came to transfer to our production server, the stack began to crash consistently whenever a POST or PUT came in with a large (sometimes VERY VERY large) body. When testing between the two servers the same requests were handled perfectly by the staging server.

The most frustrating part of the crashes/hangs were the lack of logs or tracing down where in the stack (nginx, phusion passenger, ruby 1.9 patch level 243, rails 2.3.4) where the crash was happening. Nothing appeared in the nginx error log, rails log, or anywhere that we could find. Since we were running the production server with updated versions of nginx, passenger, and ruby(higher patch level then staging, but still 1.9) we began reverting each component one at a time, even going so far as transferring all of the executables and support files (basically everything we had installed in /usr/local) to the production machine to no avail. Just as we were about to wipe the machine and try every step again someone suggested switching the production machine to the "staging" environment . . . and like magic, issue resolved!

Wanting to know what could have possibly caused the error, we began combing through the rails core, our own code, and all of our plugins looking for some clue as to what could possibly causing such a massive hang/crash in the production environment, once again to no avail.

The only clue I could find was the behavior. When testing "on" the app(one of the pages the rails app actually serves up) I would crash the app by sending the request, and then after frequent refreshes (usually 3-4) I would be able to produce an error from the Nginx log and eventually the app would begin to handle requests again. The error is as follows:

    Error during failsafe response: incompatible character encodings: UTF-8 and ASCII-8BIT
    2009/10/09 17:52:40 [error] 8691#0: *88 upstream prematurely closed connection while reading response header from upstream, client: *my ip address*, server: myapp.mydomain.com, request: "GET /api/sections/4/edit        HTTP/1.1", upstream: "passenger://unix:/tmp/passenger.8677/master/helper_server.sock:", host: "myapp.mydomain.com"
    *** Exception NoMethodError in application (undefined method `each' for nil:NilClass) (process 8703): from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/lib/phusion_passenger/rack/request_handler.rb:95:in `process_request'
    from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/lib/phusion_passenger/abstract_request_handler.rb:206:in `main_loop'
    from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/lib/phusion_passenger/railz/application_spawner.rb:376:in `start_request_handler'
    from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/lib/phusion_passenger/railz/application_spawner.rb:334:in `block in handle_spawn_application'
    from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/lib/phusion_passenger/utils.rb:182:in `safe_fork'
    from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/lib/phusion_passenger/railz/application_spawner.rb:332:in `handle_spawn_application'
    from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/lib/phusion_passenger/abstract_server.rb:351:in `main_loop'
    from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/lib/phusion_passenger/abstract_server.rb:195:in `start_synchronously'
    from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/lib/phusion_passenger/abstract_server.rb:162:in `start'
    from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/lib/phusion_passenger/railz/application_spawner.rb:213:in `start'
    from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/lib/phusion_passenger/spawn_manager.rb:261:in `block (2 levels) in spawn_rails_application'
    from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/lib/phusion_passenger/abstract_server_collection.rb:126:in `lookup_or_add'
    from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/lib/phusion_passenger/spawn_manager.rb:255:in `b开发者_JS百科lock in spawn_rails_application'
    from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/lib/phusion_passenger/abstract_server_collection.rb:80:in `block in synchronize'
    from :8:in `synchronize'
    from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/lib/phusion_passenger/abstract_server_collection.rb:79:in `synchronize'
    from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/lib/phusion_passenger/spawn_manager.rb:254:in `spawn_rails_application'
    from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/lib/phusion_passenger/spawn_manager.rb:153:in `spawn_application'
    from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/lib/phusion_passenger/spawn_manager.rb:286:in `handle_spawn_application'
    from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/lib/phusion_passenger/abstract_server.rb:351:in `main_loop'
    from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/lib/phusion_passenger/abstract_server.rb:195:in `start_synchronously'
    from /usr/local/lib/ruby/gems/1.9.1/gems/passenger-2.2.4/bin/passenger-spawn-server:61:in `'

Normally when a character encoding error arises, my first turn is to ruby 1.9 . . . however as you can tell from my testing, it was the same version on both machines!

After all of that, I guess I am wondering . . . would anyone have any ideas what is going on? Obviously we can run our app in staging for the time being, but I worry I may have found something deeper that will have to be addressed. Does have any ideas on the next place I should look for where this is happening?

Our setup: Mac OS X Server: 10.6.1,

Rails 2.3.4,

Ruby 1.9p243,

Nginx 0.8.17,

Passenger 2.2.5

Our Required Gems:

environment.rb

daemons

rmagick

test.rb

rspec

rspec-rails

factory-girl

rack-test

Our installed plugins:

acts-as-dag (active record plugin for creating directed-acyclic-graphs)

daemon_generator

globalize2

no-peeping-toms (for testing)

thinking-sphinx

UPDATE (In response to khelll):

I have attempted to add the config.whiny_nils = true to the production environment, however the crash still occurs.

Also, I went back to our staging server and set the environment to "Production" . . .same crash!

Some clarification on what I meant by a "large" request body. One of the POST/PUT's that would consistently crash the application was approximately 20,000 characters (of json). Since the API is consistently used throughout the day with small PUTS/POSTS and remained up, but only crashed/hung when these larger requests were made, I assumed the two were connected.

As far as Rack/Ruby 1.9 is concerned. Due to the amount of information out there about Rack and 1.9, I did upgrade our Rack gem to the latest in the git repository(which supposedly fixed some of the 1.9 issues). I had read about a significant amount of difficulty regarding rewindable_input, ruby 1.9 and more… however since I wasn't getting the rewindable_input error that I had experienced with my other 1.9 app, I did assume that this was a different problem. Also, I ruled out Rack when changing the rails environment resolved the issue (when I searched through the Rack source code, there didn't seem to be any environment specific methods that would be causing the error).

Hope this helps!

UPDATE in response to pauliephonic

There are no messages hitting the rails logs at all (which actually prompted me to search for some time in our web stack for the issue). My clue that a crash/hang is occurring is that once the large request is made, the application returns nothing but 500 errors on every request, however those 500 errors do not show up in the Rails logs.

Our database configuration is identical (we were using mysql cluster, so it was literally identical, now using local mysql database for the time being, but have confirmed the error exits regardless of the database used)

As far as multiple bytes/unicode is concerned . . . we are working in an internationalized app . . . however I don't think the way rails handles unicode changes between production and others right? As I stated above, this happened on a POST or PUT. The way I would test during my debugging was to go to the same edit pages of one of my large, heavily nested models and just try to "save" it. This would crash the app in production, but not crash the app in staging. Every time I tested the same characters, same content, same button, same behavior . . . different response based on environment. I couldn't even pepper puts statements everywhere in my code because (it appeared that) requests weren't making it to the rails application. I was getting no error messages in my Rails logs or Nginx error logs (save the one I posted on multiple refreshes).

What I can understand is that config.whiny_nils is making the whole difference. Were you I would look at activesupport/lib/active_support/whiny_nils.rb file (which looks so simple) and try from there to play around it to understand what's making the difference. I believe it's something related to the type of exception you are getting in production environment, which might not be raised when using whiny_nils.

I believe that you need to give more details also regarding "Posting a large body crashes our stack" part, cause it might be a problem with Rack and Ruby 1.9

I would try to run the app in production on apache/passenger, see if the problem is specific to nginx/passenger

From the look of things, that error is Ruby being persnickety about encodings. You've got a string that it thinks is UTF-8, and you're treating it as raw bytes, probably correctly. You need to identify the problematic string and call buggy_string.force_encoding(Encoding::ASCII_8BIT) on it. But hell if I know which piece of code is operating on that string or why it happens only in production. I wouldn't be surprised to discover the problem is actually deep within the bowels of Passenger.

Regarding the difference between staging and production, it's almost certainly an issue with something that should be moved around as a byte string, and it's being treated as a character string. There's a ton of code that only happens in the production environment (like caching) and if any of that code does this, that's your problem.

The whiny_nils thing is likely irrelevant.

I had a similar issue for the last few months -- not frequently enough to really debug it until recently. In my case, telling Passenger where to put temp buffer files did the trick. What made it difficult to find was not only the lack of error messages in the log files but the fact that this buffer is used not only for multipart posts but any kind of large post body.

PassengerUploadBufferDir /tmp

继续阅读：passenger rack ruby ruby-on-rails

Posting a large body crashes our stack -- but only in production mode?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？