开发者

delayed_job stops running after some time in production

In production, our delayed_job process is dying for some reaso开发者_如何学Gon. I'm not sure if it's crashing or being killed by the operating system or what. I don't see any errors in the delayed_job.log file.

What can I do to troubleshoot this? I was thinking of installing monit to monitor it, but that will only tell me precisely when it dies. It won't really tell me why it died.

Is there a way to make it more chatty to the log file, so I can tell why it might be dying?

Any other suggestions?


I've come across two causes of delayed_job failing silently. The first is actual segfaults when people were using libxml in forked processes (this popped up on the mailing list some time back).

The second is an issue to do with the 1.1.0 version of daemons that delayed_job relies on has a problem (https://github.com/collectiveidea/delayed_job/issues#issue/81), this can be easily worked around by using 1.0.10 which is what my own Gemfile has in it.

Logging

There is logging in delayed_job so if the worker is dying without printing an error it's usually because it's not throwing an exception (e.g. Segfault) or something external is killing the process.

Monitoring

I use bluepill to monitor my delayed job instances, and so far this has been very successful at ensuring that the jobs remain running. The steps to get bluepill running for an application are quite easy

Add the bluepill gem to your Gemfile:

 # Monitoring
  gem 'i18n' # Not sure why but it complained I didn't have it
  gem 'bluepill'

I created a bluepill config file:

app_home = "/home/mi/production"
workers = 5
Bluepill.application("mi_delayed_job", :log_file => "#{app_home}/shared/log/bluepill.log") do |app|
  (0...workers).each do |i|
    app.process("delayed_job.#{i}") do |process|
      process.working_dir = "#{app_home}/current"

      process.start_grace_time    = 10.seconds
      process.stop_grace_time     = 10.seconds
      process.restart_grace_time  = 10.seconds

      process.start_command = "cd #{app_home}/current && RAILS_ENV=production ruby script/delayed_job start -i #{i}"
      process.stop_command  = "cd #{app_home}/current && RAILS_ENV=production ruby script/delayed_job stop -i #{i}"

      process.pid_file = "#{app_home}/shared/pids/delayed_job.#{i}.pid"
      process.uid = "mi"
      process.gid = "mi"
    end
  end
end

Then in my capistrano deploy file I just added:

# Bluepill related tasks
after "deploy:update", "bluepill:quit", "bluepill:start"
namespace :bluepill do
  desc "Stop processes that bluepill is monitoring and quit bluepill"
  task :quit, :roles => [:app] do
    run "cd #{current_path} && bundle exec bluepill --no-privileged stop"
    run "cd #{current_path} && bundle exec bluepill --no-privileged quit"
  end

  desc "Load bluepill configuration and start it"
  task :start, :roles => [:app] do
    run "cd #{current_path} && bundle exec bluepill --no-privileged load /home/mi/production/current/config/delayed_job.bluepill"
  end

  desc "Prints bluepills monitored processes statuses"
  task :status, :roles => [:app] do
    run "cd #{current_path} && bundle exec bluepill --no-privileged status"
  end
end

Hope this helps a little.


The most common case that I met for this problem is caused by the database issues(mysql connection errors or so). there's no logs by default.

so I suggest you use god to control your delayed_job ( you can see its log file! ) .

assuming you are using delayed_job with Rails4, you should:

1.install god gem : $gem install god

2.have this script file:

# filename: cache_cleaner.god
RAILS_ROOT = '/sg552/workspace/m-api-cache-cleaner'
God.watch do |w| 
  w.name = 'cache_cleaner'
  w.dir = RAILS_ROOT
  w.start = "cd #{RAILS_ROOT} && RAILS_ENV=production bundle exec bin/delayed_job -n 5 start"
  w.stop = "cd #{RAILS_ROOT} && RAILS_ENV=production bundle exec bin/delayed_job stop"
  w.restart = "cd #{RAILS_ROOT} && RAILS_ENV=production bundle exec bin/delayed_job -n 5 restart"
  w.log = "#{RAILS_ROOT}/log/cache_cleaner_stdout.log"
  w.pid_file = File.join(RAILS_ROOT, "log/delayed_job.total.pid")
  # you should NEVER use this config settings: 
  # w.keepalive   (always comment it out! ) 
end

3.to start/stop/restart delayed_jobs, change your command from:

$ bundle exec bin/delayed_job -n 3 start

to:

$ god -c cache_cleaner.god -D  
$ god start/stop/restart cache_cleaner

refer to my personal blog: http://siwei.me/blog/posts/using-delayed-job-with-god

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜