delayed_job stops running after some time in production

2023-01-27 17:40 Q&A author:

In production, our delayed_job process is dying for some reason. I'm not sure if it's crashing or being killed by the operating system or what. I don't see any errors in the delayed_job.log file.

What can I do to troubleshoot this? I was thinking of installing monit to monitor it, but that will only tell me precisely when it dies. It won't really tell me why it died.

Is there a way to make it more chatty to the log file, so I can tell why it might be dying?

Any other suggestions?

I've come across two causes of delayed_job failing silently. The first is actual segfaults when people were using libxml in forked processes (this popped up on the mailing list some time back).

The second is a problem in version 1.1.0 of the daemons gem, which delayed_job relies on (https://github.com/collectiveidea/delayed_job/issues#issue/81). This is easily worked around by using 1.0.10, which is what my own Gemfile has in it.
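For reference, pinning the gem in a Gemfile could look like the fragment below. The version number comes from the workaround described above; whether it still applies to current releases of delayed_job and daemons is something to check against the linked issue.

```ruby
# Gemfile fragment: pin daemons to the release that predates the 1.1.0 problem
gem 'daemons', '1.0.10'
```

After changing the Gemfile, run bundle install (or bundle update daemons) so that Gemfile.lock picks up the pinned version.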

Logging

There is logging in delayed_job, so if the worker is dying without printing an error it's usually because it's not throwing an exception (e.g. a segfault) or because something external is killing the process.
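One way to get a little more information out of a dying worker is to log the reason at process exit. The following is only a sketch in plain Ruby: describe_exit and install_exit_logger are hypothetical helper names, not part of delayed_job, and where to call them (for example from a Rails initializer) is an assumption you would need to verify for your setup. It cannot catch a SIGKILL or a hard segfault, since no Ruby code runs in those cases, but a missing exit line in the log is itself a hint that one of those happened.

```ruby
require "logger"

# Hypothetical helper: turn the exception that ended the process (if any)
# into a one-line description.
def describe_exit(error)
  case error
  when nil then "clean exit"
  when SystemExit then "exit called with status #{error.status}"
  when SignalException then "terminated by signal #{error.signm}"
  else "unhandled #{error.class}: #{error.message}"
  end
end

# Hypothetical helper: register an at_exit hook that writes the exit reason.
# Inside an at_exit block, $! holds the exception that caused the exit, or nil.
def install_exit_logger(logger)
  at_exit do
    logger.info("process #{Process.pid} exiting: #{describe_exit($!)}")
  end
end
```

Calling install_exit_logger(Logger.new("log/delayed_job_exit.log")) early in the process would then leave one line per exit. The log path here is only an example.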

Monitoring

I use bluepill to monitor my delayed_job instances, and so far this has been very successful at ensuring that the jobs remain running. The steps to get bluepill running for an application are quite easy.

Add the bluepill gem to your Gemfile:

# Monitoring
gem 'i18n' # Not sure why but it complained I didn't have it
gem 'bluepill'

I created a bluepill config file:

app_home = "/home/mi/production"
workers = 5
Bluepill.application("mi_delayed_job", :log_file => "#{app_home}/shared/log/bluepill.log") do |app|
  (0...workers).each do |i|
    app.process("delayed_job.#{i}") do |process|
      process.working_dir = "#{app_home}/current"

      process.start_grace_time    = 10.seconds
      process.stop_grace_time     = 10.seconds
      process.restart_grace_time  = 10.seconds

      process.start_command = "cd #{app_home}/current && RAILS_ENV=production ruby script/delayed_job start -i #{i}"
      process.stop_command  = "cd #{app_home}/current && RAILS_ENV=production ruby script/delayed_job stop -i #{i}"

      process.pid_file = "#{app_home}/shared/pids/delayed_job.#{i}.pid"
      process.uid = "mi"
      process.gid = "mi"
    end
  end
end
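The pid_file setting in the config above is how a monitor finds each worker. If you want to check by hand (or from a cron script) whether a worker is still alive, a rough equivalent can be sketched in plain Ruby. worker_alive? is a hypothetical helper, not bluepill's API, and this is only an approximation of the kind of check a process monitor performs.

```ruby
# Hypothetical helper: true if the PID stored in pid_file belongs to a
# running process. Signal 0 performs the existence check without actually
# delivering a signal.
def worker_alive?(pid_file)
  pid = Integer(File.read(pid_file).strip)
  Process.kill(0, pid)
  true
rescue Errno::ENOENT, Errno::ESRCH, ArgumentError
  # Missing pid file, no such process, or unparsable file contents.
  false
rescue Errno::EPERM
  # The process exists but belongs to another user.
  true
end
```

For the config above, the check for the first worker would be worker_alive?("/home/mi/production/shared/pids/delayed_job.0.pid"). A stale pid file with no live process behind it tells you the worker died without cleaning up, which points towards a crash or an external kill rather than a normal stop.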

Then in my capistrano deploy file I just added:

# Bluepill related tasks
after "deploy:update", "bluepill:quit", "bluepill:start"
namespace :bluepill do
  desc "Stop processes that bluepill is monitoring and quit bluepill"
  task :quit, :roles => [:app] do
    run "cd #{current_path} && bundle exec bluepill --no-privileged stop"
    run "cd #{current_path} && bundle exec bluepill --no-privileged quit"
  end

  desc "Load bluepill configuration and start it"
  task :start, :roles => [:app] do
    run "cd #{current_path} && bundle exec bluepill --no-privileged load /home/mi/production/current/config/delayed_job.bluepill"
  end

  desc "Print the status of the processes bluepill is monitoring"
  task :status, :roles => [:app] do
    run "cd #{current_path} && bundle exec bluepill --no-privileged status"
  end
end

Hope this helps a little.

The most common cause of this problem that I have run into is database issues (MySQL connection errors and the like). Nothing is logged for these by default.

So I suggest using god to control delayed_job (you can then see its log file).

Assuming you are using delayed_job with Rails 4, you should:

1. Install the god gem: $ gem install god

2. Create this script file:

# filename: cache_cleaner.god
RAILS_ROOT = '/sg552/workspace/m-api-cache-cleaner'
God.watch do |w| 
  w.name = 'cache_cleaner'
  w.dir = RAILS_ROOT
  w.start = "cd #{RAILS_ROOT} && RAILS_ENV=production bundle exec bin/delayed_job -n 5 start"
  w.stop = "cd #{RAILS_ROOT} && RAILS_ENV=production bundle exec bin/delayed_job stop"
  w.restart = "cd #{RAILS_ROOT} && RAILS_ENV=production bundle exec bin/delayed_job -n 5 restart"
  w.log = "#{RAILS_ROOT}/log/cache_cleaner_stdout.log"
  w.pid_file = File.join(RAILS_ROOT, "log/delayed_job.total.pid")
  # You should NEVER use this config setting:
  # w.keepalive   (always leave it commented out!)
end

3. To start/stop/restart delayed_job, change your command from:

$ bundle exec bin/delayed_job -n 3 start

to:

$ god -c cache_cleaner.god -D  
$ god start/stop/restart cache_cleaner

For more detail, see my personal blog: http://siwei.me/blog/posts/using-delayed-job-with-god
