
How do I write a god condition for Resque workers that says "if a process is running for longer than n seconds, kill it"?

I have a god/Resque setup that spans a few worker servers. Every so often, the workers get jammed up by long-polling connections and don't time out correctly. We have tried coding around it, but whatever the underlying cause, the keep-alive packets being sent down the wire prevent us from timing the connections out easily.

I would like certain workers (which I have already segmented out into their own watch blocks) to not be allowed to run for longer than a certain amount of time. In pseudocode, I am looking for a watch condition like the following (i.e. restart the worker if it takes longer than 60 seconds to complete the task):

w.transition(:up, :restart) do |on|
  on.condition(:process_timer) { |c| c.greater_than = 60.seconds }
end

Any thoughts or pointers on how to accomplish this would be greatly appreciated.


require 'timeout'
Timeout::timeout(60) do
  ...
end
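
To make that a little more concrete, here is a minimal sketch of how the timeout might be applied inside a job. The helper `run_with_limit`, the class `LongPollJob` and its `fetch` method are hypothetical names made up for illustration; only `Timeout.timeout` and `Timeout::Error` come from Ruby's standard library. Timeout works by raising an exception in the running thread, so I can't promise it will interrupt the stuck connections described in the question.

```ruby
require 'timeout'

# Hypothetical helper: runs the block, giving up after `limit` seconds.
# Returns [:ok, result] on success or [:timed_out, nil] if the limit was hit.
def run_with_limit(limit)
  [:ok, Timeout.timeout(limit) { yield }]
rescue Timeout::Error
  [:timed_out, nil]
end

# Hypothetical Resque-style job; Resque calls the class-level `perform`.
class LongPollJob
  TIME_LIMIT = 60 # seconds, matching the limit in the question

  def self.perform(url)
    status, _result = run_with_limit(TIME_LIMIT) { fetch(url) }
    # Raising marks the job as failed instead of letting it hang.
    raise "LongPollJob timed out after #{TIME_LIMIT}s" if status == :timed_out
  end

  # Placeholder for the real long-polling work (an assumption).
  def self.fetch(url)
    url
  end
end
```

This keeps the limit inside the job itself, so god never has to get involved unless the timeout fails to fire.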


Although you have an answer I'll drop this here since I already made it:

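class TimedThread
  # Runs the block in a thread and kills it if it outlives `limit` seconds.
  def initialize(limit, &block)
    @thread = Thread.new(&block)
    @start = Time.now
    # Watchdog: poll until the worker finishes or exceeds the limit.
    Thread.new do
      while @thread.alive?
        if Time.now - @start > limit
          @thread.kill
          puts "Thread killed"
          break
        end
        sleep 0.05 # avoid a busy loop
      end
    end.join
  end
end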

[1, 2, 3].each_with_index do |secs, i|
  TimedThread.new(2.5){ sleep secs ; puts "Finished with #{i+1}" }
end


As it turns out, there is an example of how to do this in some sample resque files. It's not exactly what I was looking for since it doesn't add an on.condition(:foo), but it is a viable solution:

# This will ride alongside god and kill any rogue stale worker
# processes. Their sacrifice is for the greater good.

WORKER_TIMEOUT = 60 * 10 # 10 minutes

Thread.new do
  loop do
    begin
      `ps -e -o pid,command | grep [r]esque`.split("\n").each do |line|
        parts   = line.split(' ')
        next if parts[-2] != "at"
        started = parts[-1].to_i
        elapsed = Time.now - Time.at(started)

        if elapsed >= WORKER_TIMEOUT
          ::Process.kill('USR1', parts[0].to_i)
        end
      end
    rescue
      # don't die because of stupid exceptions
      nil
    end

    # Sleep so we don't run too frequently
    sleep 30
  end
end
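
If you want to test or tune the staleness check without actually running `ps` or sending signals, the parsing step can be pulled out into a pure function. This is only a sketch: `stale_worker_pids` is a hypothetical helper name, and it makes the same assumption as the script above, namely that a worker's process title ends in `at <unix timestamp>`.

```ruby
# Hypothetical helper: given the text output of `ps -e -o pid,command`,
# return the PIDs of processes whose command line ends in "at <timestamp>"
# and that have been running for at least `timeout` seconds.
def stale_worker_pids(ps_output, now: Time.now, timeout: 60 * 10)
  ps_output.each_line.map { |line|
    parts = line.split
    # Skip lines that don't match the assumed "... at <timestamp>" shape.
    next unless parts.length >= 3 && parts[-2] == 'at'
    next unless parts[-1] =~ /\A\d+\z/

    elapsed = now - Time.at(parts[-1].to_i)
    parts[0].to_i if elapsed >= timeout
  }.compact
end

# Possible usage inside the loop above (not executed here):
#   stale_worker_pids(`ps -e -o pid,command | grep [r]esque`).each do |pid|
#     ::Process.kill('USR1', pid)
#   end
```

Sending `USR1` to a Resque worker kills its current child immediately but leaves the worker itself running, which is why the script uses that signal rather than `KILL`.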


Maybe take a look at resque-restriction? It doesn't appear to be under active maintenance but might do what you need.
