How do I clear stuck/stale Resque workers?
As you can see from the attached image, I've got a couple of workers that seem to be stuck. Those processes shouldn't take longer than a couple of secon开发者_JAVA技巧ds.
I'm not sure why they won't clear or how to manually remove them.
I'm on Heroku using Resque with Redis-to-Go and HireFire to automatically scale workers.
None of these solutions worked for me, I would still see this in redis-web:
0 out of 10 Workers Working
Finally, this worked for me to clear all the workers:
Resque.workers.each {|w| w.unregister_worker}
In your console:
queue_name = "process_numbers"
Resque.redis.del "queue:#{queue_name}"
Otherwise you can try to fake them as being done to remove them, with:
Resque::Worker.working.each {|w| w.done_working}
EDIT
A lot of people have been upvoting this answer and I feel that it's important that people try hagope's solution which unregisters workers off a queue, whereas the above code deletes queues. If you're happy to fake them, then cool.
You probably have the resque gem installed, so you can open the console and get current workers
Resque.workers
It returns a list of workers
#=> [#<Worker infusion.local:40194-0:JAVA_DYNAMIC_QUEUES,index_migrator,converter,extractor>]
pick the worker and prune_dead_workers
, for example the first one
Resque.workers.first.prune_dead_workers
Adding to answer by hagope, I wanted to be able to only unregister workers that had been running for a certain amount of time. The code below will only unregister workers running for over 300 seconds (5 minutes).
Resque.workers.each {|w| w.unregister_worker if w.processing['run_at'] && Time.now - w.processing['run_at'].to_time > 300}
I have an ongoing collection of Resque related Rake tasks that I have also added this to: https://gist.github.com/ewherrmann/8809350
Run this command wherever you ran the command to start the server
$ ps -e -o pid,command | grep [r]esque
you should see something like this:
92102 resque: Processing ProcessNumbers since 1253142769
Make note of the PID (process id) in my example it is 92102
Then you can quit the process 1 of 2 ways.
Gracefully use
QUIT 92102
Forcefully use
TERM 92102
* I'm not sure of the syntax it's either QUIT 92102
or QUIT -92102
Let me know if you have any trouble.
I just did:
% rails c production
irb(main):001:0>Resque.workers
Got the list of workers.
irb(main):002:0>Resque.remove_worker(Resque.workers[n].id)
... where n is the zero based index of the unwanted worker.
I had a similar problem that Redis saved the DB to disk that included invalid (non running) workers. Each time Redis/resque was started they appeared.
Fix this using:
Resque::Worker.working.each {|w| w.done_working}
Resque.redis.save # Save the DB to disk without ANY workers
Make sure you restart Redis and your Resque workers.
Started working on https://github.com/shaiguitar/resque_stuck_queue/ recently. It's not a solution to how to fix stuck workers but it addresses the issue of resque hanging/being stuck, so I figured it could be helpful for people on this thread. From README:
"If resque doesn't run jobs within a certain timeframe, it will trigger a pre-defined handler of your choice. You can use this to send an email, pager duty, add more resque workers, restart resque, send you a txt...whatever suits you."
Been used in production and works pretty well for me thus far.
Here's how you can purge them from Redis by hostname. This happens to me when I decommission a server and workers do not exit gracefully.
Resque.workers.each { |w| w.unregister_worker if w.id.start_with?(hostname) }
I ran into this issue and started down the path of implementing a lot of the suggestions here. However, I discovered the root cause that was creating this issue was that I was using the gem redis-rb 3.3.0. Downgrading to redis-rb 3.2.2 prevented these workers from getting stuck in the first place.
I've cleared them out from redis-cli directly. Luckily redistogo.com allows access from environments outside heroku. Get dead worker ID from the list. Mine was
55ba6f3b-9287-4f81-987a-4e8ae7f51210:2
Run this command in redis directly.
del "resque:worker:55ba6f3b-9287-4f81-987a-4e8ae7f51210:2:*"
You can monitor redis db to see what it's doing behind the scenes.
redis xxx.redistogo.com> MONITOR
OK
1380274567.540613 "MONITOR"
1380274568.345198 "incrby" "resque:stat:processed" "1"
1380274568.346898 "incrby" "resque:stat:processed:c65c8e2b-555a-4a57-aaa6-477b27d6452d:2:*" "1"
1380274568.346920 "del" "resque:worker:c65c8e2b-555a-4a57-aaa6-477b27d6452d:2:*"
1380274568.348803 "smembers" "resque:queues"
Second last line deletes the worker.
In resque 2.0.0, here's one way that seems to work to remove only actually appearantly-dead workers in resque 2.0.0:
Resque::Worker.all_workers_with_expired_heartbeats.each { |w| w.unregister_worker }
I am not an expert in what's going, it's possible there's a better way to do this or that this will have problems. I'm just trying to figure this out too.
This seems to remove workers that haven't sent a "heartbeat" in much longer than expected from the resque worker list.
If the phantom worker was in a "running" state, then a new entry in the "failed" job queue will be created corresponding to phantom job.
I had stuck/stale resque workers here too, or should I say 'jobs', because the worker is actually still there and running fine, it's the forked process that is stuck.
I chose the brutal solution of killing the forked process "Processing" since more than 5min, via a bash script, then the worker just spawn the next in queue, and everything keeps on going
have a look at my script here: https://gist.github.com/jobwat/5712437
If you are using newer versions of Resque, you'll need to use the following command as the internal APIs have changed...
Resque::WorkerRegistry.working.each {|work| Resque::WorkerRegistry.remove(work.id)}
This avoids the problem as long as you have a resque version newer than 1.26.0:
resque: env QUEUE=foo TERM_CHILD=1 bundle exec rake resque:work
Keep in mind that it does not let the currently running job finish.
If you use Docker, you can also use this command:
<id>
is the worker id.
docker stop <id>
docker start <id>
精彩评论