Python library for job scheduling, ssh
I'd like to find a user-space tool (preferably in Python - barring that, in anything I could easily modify if it doesn't already do what I need it to) to replace a short script I've been using that does the two things below:
- polls fewer than 100 computers (Fedora 13, as it happens) for load, available memory, and whether it looks like someone is using them
- selects good hosts for jobs, runs these jobs over ssh. These jobs are the execution of arbitrary command line programs which read and write to a shared filesystem - typically image processing scripts or similar - cpu, sometimes memory intensive tasks.
For example, using my current script, I can type at a Python prompt
>>> import hosts
>>> hosts.run_commands(['users']*5)
or from the command line
% hosts.py "users" "users" "users" "users" "users"
to run the command users 5 times (after finding 5 computers on which the command could be run by checking the CPU load and available memory on at least 5 computers from a config file). There should be no job server other than the script I just ran, and no worker daemons or processes on the computers that will run these commands.
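The polling and selection step described above could be sketched roughly as follows. This is not the actual hosts.py; the names `PROBE`, `parse_probe`, `poll`, and `select_hosts` and the thresholds are hypothetical, and the sketch assumes Python 3.7+ on the submitting machine, key-based ssh logins, and Linux hosts that expose /proc/loadavg and /proc/meminfo. Free memory is approximated as MemFree + Buffers + Cached, since older kernels have no MemAvailable field.

```python
import subprocess

# Remote probe: load averages, three memory counters, and the number of
# logged-in sessions (assumption: "someone is using it" means `who` is non-empty).
PROBE = ("cat /proc/loadavg; "
         "grep -E '^(MemFree|Buffers|Cached):' /proc/meminfo; "
         "who | wc -l")


def parse_probe(text):
    """Turn the probe's output into (load1, free_kb, sessions)."""
    lines = text.strip().splitlines()
    load1 = float(lines[0].split()[0])          # 1-minute load average
    # Sum the kB values of the meminfo lines between the first and last line.
    free_kb = sum(int(line.split()[1]) for line in lines[1:-1])
    sessions = int(lines[-1])
    return load1, free_kb, sessions


def poll(host, timeout=10):
    """Probe one host over ssh; return None if it cannot be reached or parsed."""
    try:
        out = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, PROBE],
            capture_output=True, text=True, timeout=timeout, check=True,
        ).stdout
        return parse_probe(out)
    except (subprocess.SubprocessError, OSError, ValueError, IndexError):
        return None


def select_hosts(stats, n, max_load=0.5, min_free_kb=1_000_000):
    """Pick up to n idle hosts, least loaded first.

    stats maps hostname -> (load1, free_kb, sessions), or None for dead hosts.
    """
    ok = [
        (s[0], host) for host, s in stats.items()
        if s is not None and s[0] <= max_load
        and s[1] >= min_free_kb and s[2] == 0
    ]
    return [host for _, host in sorted(ok)[:n]]
```

With a list of hostnames read from a config file, `select_hosts({h: poll(h) for h in hostnames}, 5)` would return at most five candidates. Nothing runs on the remote machines except the one-off probe command, so no daemon or root access is needed.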
I'd additionally like to be able to track the jobs, run jobs again on failure, etc., but these are extra features (very standard in a real job scheduler) that I don't actually need.
I've found good ssh libraries for Python, things like classh and PuSSH, which don't have the (very simple) load balancing features I'd like. On the other side of what I want is Condor or Slurm, as suggested by crispamares before I clarified I want something lighter. Those would be doing things the proper way, but from what I've read about them, spinning them up in user space only when I need them sounds annoying to impossible. This isn't a dedicated cluster, and I don't have root access on these hosts.
If I can't find something else, I'm currently planning to use a wrapper around classh with some basic polling of computers whenever I need to know how busy they are.
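The dispatch half of such a wrapper might look roughly like the sketch below. It uses the plain `ssh` client through `subprocess` rather than classh, and the names `ssh_run` and `run_commands` and their signatures are hypothetical. It assumes Python 3.7+, key-based logins, and a list of hosts that has already been chosen by some polling step.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_run(host, command):
    """Run one shell command on host via the ssh client.

    Returns (exit code, captured stdout).
    """
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True, text=True,
    )
    return proc.returncode, proc.stdout


def run_commands(commands, hosts, runner=ssh_run):
    """Run each command on its own host, all in parallel.

    hosts is a list of already-selected hostnames, best first.
    Returns a list of (host, command, exit code, stdout) in command order.
    """
    if len(hosts) < len(commands):
        raise ValueError(
            "need %d hosts, only %d available" % (len(commands), len(hosts))
        )
    pairs = list(zip(hosts, commands))   # one host per command
    # One thread per job; each thread just waits on its ssh child process.
    with ThreadPoolExecutor(max_workers=len(pairs) or 1) as pool:
        results = list(pool.map(lambda pair: runner(*pair), pairs))
    return [(h, c, rc, out) for (h, c), (rc, out) in zip(pairs, results)]
```

Because the exit code comes back with each result, rerunning failed jobs on another host is a matter of filtering the returned list for nonzero codes. The `runner` parameter exists so the ssh call can be swapped out, for example for a library such as classh or for a stub during testing.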
There is fabric; I am surprised no one has mentioned it.
Slurm is a powerful job scheduler that can be programmed from Python using PySlurm.
I don't know whether it is harder to deploy than Condor, or whether it fits all your needs, but I mention it just in case.
You could modify buildbot and twisted? This seems like a good way to go.
Have a look at func. I haven't used it beyond the "Hello, world" level, but I think it fits the bill perfectly for you.
I might be a little late, but I'd recommend a look at python saga here.
I might be late for this question, but I encountered the same issue recently and I am looking for a C/C++ library that can do job scheduling and server load balancing for processing image files over a cluster of servers. I will call the library from a GUI and monitor the status of the jobs.
I installed slurm and tried the commands; however, using it as a tool and possibly as a library seems rather difficult. Other options seem to provide job scheduling but no load balancing based on CPU utilization. I would appreciate any suggestions.
Best Regards