
How can Python see 12 CPUs on a cluster where LSF allocated me 4 cores?

I access a Linux cluster where resources are allocated using LSF, which I think is a common tool and comes from Scali (http://www.scali.com/workload-management/high-performance-computing). In an interactive queue, I asked for and got the maximum number of cores: 4. But when I check how many CPUs Python's multiprocessing module sees, the number is 12, which is the number of physical cores on the node I was allocated. It looks like the multiprocessing module does not respect the bounds that LSF should or would impose. Is this a problem in LSF or in Python?

[lsandor@iliadaccess03 peers_prisons]$ bsub -Is -n 4 -q interact sh
Job <7408231> is submitted to queue <interact>.
<<Waiting for dispatch ...>>
<<Starting on heroint5>>
sh-3.2$ python3
Python 3.2 (r32:88445, Jun 13 2011, 09:20:03) 
[GCC 4.3.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing
>>> 
>>> multiprocessing.cpu_count()
12


Not a problem, although your program should respect the amount of resources allocated to it by the queuing system, which may be considerably less than 100% as you have realized. I don't believe LSF has OS-level hooks to enforce compliance, nor probably should it.

In the past I've seen this handled with a wrapper script: one that sets up the program and the job together with the appropriate settings, then launches it.
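A minimal sketch of such a wrapper, written in Python. The bsub options -n, -q and -R "span[hosts=1]" appear elsewhere in this thread; the --workers flag is a hypothetical convention for telling the program how many sub-processes to start, and build_bsub_command and submit are illustrative names, not part of LSF.

```python
import subprocess


def build_bsub_command(cores, program, queue=None):
    """Build a bsub command that requests `cores` slots on a single host
    and passes the same number to the program."""
    cmd = ["bsub", "-n", str(cores), "-R", "span[hosts=1]"]
    if queue:
        cmd += ["-q", queue]
    # Hypothetical convention: the program accepts "--workers N".
    cmd += [program, "--workers", str(cores)]
    return cmd


def submit(cores, program, queue=None):
    """Submit the job; raises CalledProcessError if bsub exits non-zero."""
    subprocess.run(build_bsub_command(cores, program, queue), check=True)


if __name__ == "__main__":
    # Only print the command here; call submit() on a machine that has bsub.
    print(" ".join(build_bsub_command(4, "my_job", queue="interact")))
```

Because the request size and the worker count come from the same argument, the two cannot drift apart.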


A bit late to the party, but expanding on the answer of @Paddy3118, the span specification is not needed. Instead, the environment variable LSB_DJOB_NUMPROC holds the number of allocated cores. At least it does with the LSF version available to me (9.1.2).
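A short sketch of how a script might use that variable. It assumes LSB_DJOB_NUMPROC holds a plain positive integer; the helper name allocated_cores and the fallback to multiprocessing.cpu_count() outside LSF are my own choices.

```python
import multiprocessing
import os


def allocated_cores(environ=os.environ):
    """Return the LSF core allocation, or the node's CPU count when the
    variable is missing or malformed (e.g. when running outside LSF)."""
    raw = environ.get("LSB_DJOB_NUMPROC")
    if raw is not None:
        try:
            n = int(raw)
            if n > 0:
                return n
        except ValueError:
            pass  # malformed value: fall through to the default
    return multiprocessing.cpu_count()


if __name__ == "__main__":
    print("worker processes to start:", allocated_cores())
```

The result can be passed as the processes argument of multiprocessing.Pool instead of letting the pool default to cpu_count().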


If you submit to LSF using the -n option to state how many processors you want, and request that the four processors be made available on the same host by using span, as in the command below:

bsub -n 4 -R "span[hosts=1]" my_job

Then my_job is started with the following environment variables set. Your Python script can read them and start as many sub-processes as LSF assigned:

LSB_HOSTS="hostA hostA hostA hostA"
LSB_MCPU_HOSTS="hostA 4" 

(Or should the number of sub-processes be the number of processes allocated by LSF - 1 to account for the python script launching the sub-processes :-)
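A sketch of reading LSB_MCPU_HOSTS from Python. It assumes the value alternates host name and core count, as in the "hostA 4" example above; the function names are illustrative.

```python
import os


def cores_per_host(mcpu_hosts):
    """Parse a string such as "hostA 4" into a {host: cores} dict."""
    fields = mcpu_hosts.split()
    result = {}
    # Assumed layout: host name, core count, host name, core count, ...
    for host, count in zip(fields[0::2], fields[1::2]):
        result[host] = result.get(host, 0) + int(count)
    return result


def total_cores(environ=os.environ):
    """Total allocated cores, or None when LSB_MCPU_HOSTS is not set."""
    value = environ.get("LSB_MCPU_HOSTS")
    if not value:
        return None
    return sum(cores_per_host(value).values())


if __name__ == "__main__":
    print(total_cores())
```

Whether to subtract one for the parent process is left to the caller; if the parent mostly waits on its workers, using the full count is a reasonable starting point.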
