Jobs are getting lost in ParallelPython?
I am submitting about 234 jobs (my example contains only 50, for demonstration purposes) to my 20-node cluster using ParallelPython. I expected them to be queued and executed, but some jobs seem to get "lost" and I do not understand where things are going wrong. When the script finishes, I expect to see 50 files (info_0, info_1, ..., info_49, since inputs = range(50)), but instead I see a seemingly random subset of them. Any suggestions?
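import pp

def readChecklist():
    # Read the checklist (absolute path) into a list of stripped lines
    f = open('/home/username/twisted/pp-1.6.0/checklist', 'r')
    checklist = [line.strip() for line in f]
    f.close()
    return checklist

def processFile(num):
    bl = readChecklist()
    # pick a filename to write to (relative path: opened relative to the
    # current working directory of whichever node runs this job)
    outfile = "info_" + str(num)
    FILE = open(outfile, "a")
    for i in range(num):
        FILE.write(str(i) + "\n")
    FILE.flush()
    FILE.close()
    return num

# "*" puts pp in auto-discovery mode: use any ppservers found on the network
ppservers = ("*",)
job_server = pp.Server(ppservers=ppservers)

inputs = range(50)
# readChecklist is passed as a dependent function so workers can call it
jobs = [(input, job_server.submit(processFile, (input,), (readChecklist,),
                                  ("os", "math", "time", "sys", "subprocess")))
        for input in inputs]
for input, job in jobs:
    print "Job: ", input, " is", job()
job_server.print_stats()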
Output:
Job: 0 is True
Job: 1 is True
Job: 2 is True
Job: 3 is True
Job: 4 is True
Job: 5 is True
Job: 6 is True
Job: 7 is True
Job: 8 is True
Job: 9 is True
Job: 10 is True
Job: 11 is True
Job: 12 is True
Job: 13 is True
Job: 14 is True
Job: 15 is True
Job: 16 is True
Job: 17 is True
Job: 18 is True
Job: 19 is True
Job: 20 is True
Job: 21 is True
Job: 22 is True
Job: 23 is True
Job: 24 is True
Job: 25 is True
Job: 26 is True
Job: 27 is True
Job: 28 is True
Job: 29 is True
Job: 30 is True
Job: 31 is True
Job: 32 is True
Job: 33 is True
Job: 34 is True
Job: 35 is True
Job: 36 is True
Job: 37 is True
Job: 38 is True
Job: 39 is True
Job: 40 is True
Job: 41 is True
Job: 42 is True
Job: 43 is True
Job: 44 is True
Job: 45 is True
Job: 46 is True
Job: 47 is True
Job: 48 is True
Job: 49 is True
Time elapsed: 0.592607975006 s
Job execution statistics:
job count | % of all jobs | job time sum | time per job | job server
3 | 6.00 | 0.3226 | 0.107546 | x.x.x.x:abcd
3 | 6.00 | 0.2849 | 0.094970 | x.x.x.x:abcd
2 | 4.00 | 0.2420 | 0.121004 | x.x.x.x:abcd
3 | 6.00 | 0.3328 | 0.110927 | x.x.x.x:abcd
2 | 4.00 | 0.2314 | 0.115687 | x.x.x.x:abcd
2 | 4.00 | 0.2634 | 0.131683 | x.x.x.x:abcd
3 | 6.00 | 0.2827 | 0.094223 | x.x.x.x:abcd
2 | 4.00 | 0.2496 | 0.124812 | x.x.x.x:abcd
1 | 2.00 | 0.1701 | 0.170140 | x.x.x.x:abcd
3 | 6.00 | 0.3053 | 0.101758 | x.x.x.x:abcd
1 | 2.00 | 0.1334 | 0.133415 | x.x.x.x:abcd
3 | 6.00 | 0.2777 | 0.092561 | x.x.x.x:abcd
1 | 2.00 | 0.1152 | 0.115169 | x.x.x.x:abcd
1 | 2.00 | 0.1273 | 0.127294 | x.x.x.x:abcd
3 | 6.00 | 0.3345 | 0.111503 | x.x.x.x:abcd
1 | 2.00 | 0.1128 | 0.112782 | x.x.x.x:abcd
2 | 4.00 | 0.2636 | 0.131819 | x.x.x.x:abcd
8 | 16.00 | 0.4413 | 0.055163 | local
1 | 2.00 | 0.1905 | 0.190510 | x.x.x.x:abcd
3 | 6.00 | 0.2774 | 0.092473 | x.x.x.x:abcd
2 | 4.00 | 0.2197 | 0.109835 | x.x.x.x:abcd
Time elapsed since server creation 0.592818021774
Files created (one info_N file per job; only the suffix N is shown):
0
1
10
11
12
13
14
15
16
17
18
19
2
20
21
22
3
4
5
6
7
8
9
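Rather than scanning the listing by eye, a small check can report which expected outputs are absent. This is a hypothetical helper (`missing_outputs` is not part of ParallelPython), assuming every job writes a file named info_N into a single directory:

```python
import os

def missing_outputs(directory, count, prefix="info_"):
    """Return the job numbers in range(count) whose output file is absent.

    Hypothetical helper: assumes job N writes <directory>/<prefix>N.
    """
    return [n for n in range(count)
            if not os.path.exists(os.path.join(directory, prefix + str(n)))]
```

Run from the submitting script's directory as `missing_outputs(".", 50)`; given the listing above, it would report jobs 23 through 49.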
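OK, my mistake! In case anyone else runs into this: make sure your paths are absolute, whether you are reading from a file or writing to one. A relative path is resolved against the current working directory of whichever process opens it, so a job that runs on a remote ppserver reads and writes relative to that server's directory, not the one you launched the script from. Five hours of debugging :( but I learnt my lesson :)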
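A minimal sketch of the fix, assuming a directory that every node can reach at the same absolute path (for example a shared NFS mount). The `base_dir` parameter and the shared-mount setup are assumptions, not part of the original code. Passing the directory as an argument keeps the function self-contained, which matters because pp runs it in a separate worker process:

```python
import os

def processFile(num, base_dir):
    """Append 0..num-1 (one per line) to <base_dir>/info_<num>; return num.

    base_dir is assumed to be an absolute path that exists on every node
    (e.g. a shared mount). A relative path would be resolved against each
    worker's own current directory, which is the original bug.
    """
    if not os.path.isabs(base_dir):
        raise ValueError("base_dir must be absolute: %r" % base_dir)
    outfile = os.path.join(base_dir, "info_" + str(num))
    with open(outfile, "a") as f:
        for i in range(num):
            f.write(str(i) + "\n")
    return num

# Hypothetical submission (SHARED_DIR is an assumed shared mount):
# job_server.submit(processFile, (n, SHARED_DIR), (), ("os",))
```

Rejecting a relative `base_dir` up front turns the silent "lost files" symptom into an immediate error on the first job.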