How to streamline (or parallelize with multiprocessing) some Python 3.2 code with many loops?

I tried to educate myself about optimizing python and potentially parallelizing my problem. The links I used are listed at my question posted over at http://www.python-forum.org/pythonforum/viewtopic.php?f=18&t=28855

However, my application based on the networkx module still scales very badly with the size of the network, which I think is mainly the fault of my loops rather than anything networkx-specific. Given my limited experience and my focus on getting this one job done, I would greatly appreciate it if you could skim my code and point out how the loops should be reformulated, or what other tricks might be used. Any cursory comment will do; I'll try to take it from there.
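In case it matters, I've been locating the slow pieces with cProfile rather than guessing. Roughly like this (the `workload` function here is just a dummy stand-in for one of my graph functions below):

```python
import cProfile
import io
import pstats

def workload():
    # Dummy stand-in for one of the graph functions below.
    total = 0
    for i in range(100000):
        total += i * i
    return total

pr = cProfile.Profile()
pr.enable()
workload()
pr.disable()

# Print the five most expensive calls by cumulative time.
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats('cumulative').print_stats(5)
print(buf.getvalue())
```

That at least tells me which of the functions below dominates the runtime, though not how to fix the loops inside it.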

Here are key and messy pieces which I expect to be the bottlenecks:

This piece projects a bipartite prisoner-cell graph into a cellmate-cellmate graph.

# Dictionary for edge attributes in projected graph:
# 0: overlap_length
# 1: overlap_start
# 2: overlap_end
# 3: cell
# 4: level

def time_overlap_projected_graph_parallel(B, nodes):
    G=nx.MultiGraph()
    G.add_nodes_from((n,B.node[n]) for n in nodes)
    for u in nodes:
        unbrs = set(B[u])
        nbrs2 = set((n for nbr in unbrs for n in B[nbr])) - set([u])
        for v in nbrs2:
            for mutual_cell in set(B[u]) & set(B[v]):
                for uspell in B.get_edge_data(u,mutual_cell).values():
                    ustart = uspell[1]
                    uend = uspell[2]
                    for vspell in B.get_edge_data(v,mutual_cell).values():
                        vstart = vspell[1]
                        vend = vspell[2]
                        if uend > vstart and vend > ustart:
                            ostart = max(ustart,vstart)
                            oend = min(uend,vend)
                            olen = (oend-ostart+1)/86400
                            ocell = mutual_cell
                            if (v not in G[u] or ostart not in [ edict[1] for edict in G[u][v].values() ]):
                                G.add_edges_from([(u,v,{0: olen,1: ostart,2: oend,3: ocell})])
    return G

This piece cleans up the previous result by consolidating multiple links (spells in the same cell) between the same people.

# Dictionary for edge attributes in consolidated graph:
# 5: total_overlap
# 6: first_overlap
# 7: last_overlap
# 8: cell_list
# 9: spell_list

def consolidated_g_parallel(B):
    G = nx.Graph()
    G.add_nodes_from(B)
    for n in B.nodes_iter():
        for m in B.neighbors_iter(n):
            omin = 2000000000.0
            omax = 0.0
            otot = 0.0
            cells = []
            spells = []
            for ed in B.get_edge_data(m,n).values():
                omin = min(omin,ed[1])
                omax = max(omax,ed[2])
                otot += ed[0]
                cells.append(ed[3])
                spells.append((ed[1],ed[2]))
            G.add_edges_from([(n,m,{5: otot,6: omin,7: omax,8: cells,9: spells})])
    return G
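To illustrate what the consolidation is meant to do for a single pair: given two overlap spells between the same cellmates (numbers illustrative, in the `{0: overlap_length, 1: overlap_start, 2: overlap_end, 3: cell}` convention from above), the merged edge carries the totals and the lists:

```python
# Two overlap spells between the same pair, as produced by the projection
# step; lengths are in days (seconds / 86400), consistent with olen above.
spell_edges = [
    {0: 2.0, 1: 0, 2: 172799, 3: 'c0'},
    {0: 1.0, 1: 400000, 2: 486399, 3: 'c0'},
]

# Consolidate exactly as consolidated_g_parallel does per neighbor pair.
omin = min(ed[1] for ed in spell_edges)        # 6: first_overlap
omax = max(ed[2] for ed in spell_edges)        # 7: last_overlap
otot = sum(ed[0] for ed in spell_edges)        # 5: total_overlap
cells = [ed[3] for ed in spell_edges]          # 8: cell_list
spells = [(ed[1], ed[2]) for ed in spell_edges]  # 9: spell_list
merged = {5: otot, 6: omin, 7: omax, 8: cells, 9: spells}
```

Note that because I loop over every node and then every neighbor, each pair is actually processed twice; with `nx.Graph` the second pass just overwrites the first with the same data, so it is wasteful but not wrong.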

This generates a new graph in which all higher-order links of the previous result (say, cellmates' cellmates, up to an order set by a parameter) become direct links. This takes some brute-force work because the chronology of the links matters.

def consolidated_mdg_parallel(B):
    print('number of self-loops',B.number_of_selfloops())
    G = B.to_directed()
    for u,v,d in G.edges(data=True):
        d[4]=1
    for n in G.nodes_iter():
        sinks = nx.algorithms.boundary.node_boundary(B,[n])
        nsink = list(sinks)
        nsink.append(n)
        sources = nx.algorithms.boundary.node_boundary(B,nsink)
        while len(sinks) > 0 and len(sources) > 0 :
            for u in sources:
        for v in nx.algorithms.boundary.node_boundary(B,[u],sinks):
                    for edata in G[u][v].values():
                        for ed in G[v][n].values():
                            if edata[1] <= ed[2]:
                                oend = min(edata[2],ed[2])
                                ostart = max(edata[1],ed[1])
                                otot = min(ed[0],edata[0])
                                lvl = edata[4] + 1
                                G.add_edges_from([(u,n,{0: otot,1: ostart,2: oend,3: {},4: lvl})])
            nsink.extend(sources)
            sinks = list(set(sources)&set(G.predecessors(n)))
            sources = nx.algorithms.boundary.node_boundary(B,nsink)
    H = nx.MultiDiGraph()
    for n in G.nodes_iter():
        for m in G.neighbors_iter(n):
            omin = 2000000000.0
            omax = 0.0
            otot = 0.0
            for lvl in range(1,maxorder):
                for ed in G.get_edge_data(n,m).values():
                    if ed[4] == lvl:
                        otot += ed[0]
                if otot > 0:
                    H.add_edges_from([(n,m,{0: otot,4: lvl})])
                otot = 0.0
#                               print('Higher-order link added!')
    return H

This piece (not a function, of course) writes a separate text file for each possible order of links, with the edges listed.

empty = ['nothing']
for i in range(1,maxorder):
    empty.append(set(J.nodes()))

# Write simple adjacency representation for spmat --- note losing track of length of spells and chronology!
for i in range(1,maxorder):
    targets[i].write('0 '+str(J.number_of_nodes())+' SHAPE KEY')

for e in J.edges_iter(data=True):
# (u,n,{0: otot,1: ostart,2: oend,3: {},4: lvl})
        print(e)
        if e[2][4] < maxorder:
            targets[e[2][4]].write('\n{} {} {}'.format(str(e[1]),str(e[0]),str(1/e[2][0])))
            try:
                empty[e[2][4]].remove(e[1])
            except KeyError:  # node already removed from the set
                pass
for i in range(1,maxorder):
    for n in list(empty[i]):
        targets[i].write('\n'+str(n))
    targets[i].close()