What is the fastest iteration over a large network instance of DiGraph networkx?

2023-02-05 05:54 问答作者：

I'm writing a class which inherits from DiGraph.py from the open source networkx package in python.

In some method in my class, I need to search for nodes with certain degr开发者_如何学Cees (outdegrees or indegrees for directed graph) and return them.

This class is to be used with a data mining project\natural language processing , its going to be used on extremely large networks. what I need is a fast implementation of the method described (return list of nodes with a certain out degree or certain in degree).

There a couple of things already defined in the super class: 1. a method network.outdegree() : returns a dictionary with node keys and outdegree values.

{'school': 4, 'middle school': 0, 'university': 0, 'commercial': 0, 'private': 5, 'institution': 2, 'high school': 0, 'college': 0, 'elementary school': 0, 'central': 0, 'company': 0, 'public': 3, 'bank': 2}

a method which is

network.out_degree_iter()

<generator object out_degree_iter at 0x02EEB328>

I don't know how to use this method, If someone can explain how that is used, i will be grateful.

3.I have an attribute network.nodes which is a list of all nodes in my network.

Question: I can iterate over all the nodes and return the ones with outdegree 2 for example, by doing a list comprehension on network.nodes, or I can iterate over my dictionary and return a list of nodes with values 2, or maybe use the out_degree_iter() which I don't know how is that used or how is that different from iterating over dictionary items in a for loop ( for k,v in dict.iteritems())?? Which one of these would be faster for a very large network of nodes and edges and WHY??

Thanks

The simplest way is to use the out_degree_iter() method with a list comprehension as you proposed. Here is how:

import networkx as nx
G=nx.DiGraph(nx.gnp_random_graph(1000,0.001))
t1=[n for n,k in G.out_degree_iter() if k==2

The fastest way requires accessing the internal data structure:

t2=[n for n,nbrs in G.succ.items() if len(nbrs)==2]

For in-degree us in_degree_iter() and G.pred.items().

Here are some timings

In [41]: %timeit t1=[n for n,k in G.out_degree_iter() if k==2]
1000 loops, best of 3: 368 us per loop

In [42]: %timeit s2=[n for n,nbrs in G.succ.items() if len(nbrs)==2]
1000 loops, best of 3: 198 us per loop

Iterators are better for large graphs because you don't construct a copy of the dictionary. How about something like this:

list_of_2 = []
for g in G.out_degree_iter():
    if g[1]==2:
        list_of_2.append(g[0])

Or,

list_of_2 = map(lambda x:x[0],filter(lambda x:(x[1]==2),G.out_degree_iter()))

继续阅读：algorithm iteration networkx python

What is the fastest iteration over a large network instance of DiGraph networkx?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？