How do I calculate the entropy of a graph?
I have a set of randomly generated formal graphs, and I would like to calculate the entropy of each one. The same question in different words: I have several networks, and want to calculate the information content of each one.
Here are two sources containing formal definitions of graph entropy:
http://www.cs.washington.edu/homes/anuprao/pubs/CSE533Autumn2010/lecture4.pdf (PDF)
http://arxiv.org/abs/0711.4175v1
The code I am looking for takes a graph as input (as either an edge list or an adjacency matrix) and outputs a number of bits or some other measure of information content.
Because I can't find an implementation of this anywhere, I am setting out to code this from scratch based on the formal definitions. If anyone has already solved this problem and is willing to share the code, it would be wildly appreciated.
I ended up using different papers for definitions of graph entropy:
Information Theory of Complex Networks: On Evolution and Architectural Constraints
R.V. Sole and S. Valverde (2004)
and
Network Entropy Based on Topology Configuration and Its Computation to Random Networks
B.H. Wang, W.X. Wang and T. Zhou
The code to calculate each is below. The code assumes you have an undirected, unweighted graph with no self-loops. It takes an adjacency matrix as input and returns the amount of entropy in bits. It is implemented in R and makes use of the sna package.
library(sna)  # provides degree() and gden()

graphEntropy <- function(adj, type="SoleValverde") {
  if (type == "SoleValverde") {
    return(graphEntropySoleValverde(adj))
  } else {
    return(graphEntropyWang(adj))
  }
}
graphEntropySoleValverde <- function(adj) {
  # Calculate Sole & Valverde, 2004 graph entropy
  # Uses Equations 1 and 4
  # First we need the denominator of q(k)
  # To get it we need the probability of each degree
  # First get the number of nodes with each degree
  existingDegrees = degree(adj)/2  # sna counts in- plus out-ties, so halve for an undirected graph
  maxDegree = nrow(adj) - 1
  allDegrees = 0:maxDegree
  # Row 1: degree k, row 2: number of nodes with degree k, row 3: P(k)
  degreeDist = matrix(0, 3, length(allDegrees)+1) # Need an extra zero prob degree for later calculations
  degreeDist[1,] = 0:(maxDegree+1)
  for(aDegree in allDegrees) {
    degreeDist[2,aDegree+1] = sum(existingDegrees == aDegree)
  }
  # Calculate probability of each degree
  for(aDegree in allDegrees) {
    degreeDist[3,aDegree+1] = degreeDist[2,aDegree+1]/sum(degreeDist[2,])
  }
  # Sum of all degrees mult by their probability (the mean degree)
  sumkPk = 0
  for(aDegree in allDegrees) {
    # Use the degree value (row 1), not the node count (row 2)
    sumkPk = sumkPk + degreeDist[1,aDegree+1] * degreeDist[3,aDegree+1]
  }
  # Equivalent is sum(degreeDist[1,] * degreeDist[3,])
  # Now we have all the pieces we need to calculate graph entropy
  graphEntropy = 0
  for(aDegree in 1:maxDegree) {
    # q(k) = (k + 1) * P(k + 1) / sum(k * P(k)); column aDegree+2 holds P(aDegree + 1)
    q.of.k = ((aDegree + 1)*degreeDist[3,aDegree+2])/sumkPk
    # 0 log2(0) is defined as zero
    if (q.of.k != 0) {
      graphEntropy = graphEntropy + -1 * q.of.k * log2(q.of.k)
    }
  }
  return(graphEntropy)
}
graphEntropyWang <- function(adj) {
  # Calculate Wang, 2008 graph entropy
  # Uses Equation 14
  # bigN is simply the number of nodes
  # littleP is the link probability. That is the same as graph density calculated by sna with gden().
  bigN = nrow(adj)
  littleP = gden(adj)
  graphEntropy = 0
  # log2(0) is undefined, so an empty or complete graph is given zero entropy
  if (littleP != 1 && littleP != 0) {
    graphEntropy = -1 * .5 * bigN * (bigN - 1) * (littleP * log2(littleP) + (1-littleP) * log2(1-littleP))
  }
  return(graphEntropy)
}
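As a quick sanity check, both measures can be reproduced in base R without sna. This is only a minimal sketch under stated assumptions, not a replacement for the functions above: the helper names `edgeListToAdj`, `soleValverdeBits` and `wangBits` are made up for illustration, the graph is assumed to be undirected, unweighted and free of self-loops, the denominator of q(k) is taken to be the mean degree sum(k * P(k)), and a graph with no edges is given zero entropy by convention.

```r
# Build a symmetric 0/1 adjacency matrix from a two-column edge list.
# Hypothetical helper: node ids are assumed to be integers 1..n.
edgeListToAdj <- function(edges, n) {
  adj <- matrix(0, n, n)
  adj[edges] <- 1                        # i -> j
  adj[edges[, 2:1, drop = FALSE]] <- 1   # j -> i, to keep the matrix symmetric
  adj
}

# Entropy (bits) of the remaining-degree distribution
# q(k) = (k + 1) * P(k + 1) / <k>, summed over k = 1..(n - 1).
soleValverdeBits <- function(adj) {
  n <- nrow(adj)
  deg <- rowSums(adj)
  # pk[k + 1] holds P(k) for k = 0..n; P(n) is always zero in a simple graph
  pk <- tabulate(deg + 1, nbins = n + 1) / n
  meanDeg <- sum((0:n) * pk)
  if (meanDeg == 0) return(0)            # assumption: no edges means zero entropy
  k <- seq_len(n - 1)
  qk <- (k + 1) * pk[k + 2] / meanDeg
  qk <- qk[qk > 0]                       # 0 * log2(0) is taken as zero
  -sum(qk * log2(qk))
}

# Entropy (bits) of a random graph with n nodes and link probability p,
# where p is estimated by the density of the observed graph.
wangBits <- function(adj) {
  n <- nrow(adj)
  if (n < 2) return(0)
  p <- sum(adj) / (n * (n - 1))          # each undirected edge appears twice in adj
  if (p == 0 || p == 1) return(0)
  -0.5 * n * (n - 1) * (p * log2(p) + (1 - p) * log2(1 - p))
}

# Path graph 1 - 2 - 3 - 4 given as an edge list
edges <- rbind(c(1, 2), c(2, 3), c(3, 4))
adj <- edgeListToAdj(edges, 4)

wangBits(adj)          # 6 bits: six possible edges, each present with p = 0.5
soleValverdeBits(adj)  # -(2/3) * log2(2/3): only q(1) = 2/3 is non-zero for k >= 1
```

The path graph is convenient because both numbers can be checked by hand: its density is exactly 0.5, and its degrees are 1, 2, 2, 1.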
If you have a weighted graph, a good start would be to sort and count all the weights. Then you can use the formula -log(p)+log(2) (http://en.wikipedia.org/wiki/Binary_entropy_function) to determine the number of bits needed for the code. Maybe this doesn't work because it's the binary entropy function?
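One way to make the "sort and count the weights" idea concrete is to treat the observed weights as an empirical distribution and take its Shannon entropy. The sketch below is an assumption about what is meant, and it deliberately swaps in plain Shannon entropy rather than the formula quoted above. `weightEntropyBits` is a hypothetical helper name, and the input is assumed to be a symmetric weight matrix in which 0 means "no edge".

```r
# Shannon entropy (bits) of the empirical distribution of edge weights.
weightEntropyBits <- function(w) {
  weights <- w[upper.tri(w)]          # count each undirected edge once
  weights <- weights[weights != 0]    # assumption: zero means no edge
  if (length(weights) == 0) return(0)
  p <- as.numeric(table(weights)) / length(weights)  # relative frequency of each weight
  -sum(p * log2(p))
}

# Four edges with weights 1, 1, 2 and 3
w <- matrix(0, 4, 4)
w[1, 2] <- w[2, 1] <- 1
w[2, 3] <- w[3, 2] <- 1
w[3, 4] <- w[4, 3] <- 2
w[1, 4] <- w[4, 1] <- 3

weightEntropyBits(w)  # 1.5 bits: frequencies are 0.5, 0.25 and 0.25
```

Note that this only measures how varied the weights are; it ignores which nodes the edges connect.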
You can use Koerner's entropy (= Shannon entropy applied to a graph). A good reference for the literature is here. Note, however, that the computation is in general NP-hard (for the stupid reason that you need to search over all subsets of vertices).
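To give a feel for why the subset search gets expensive, here is a minimal brute-force sketch that lists the independent sets of a small graph, which are the objects Koerner's definition is built on. It does not compute Koerner's entropy itself. `independentSets` is a hypothetical helper name, and the adjacency matrix is assumed to be symmetric with a zero diagonal.

```r
# Enumerate all non-empty independent sets of a small undirected graph by
# checking every non-empty subset of vertices (2^n - 1 candidates).
independentSets <- function(adj) {
  n <- nrow(adj)
  stopifnot(n <= 30)  # intToBits() works on 32-bit integers
  found <- list()
  for (mask in seq_len(2^n - 1)) {
    # Bit i of mask says whether vertex i is in the candidate subset
    members <- which(as.integer(intToBits(mask))[seq_len(n)] == 1)
    # Independent means no edge between any two members
    if (sum(adj[members, members]) == 0) {
      found[[length(found) + 1]] <- members
    }
  }
  found
}

# Path graph 1 - 2 - 3 - 4
path <- matrix(0, 4, 4)
path[cbind(1:3, 2:4)] <- 1
path[cbind(2:4, 1:3)] <- 1

length(independentSets(path))  # 7 of the 15 non-empty subsets are independent
```

The loop doubles in length with every added vertex, so this approach is only practical for very small graphs.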