numpy count number of clusters in matrix
In a symmetric numpy matrix with only 0's and 1's, is开发者_如何学Python there a method to count the number of "connecting clusters of 1's"?
For example the following numpy matrix:
np.array([[0, 0, 0, 0, 1, 1, 0],
[0, 0, 0, 0, 1, 1, 0],
[0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0]]
))
has two clusters of connecting 1's:
You can use scipy.ndimage.label
.
If you want to identify 4 clusters, use:
from scipy.ndimage import label
_, n = label(a)
print(n)
Output: 4
The default structuring element (kernel) is excluding the diagonals:
[[0,1,0],
[1,1,1],
[0,1,0]]
Thus, if you consider that you have two clusters (connected by the diagonal), change the default kernel:
from scipy.ndimage import label
kernel = np.ones((3, 3))
_, n = label(a, structure=kernel)
print(n)
Output: 2
Used input:
a = np.array([[0, 0, 0, 0, 1, 1, 0],
[0, 0, 0, 0, 1, 1, 0],
[0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0]])
I don't think there is a straightforward numpy-only way, or a provided function. You would have to use an algorithm for that.
The typical algorithm for this problem is called connected-component labelling, and it is used in image processing. The algorithm is described on Wikipedia: https://en.wikipedia.org/wiki/Connected-component_labeling
import numpy as np
n = 50
tmp = np.random.rand(n, n)
a = (tmp + tmp.T > 1).astype(int) # symmetric matrix
visited = np.full((n, n), fill_value=False)
cnt = 0
def paint(i, j, n):
if not ((0 <= i < n) and (0 <= j < n) and (i <= j)):
return
if visited[i, j] or a[i, j] == 0:
return
visited[i, j] = True
paint(i + 1, j, n) # dfs
paint(i - 1, j, n)
paint(i, j + 1, n)
paint(i, j - 1, n)
for i in range(n):
for j in range(i, n):
if (not visited[i, j]) and a[i, j] == 1:
paint(i, j, n)
cnt += 1
print(cnt)
My solution may raise RecursionError
when there are lots of one
s in the matrix and n
is large.
精彩评论