numpy histogram with 3x3D arrays as indices
I've got three 3D arrays which are the red, green and blue channels of a 3D RGB image. What is an elegant way in numpy to create a histogram volume of the input channels?
The operation would be equivalent to
""" assume R, G and B are 3D arrays and output is a 3D array filled with zeros """
for x in range(x_dim):
    for y in range(y_dim):
        for z in range(z_dim):
            output[ R[x][y][z] ][ G[x][y][z] ][ B[x][y][z] ] += 1
This code is too slow for large images. Can numpy improve the efficiency of the above algorithm?
You can do it using numpy.histogramdd but, as you say, the method proposed by @jozzas won't work. What you have to do is flatten each of your three 3D arrays and then combine them into a 2-d array of dimensions (x_dim*y_dim*z_dim, 3), which you pass to histogramdd. The fact that your original data are 3D is a red herring, since the spatial information is irrelevant to calculating the histogram.
Here is an example using random data in the channel cubes:
import numpy
n = 400 # approximate largest cube size that works on my laptop
# Fill channel cubes with random 8-bit integers
r = numpy.random.randint(256, size=(n,n,n)).astype(numpy.uint8)
g = numpy.random.randint(256, size=(n,n,n)).astype(numpy.uint8)
b = numpy.random.randint(256, size=(n,n,n)).astype(numpy.uint8)
# reorder data into a form suitable for histogramming
data = numpy.vstack((r.flat, g.flat, b.flat)).astype(numpy.uint8).T
# Destroy originals to save space
del(r); del(g); del(b)
m = 256 # size of 3d histogram cube
hist, edges = numpy.histogramdd(
    data, bins=m, range=((-0.5,255.5),(-0.5,255.5),(-0.5,255.5))
)
# Check that it worked
assert hist.sum() == n**3, 'Failed to conserve pixels'
This does use a lot more memory than you would expect, because histogramdd seems to be using 64-bit floats to do its work, even though we are sending it 8-bit integers.
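As a side note (a sketch, not part of the original answer): because the bin edges above run from -0.5 to 255.5 in 256 steps, bin i lines up exactly with the 8-bit value i, so individual color counts can be read straight out of hist; and since the counts are whole numbers, the float64 result can be cast to an integer dtype after the fact to roughly halve its storage (this does not reduce the peak memory used while computing it):
# hist[R, G, B] is the number of voxels with exactly that color
pure_red_count = hist[255, 0, 0]  # illustrative name, not from the answer
# counts are integers, so a compact dtype is safe once the histogram exists
hist = hist.astype(numpy.uint32)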
Assuming 8-bit channels, the 3-tuple of integers (R,G,B) can be thought of as a single number in base 256: R*256**2 + G*256 + B. Thus we can convert the three arrays R, G, B into a single array of "color values" and use np.bincount to produce the desired histogram.
import numpy as np
def using_bincount(r, g, b):
    # Flatten the channels and widen the dtype so the encoding below cannot overflow
    r = r.ravel().astype('int32')
    g = g.ravel().astype('int32')
    b = b.ravel().astype('int32')
    output = np.zeros((base*base*base), dtype='int32')
    # Encode each (R, G, B) triple as a single base-256 number and count occurrences
    result = np.bincount(r*base**2 + g*base + b)
    output[:len(result)] += result
    output = output.reshape((base, base, base))
    return output
def using_histogramdd(r, g, b):
    # Stack the flattened channels into an (n**3, 3) array of samples
    data = np.vstack((r.flat, g.flat, b.flat)).astype(np.uint8).T
    del(r); del(g); del(b)
    hist, edges = np.histogramdd(
        data, bins=base, range=([0, base], [0, base], [0, base])
    )
    return hist
np.random.seed(0)
n = 200
base = 256
r = np.random.randint(base, size=(n,n,n)).astype(np.uint8)
g = np.random.randint(base, size=(n,n,n)).astype(np.uint8)
b = np.random.randint(base, size=(n,n,n)).astype(np.uint8)
if __name__ == '__main__':
    bhist = using_bincount(r, g, b)
    hhist = using_histogramdd(r, g, b)
    assert np.allclose(bhist, hhist)
These timeit results suggest that using_bincount is faster than using_histogramdd, perhaps because histogramdd is built to handle floats and bins that are ranges, while bincount is solely for counting integers.
% python -mtimeit -s'import test' 'test.using_bincount(test.r,test.g,test.b)'
10 loops, best of 3: 1.07 sec per loop
% python -mtimeit -s'import test' 'test.using_histogramdd(test.r,test.g,test.b)'
10 loops, best of 3: 8.42 sec per loop
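If your numpy is new enough to support the minlength argument of np.bincount, the manual zero-padding in using_bincount can be folded into the call itself. A sketch under that assumption (the function name is mine; base is the global defined above):
def using_bincount_minlength(r, g, b):
    # same base-256 encoding as using_bincount
    codes = (r.ravel().astype('int32')*base**2
             + g.ravel().astype('int32')*base
             + b.ravel().astype('int32'))
    # minlength pads the counts out to base**3, so no manual padding is needed
    return np.bincount(codes, minlength=base**3).reshape((base, base, base))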
You can use numpy's histogramdd to compute the histogram of an n-dimensional array. If you don't want a histogram for each 2d slice, be sure to set the bins for that dimension to 1.
To get the overall histogram, you could compute them individually for the R, G and B channels and then take the maximum value of the three for each position.
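A minimal sketch of what this answer describes, assuming R, G and B are the 8-bit channel cubes from the question. Note that this produces a single 256-element histogram (the three per-channel histograms combined element-wise), not the 3D histogram volume built in the other answers:
import numpy as np
# one 256-bin histogram per channel
r_hist, _ = np.histogram(R.ravel(), bins=256, range=(0, 256))
g_hist, _ = np.histogram(G.ravel(), bins=256, range=(0, 256))
b_hist, _ = np.histogram(B.ravel(), bins=256, range=(0, 256))
# element-wise maximum of the three, as suggested above
combined = np.maximum(np.maximum(r_hist, g_hist), b_hist)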