itertools product speed up
I use itertools.product to generate all possible variations of 4 elements of length 13. The 4 and 13 can be arbitrary, but as it is, I get 4^13 results, which is a lot. I need the result as a Numpy array and currently do the follo开发者_StackOverflow中文版wing:
c = it.product([1,-1,np.complex(0,1), np.complex(0,-1)], repeat=length)
sendbuf = np.array(list(c))
With some simple profiling code shoved in between, it looks like the first line is pretty much instantaneous, whereas the conversion to a list and then Numpy array takes about 3 hours. Is there a way to make this quicker? It's probably something really obvious that I am overlooking.
Thanks!
The NumPy equivalent of itertools.product()
is numpy.indices()
, but it will only get you the product of ranges of the form 0,...,k-1:
numpy.rollaxis(numpy.indices((2, 3, 3)), 0, 4)
array([[[[0, 0, 0],
[0, 0, 1],
[0, 0, 2]],
[[0, 1, 0],
[0, 1, 1],
[0, 1, 2]],
[[0, 2, 0],
[0, 2, 1],
[0, 2, 2]]],
[[[1, 0, 0],
[1, 0, 1],
[1, 0, 2]],
[[1, 1, 0],
[1, 1, 1],
[1, 1, 2]],
[[1, 2, 0],
[1, 2, 1],
[1, 2, 2]]]])
For your special case, you can use
a = numpy.indices((4,)*13)
b = 1j ** numpy.rollaxis(a, 0, 14)
(This won't run on a 32 bit system, because the array is to large. Extrapolating from the size I can test, it should run in less than a minute though.)
EIDT: Just to mention it: the call to numpy.rollaxis()
is more or less cosmetical, to get the same output as itertools.product()
. If you don't care about the order of the indices, you can just omit it (but it is cheap anyway as long as you don't have any follow-up operations that would transform your array into a contiguous array.)
EDIT2: To get the exact analogue of
numpy.array(list(itertools.product(some_list, repeat=some_length)))
you can use
numpy.array(some_list)[numpy.rollaxis(
numpy.indices((len(some_list),) * some_length), 0, some_length + 1)
.reshape(-1, some_length)]
This got completely unreadable -- just tell me whether I should explain it any further :)
The first line seems instantaneous because no actual operation is taking place. A generator object is just constructed and only when you iterate through it as the operating taking place. As you said, you get 4^13 = 67108864
numbers, all these are computed and made available during your list
call. I see that np.array takes only list or a tuple, so you could try creating a tuple out of your iterator and pass it to np.array to see if there is any performance difference and it does not affect the overall performance of your program. This can be determined only by trying for your usecase though there are some points which say tuple is slightly faster.
To try with a tuple, instead of list just do
sendbuf = np.array(tuple(c))
You could speed things up by skipping the conversion to a list:
numpy.fromiter(c, count=…) # Using count also speeds things up, but it's optional
With this function, the NumPy array is first allocated and then initialized element by element, without having to go through the additional step of a list construction.
PS: fromiter()
does not handle the tuples returned by product()
, so this might not be a solution, for now. If fromiter()
did handle dtype=object
, this should work, though.
PPS: As Joe Kington pointed out, this can be made to work by putting the tuples in a structured array. However, this does not appear to always give a speed up.
Let numpy.meshgrid do all the job:
length = 13
x = [1, -1, 1j, -1j]
mesh = numpy.meshgrid(*([x] * length))
result = numpy.vstack([y.flat for y in mesh]).T
on my notebook it takes ~2 minutes
You might want to try a completely different approach: first create an empty array of the desired size:
result = np.empty((4**length, length), dtype=complex)
then use NumPy's slicing abilities to fill out the array yourself:
# Set up of the last "digit":
result[::4, length-1] = 1
result[1::4, length-1] = -1
result[2::4, length-1] = 1j
result[3::4, length-1] = -1j
You can do similar things for the other "digits" (i.e. the elements of result[:, 2], result[:, 1], and result[:, 0]). The whole thing could certainly be put in a loop that iterates over each digit.
Transposing the whole operation (np.empty((length, 4**length)…)
) is worth trying, as it might bring a speed gain (through a better use of the memory cache).
Probably not optimized but much less reliant on python type conversions:
ints = [1,2,3,4]
repeat = 3
def prod(ints, repeat):
w = repeat
l = len(ints)
h = l**repeat
ints = np.array(ints)
A = np.empty((h,w), dtype=int)
rng = np.arange(h)
for i in range(w):
x = l**i
idx = np.mod(rng,l*x)/x
A[:,i] = ints[idx]
return A
精彩评论