Is it possible to create a 1million x 1 million matrix using numpy? [duplicate]
Possible Duplicate:
Python Numpy Very Large Matrices
I tried numpy.zeros((100000, 100000)) and it returned "array is too big". Response to comments: 1) I could create a 10k x 10k matrix, but not 100k x 100k or 1 million x 1 million. 2) The matrix is not sparse.
We can do simple math to find out. A 1 million by 1 million matrix has 1,000,000,000,000 elements. At 4 bytes per element (e.g. a float32), that is 4,000,000,000,000 bytes of memory, or about 3.64 terabytes.
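The arithmetic above can be checked in a couple of lines:

```python
# Back-of-the-envelope memory estimate for a dense 1,000,000 x 1,000,000 matrix.
n = 1_000_000
elements = n * n              # 1,000,000,000,000 elements
bytes_f32 = elements * 4      # 4 bytes per element (float32)
bytes_f64 = elements * 8      # 8 bytes per element (float64)

print(bytes_f32 / 2**40)      # ~3.64 terabytes (TiB)
print(bytes_f64 / 2**40)      # ~7.28 terabytes (TiB)
```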
There is also a chance that a given implementation of Python uses more than that for a single number. For instance, just the leap from a float to a double means you'll need 7.28 terabytes instead. (There is also a chance that Python stores the number on the heap and all you get is a pointer to it, approximately doubling the footprint, without even taking metadata into account. But that's slippery ground; I'm always wrong when I talk about Python internals, so let's not dig into it too much.)
I suppose numpy doesn't have a hardcoded limit, though on a 32-bit build the total number of elements is capped by what fits in a C-sized index, which is likely where the "array is too big" error comes from. Either way, if your system doesn't have that much free memory, there isn't really anything you can do.
Does your matrix have a lot of zero entries? I suspect it does; few people work with dense problems that large.
You can easily do that with a sparse matrix. SciPy has a good set of sparse formats built in: http://docs.scipy.org/doc/scipy/reference/sparse.html The space required by a sparse matrix grows with the number of nonzero elements, not with the dimensions.
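A minimal sketch of what that looks like with scipy.sparse; the particular indices and values here are made up for illustration. The matrix is built in COO format and converted to CSR, which is efficient for arithmetic:

```python
import numpy as np
from scipy.sparse import coo_matrix

n = 1_000_000

# Only the nonzero entries are stored: (row, col, value) triplets.
rows = np.array([0, 500_000])
cols = np.array([0, 123])
vals = np.array([1.0, 2.5])

m = coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()

print(m.shape)  # (1000000, 1000000)
print(m.nnz)    # 2 stored elements, regardless of the huge dimensions
```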
Your system probably won't have enough memory to store the matrix in memory, but nowadays you might well have enough terabytes of free disk space. In that case, numpy.memmap would allow you to have the array stored on disk, but appear as if it resides in memory.
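A sketch of the numpy.memmap approach; the shape here is deliberately small (a 1000 x 1000 float32 file is about 4 MB), since an actual 1 million x 1 million memmap would need terabytes of free disk. The file path is made up for illustration:

```python
import os
import tempfile
import numpy as np

# Create a disk-backed array: it behaves like an ndarray, but the data
# lives in a file and is paged in and out by the OS on demand.
path = os.path.join(tempfile.mkdtemp(), "big.dat")
m = np.memmap(path, dtype=np.float32, mode="w+", shape=(1000, 1000))

m[123, 456] = 42.0  # writes go through to the file
m.flush()

# Reopen read-only; nothing close to the full array is loaded into RAM.
m2 = np.memmap(path, dtype=np.float32, mode="r", shape=(1000, 1000))
print(m2[123, 456])  # 42.0
```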
However, it's probably best to rethink the problem. Do you really need a matrix this large? Any computation involving it will probably be infeasibly slow and would need to be done blockwise.