How can I remove a column from a sparse matrix efficiently?

2022-12-21 15:51 Q&A author:

If I am using the sparse.lil_matrix format, how can I remove a column from the matrix easily and efficiently?

Much simpler and faster: select the columns you want to keep with fancy indexing. You might not even need the conversion to CSR, but I know for sure that this works with CSR sparse matrices, and converting between the formats shouldn't be an issue.

from scipy import sparse

# col_list holds the indices of the columns to KEEP
x_new = sparse.lil_matrix(sparse.csr_matrix(x)[:, col_list])
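A slightly fuller sketch of the same keep-list idea, assuming you start from the columns you want to drop rather than the ones you want to keep. `drop_cols` is a hypothetical helper name; `np.setdiff1d` is used here to build the sorted list of columns to keep before the CSR slice.

```python
import numpy as np
from scipy import sparse


def drop_cols(x, cols_to_drop):
    """Return a new lil_matrix with the columns in cols_to_drop removed.

    Hypothetical helper: builds the sorted list of columns to keep and
    selects them with fancy indexing on a CSR copy of x.
    """
    keep = np.setdiff1d(np.arange(x.shape[1]), cols_to_drop)
    return sparse.lil_matrix(sparse.csr_matrix(x)[:, keep])


# Usage: a 3x4 matrix, dropping column 1.
x = sparse.lil_matrix(np.arange(12).reshape(3, 4))
x_new = drop_cols(x, [1])
print(x_new.toarray())
```

The original matrix `x` is left untouched; the result is a new matrix.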

I've been wanting this myself, and in truth there isn't a great built-in way to do it yet. Here's one way. I chose to make a subclass of lil_matrix and add a removecol method. If you prefer, you can instead add the removecol method to the lil_matrix class in your lib/site-packages/scipy/sparse/lil.py file. Here's the code:

from scipy import sparse
from bisect import bisect_left

class lil2(sparse.lil_matrix):
    def removecol(self, j):
        """Remove column j in place (negative indices count from the end)."""
        if j < 0:
            j += self.shape[1]

        if j < 0 or j >= self.shape[1]:
            raise IndexError('column index out of bounds')

        rows = self.rows
        data = self.data
        for i in range(self.shape[0]):
            # rows[i] is the sorted list of column indices stored in row i
            pos = bisect_left(rows[i], j)
            if pos == len(rows[i]):
                continue
            elif rows[i][pos] == j:
                # drop the entry that sits in column j
                rows[i].pop(pos)
                data[i].pop(pos)
                if pos == len(rows[i]):
                    continue
            # every entry to the right of column j moves one column left
            for pos2 in range(pos, len(rows[i])):
                rows[i][pos2] -= 1

        self._shape = (self._shape[0], self._shape[1] - 1)

I have tried it out and haven't found any bugs. I think it is better than slicing the column out, which, as far as I know, builds a whole new matrix, whereas this modifies the matrix in place.

I decided to make a removerow method as well, but I don't think it is as good as removecol. I'm limited by not being able to remove a row from an ndarray in place (numpy.delete returns a new array). Here is removerow, which can be added to the class above:

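    def removerow(self, i):
        """Remove row i (negative indices count from the end)."""
        # Needs `import numpy` at the top of the module.
        if i < 0:
            i += self.shape[0]

        if i < 0 or i >= self.shape[0]:
            raise IndexError('row index out of bounds')

        # numpy.delete returns new arrays rather than deleting in place
        self.rows = numpy.delete(self.rows, i, 0)
        self.data = numpy.delete(self.data, i, 0)
        self._shape = (self._shape[0] - 1, self._shape[1])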

Perhaps I should submit these functions to the SciPy repository.

For a sparse CSR matrix X and a list of column indices to drop, index_to_drop:

# keep every column not in index_to_drop, in ascending order
to_keep = sorted(set(range(X.shape[1])) - set(index_to_drop))
new_X = X[:, to_keep]
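An alternative way to build the keep list, assuming `index_to_drop` holds valid, non-negative column indices: a boolean mask keeps the remaining columns in their original order without going through Python sets. The mask is turned into integer indices with `np.flatnonzero` before slicing; the small example matrix is made up for illustration.

```python
import numpy as np
from scipy import sparse

# Small example CSR matrix (illustrative data only).
X = sparse.csr_matrix(np.arange(30).reshape(5, 6) % 4)
index_to_drop = [0, 4]

# True for every column we want to keep, False for the ones to drop.
mask = np.ones(X.shape[1], dtype=bool)
mask[index_to_drop] = False

# Column selection with an integer index array; the result is still CSR.
new_X = X[:, np.flatnonzero(mask)]
```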

It is easy to convert a lil_matrix to a csr_matrix; see tocsr() in the lil_matrix documentation.

Note, however, that converting from CSR back to LIL with tolil() is expensive, so this approach is best when you don't need to keep your matrix in LIL format.

I'm new to Python, so my answer may well be wrong, but I was wondering why something like the following wouldn't be efficient.

Let's say your lil_matrix is called mat and you want to remove the i-th column:

from scipy.sparse import hstack

mat = hstack([mat[:, 0:i], mat[:, i+1:]])

After that the matrix will be a coo_matrix, but you can convert it back to a lil_matrix with tolil().

OK, I understand that this has to create the two slices inside the hstack call before the result is assigned to mat, so for a while you hold the original matrix plus roughly one more copy. But if the matrix is sparse enough, I don't think that should cause memory problems (saving memory and time is the whole reason for using sparse matrices in the first place).
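A runnable version of the hstack idea, assuming `0 <= i < mat.shape[1]` and that `mat` has at least two columns. It passes `format="lil"` to `scipy.sparse.hstack` so the result comes back as a lil_matrix directly, and it only includes the left or right part when that part is non-empty (which matters when removing the first or last column). `drop_col_hstack` is a hypothetical name.

```python
import numpy as np
from scipy import sparse


def drop_col_hstack(mat, i):
    """Return a copy of mat without column i, built with sparse.hstack."""
    blocks = []
    if i > 0:
        blocks.append(mat[:, :i])  # columns 0 .. i-1
    if i + 1 < mat.shape[1]:
        blocks.append(mat[:, i + 1:])  # columns i+1 .. end
    # format="lil" asks hstack to return a lil_matrix instead of the default
    return sparse.hstack(blocks, format="lil")


mat = sparse.lil_matrix(np.eye(4))
print(drop_col_hstack(mat, 2).toarray())
```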


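from bisect import bisect_left

def removecols(W, col_list):
    """Remove the columns in col_list from lil_matrix W in place; return W."""
    # Process columns from right to left so that removing one column
    # does not shift the indices of the columns still to be removed.
    cols = sorted(set(col_list), reverse=True)
    if not cols:
        return W
    if cols[-1] < 0 or cols[0] >= W.shape[1]:
        raise IndexError('column index out of bounds')
    rows = W.rows
    data = W.data
    for i in range(W.shape[0]):
        for j in cols:
            pos = bisect_left(rows[i], j)
            if pos == len(rows[i]):
                continue
            elif rows[i][pos] == j:
                rows[i].pop(pos)
                data[i].pop(pos)
            # entries to the right of column j move one column left
            for pos2 in range(pos, len(rows[i])):
                rows[i][pos2] -= 1
    W._shape = (W._shape[0], W._shape[1] - len(cols))
    return W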

I rewrote the removecol code above to accept a list of columns (col_list) as input; maybe this will be helpful for somebody.

Looking at the notes for each sparse matrix format, the one that matters in our case is the CSC matrix, which has the following advantages according to the documentation [1]:

efficient arithmetic operations CSC + CSC, CSC * CSC, etc.
efficient column slicing
fast matrix vector products (CSR, BSR may be faster)

If you know the indices of the columns you want to remove, just use slicing on a CSC matrix to keep the other columns. To remove rows, use a CSR matrix, since it supports efficient row slicing.
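A sketch of that advice, assuming you know which indices to drop: convert to CSC before selecting columns and to CSR before selecting rows. The helper names `drop_columns_csc` and `drop_rows_csr` are hypothetical.

```python
import numpy as np
from scipy import sparse


def drop_columns_csc(m, cols):
    """Remove the given columns, slicing in CSC format (efficient column access)."""
    m = sparse.csc_matrix(m)
    keep = np.setdiff1d(np.arange(m.shape[1]), cols)
    return m[:, keep]


def drop_rows_csr(m, rows):
    """Remove the given rows, slicing in CSR format (efficient row access)."""
    m = sparse.csr_matrix(m)
    keep = np.setdiff1d(np.arange(m.shape[0]), rows)
    return m[keep, :]


a = sparse.lil_matrix(np.arange(20).reshape(4, 5))
print(drop_columns_csc(a, [0, 3]).toarray())
print(drop_rows_csr(a, [1]).toarray())
```

Both helpers return new matrices; if you need LIL afterwards, call tolil() on the result, keeping in mind the conversion cost mentioned earlier.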
