Efficient Matrix decomposition into square submatrices in C++

I have implemented a Matrix datatype in C++ by wrapping a 1D array and indexing it as rows and columns. Now I want to be able to create square (blocked) sub-matrices from this type, and I want to do it in memory.

The problem is that I want some of these sub-matrices to be transferable to GPU memory so that they can be processed there in parallel. This is useful, for example, for matrix multiplication. Since these sub-matrices are not contiguous in main memory, copying one to device memory as a single unit looks impossible without creating a separate copy. I want the GPU copy of a sub-matrix to map directly onto the original matrix on the CPU, both for updates and for efficiency. I don't know the exact partitioning in advance.

Does anyone have an idea how I can achieve this?

Just a reminder: the matrix needs to be partitioned into blocks, not row-wise, which would be relatively easy in C/C++.


If the required sub-matrices are known at the time the 'master' matrix is created, and if they form a partition of the master, it's possible to create a composite matrix class somewhat like this:

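// supposing an IMatrix<T> interface (pure virtual members only) class
// and a PlainMatrix<T> class that is addressed with the master's coordinates
#include <cstddef>
#include <vector>

template< typename T >
struct CompositeMatrix : public IMatrix<T> {
   typedef std::vector<PlainMatrix<T>*> tMatrices;

   tMatrices submatrices;

   // Forward the access to the submatrix that owns (row, column).
   // Precondition: some submatrix contains (row, column).
   T& element( std::size_t row, std::size_t column ) {
       return findsubmatrix( row, column )->element( row, column );
   }

   // find algorithm implementing 'chain of responsibility-like' pattern.
   PlainMatrix<T>* findsubmatrix( std::size_t row, std::size_t col ) {
     // 'typename' is required because tMatrices depends on T.
     for( typename tMatrices::iterator it = submatrices.begin()
        ; it != submatrices.end()
        ; ++it)
     {
        // 'it' points at a PlainMatrix<T>*, so dereference it first.
        if( (*it)->contains( row,col ) ) return *it;
     }
     return NULL;   // no submatrix covers (row, col)
   }
};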

The 'PlainMatrix' can be organized in a memory-efficient way.
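One possible layout for such a block is sketched below. This is only an illustration, not the only way to do it: the member names (row0_, col0_, data(), sizeInBytes()) are hypothetical, and the class is kept standalone here rather than derived from IMatrix<T>. Each block stores its own elements contiguously in row-major order and is addressed with the coordinates of the master matrix, so the whole block can be handed to a single copy call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical PlainMatrix: one rectangular block of the master matrix,
// stored contiguously (row-major) and addressed with master coordinates.
template <typename T>
class PlainMatrix {
public:
    // (row0, col0) is the block's top-left corner inside the master matrix.
    PlainMatrix(std::size_t row0, std::size_t col0,
                std::size_t rows, std::size_t cols)
        : row0_(row0), col0_(col0), rows_(rows), cols_(cols),
          data_(rows * cols) {}

    // True if the master coordinate (row, col) falls inside this block.
    bool contains(std::size_t row, std::size_t col) const {
        return row >= row0_ && row < row0_ + rows_ &&
               col >= col0_ && col < col0_ + cols_;
    }

    // Access by master coordinates; translated to a local row-major offset.
    T& element(std::size_t row, std::size_t col) {
        assert(contains(row, col));
        return data_[(row - row0_) * cols_ + (col - col0_)];
    }

    // Contiguous storage, suitable as the source of one bulk copy.
    T* data() { return data_.data(); }
    std::size_t sizeInBytes() const { return data_.size() * sizeof(T); }

private:
    std::size_t row0_, col0_, rows_, cols_;
    std::vector<T> data_;
};
```

With this layout a block could be sent to the device with one cudaMemcpy(devicePtr, block.data(), block.sizeInBytes(), cudaMemcpyHostToDevice) call, at the cost of fixing the partition when the blocks are created.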


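If your matrices' dimensions are powers of 2, you can store them in host memory in z-order (Morton order). Every square block whose size is a power of 2 and whose corner is aligned to that size is then contiguous in memory, so you just need the start and end index of a submatrix to copy it with one call to cudaMemcpy.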
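A minimal sketch of the z-order indexing, assuming a square n x n matrix with n a power of 2 and at most 65536. The names spreadBits, mortonIndex, blockRange and toZOrder are made up for this illustration. It shows how to compute the storage index of an element and the contiguous index range of an aligned block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Spread the low 16 bits of x so that bit i moves to bit 2*i.
inline std::uint32_t spreadBits(std::uint32_t x) {
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

// Z-order index: column bits on even positions, row bits on odd positions.
inline std::uint32_t mortonIndex(std::uint32_t row, std::uint32_t col) {
    return (spreadBits(row) << 1) | spreadBits(col);
}

// Half-open range [first, last) of storage indices.
struct BlockRange {
    std::size_t first;
    std::size_t last;
};

// Storage range of the size x size block with top-left corner (row0, col0).
// Requires size to be a power of 2 and row0, col0 to be multiples of size.
inline BlockRange blockRange(std::uint32_t row0, std::uint32_t col0,
                             std::uint32_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    assert(row0 % size == 0 && col0 % size == 0);
    const std::size_t first = mortonIndex(row0, col0);
    return BlockRange{ first, first + static_cast<std::size_t>(size) * size };
}

// Reorder a row-major n x n matrix into z-order (n must be a power of 2).
template <typename T>
std::vector<T> toZOrder(const std::vector<T>& rowMajor, std::uint32_t n) {
    assert(n != 0 && (n & (n - 1)) == 0);
    assert(rowMajor.size() == static_cast<std::size_t>(n) * n);
    std::vector<T> z(rowMajor.size());
    for (std::uint32_t r = 0; r < n; ++r)
        for (std::uint32_t c = 0; c < n; ++c)
            z[mortonIndex(r, c)] = rowMajor[static_cast<std::size_t>(r) * n + c];
    return z;
}
```

Given BlockRange range = blockRange(row0, col0, size), the block could then be copied with cudaMemcpy(devicePtr, z.data() + range.first, (range.last - range.first) * sizeof(T), cudaMemcpyHostToDevice). Note that the elements inside the block are themselves in z-order, so the kernel has to index them the same way.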
