Efficient Matrix decomposition into square submatrices in C++
I have implemented a Matrix datatype in C++ using a 1D array wrapped into rows and columns. Now I want to be able to create square/blocked sub-matrices from this matrix, and I want to do it in-memory.
The problem is that I want some of these sub-matrices to be transferable to GPU memory so that they can be processed there in parallel. This is useful, for example, for matrix multiplication. Since these sub-matrices are not contiguous in main memory, copying one to device memory as a single unit looks impossible without first making a separate copy. I want the GPU sub-matrix to map directly back to the original matrix on the CPU, both for updating it and for efficiency. I don't know the exact partitioning in advance.
Does anyone have an idea how I could achieve this?
Just a reminder: the matrix needs to be partitioned into blocks, not row-wise, which would be relatively easy in C/C++.
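To make the problem concrete, here is a small sketch (all function names are hypothetical, chosen for illustration) of where a b x b block lives inside an n x n row-major 1D array. Each row of the block is contiguous, but consecutive block rows are n elements apart, so the block as a whole is not one contiguous range. The `packBlock` function is exactly the "separate copy" the question wants to avoid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of element (r, c) inside block (bi, bj) of an n x n row-major
// matrix split into b x b blocks (b is assumed to divide n).
std::size_t rowMajorOffset(std::size_t n, std::size_t b,
                           std::size_t bi, std::size_t bj,
                           std::size_t r, std::size_t c) {
    assert(n % b == 0 && r < b && c < b);
    return (bi * b + r) * n + (bj * b + c);
}

// Pack one block into its own contiguous buffer. Each of the b block rows
// is contiguous in m, but consecutive block rows are n elements apart,
// which is why a plain single-range copy of the block is not possible.
std::vector<double> packBlock(const std::vector<double>& m, std::size_t n,
                              std::size_t b, std::size_t bi, std::size_t bj) {
    std::vector<double> block(b * b);
    for (std::size_t r = 0; r < b; ++r)
        for (std::size_t c = 0; c < b; ++c)
            block[r * b + c] = m[rowMajorOffset(n, b, bi, bj, r, c)];
    return block;
}
```

For a 4 x 4 matrix holding 0..15, block (1, 0) with b = 2 consists of the elements 8, 9, 12 and 13: two contiguous pairs separated by a gap. If you use CUDA, note that the runtime's `cudaMemcpy2D` accepts a source pitch, so a strided block like this can be transferred without packing it on the host first.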
If the required sub-matrices are known at the time the 'master' matrix is created, and if they form a partition of the master, it's possible to create a composite matrix class somewhat like this:
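#include <cstddef>
#include <vector>

// supposing an IMatrix<T> interface (pure virtual members only) class, and a
// PlainMatrix<T> class that knows which (row, column) range it covers
template< typename T >
struct CompositeMatrix : public IMatrix<T> {
   typedef std::vector<PlainMatrix<T>*> tMatrices;
   tMatrices submatrices;

   // forwards to the submatrix that owns (row, column); PlainMatrix is
   // assumed to translate these global coordinates into its local ones.
   // (row, column) must lie inside the matrix, otherwise this dereferences NULL.
   T& element( std::size_t row, std::size_t column ) {
       return findsubmatrix( row, column )->element( row, column );
   }

   // find algorithm implementing a 'chain of responsibility'-like pattern:
   // ask each submatrix in turn whether it contains the element
   PlainMatrix<T>* findsubmatrix( std::size_t row, std::size_t col ) {
     for( typename tMatrices::iterator it = submatrices.begin()
        ; it != submatrices.end()
        ; ++it)
     {
        if( (*it)->contains( row, col ) ) return *it;
     }
     return NULL; // no submatrix covers (row, col)
   }
};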
The 'PlainMatrix' can be organized in a memory-efficient way.
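One possible memory layout, sketched under the assumption of a regular partition into b x b blocks: each `PlainMatrix` owns its block as a single contiguous row-major buffer plus the block's row/column offset in the master matrix. This is a minimal, self-contained stand-in for the answer's classes (no `IMatrix` base; the constructor, `raw()` and `bytes()` are hypothetical additions), not a definitive implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal PlainMatrix: one block of the master matrix,
// stored contiguously (row-major) together with its position in the master.
template <typename T>
struct PlainMatrix {
    std::size_t row0, col0, rows, cols;
    std::vector<T> data;  // rows * cols elements, one contiguous buffer

    PlainMatrix(std::size_t r0, std::size_t c0, std::size_t r, std::size_t c)
        : row0(r0), col0(c0), rows(r), cols(c), data(r * c) {}

    bool contains(std::size_t row, std::size_t col) const {
        return row >= row0 && row < row0 + rows &&
               col >= col0 && col < col0 + cols;
    }
    // Takes global coordinates and translates them to local ones.
    T& element(std::size_t row, std::size_t col) {
        assert(contains(row, col));
        return data[(row - row0) * cols + (col - col0)];
    }
    // The whole block is one contiguous range: pointer and size in bytes.
    T* raw() { return data.data(); }
    std::size_t bytes() const { return data.size() * sizeof(T); }
};

// Composite over a partition of an n x n matrix into b x b blocks.
template <typename T>
struct CompositeMatrix {
    std::vector<PlainMatrix<T>> blocks;

    CompositeMatrix(std::size_t n, std::size_t b) {
        assert(b > 0 && n % b == 0);  // b is assumed to divide n
        for (std::size_t i = 0; i < n; i += b)
            for (std::size_t j = 0; j < n; j += b)
                blocks.emplace_back(i, j, b, b);
    }
    // Linear search over the blocks, as in the answer's findsubmatrix.
    PlainMatrix<T>* findsubmatrix(std::size_t row, std::size_t col) {
        for (auto& blk : blocks)
            if (blk.contains(row, col)) return &blk;
        return nullptr;
    }
    T& element(std::size_t row, std::size_t col) {
        PlainMatrix<T>* blk = findsubmatrix(row, col);
        assert(blk != nullptr);  // (row, col) lies outside the matrix
        return blk->element(row, col);
    }
};
```

With this layout, `blk->raw()` and `blk->bytes()` are what a single host-to-device `cudaMemcpy` would take, and writes through `element()` land in the same buffer that gets copied. Because the partition here is a regular grid, the linear search could be replaced by computing the block index directly as `(row / b) * (n / b) + col / b`.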
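If your matrices' dimensions are powers of 2, you can store them in host memory in Z-order (Morton order). In that layout every square block whose side is a power of 2 and whose corner is aligned to that side occupies one contiguous range of the array, so you just need the start and end index of a submatrix to copy it with a single call to cudaMemcpy.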
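A minimal sketch of the Z-order idea, assuming `float` elements, dimensions that are powers of 2 up to 2^16, and blocks aligned to their own size. `mortonIndex` interleaves the bits of the column (even positions) and the row (odd positions); the helper names are hypothetical. `std::copy` stands in for the device transfer so the sketch runs without a GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Spread the low 16 bits of x so that bit i ends up at bit 2*i.
std::uint32_t spreadBits(std::uint32_t x) {
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

// Z-order (Morton) index of (row, col): column bits at even positions,
// row bits at odd positions.
std::uint32_t mortonIndex(std::uint32_t row, std::uint32_t col) {
    return spreadBits(col) | (spreadBits(row) << 1);
}

// Reorder an n x n row-major matrix (n a power of 2) into Z-order.
std::vector<float> toZOrder(const std::vector<float>& rowMajor, std::uint32_t n) {
    std::vector<float> z(rowMajor.size());
    for (std::uint32_t r = 0; r < n; ++r)
        for (std::uint32_t c = 0; c < n; ++c)
            z[mortonIndex(r, c)] = rowMajor[r * n + c];
    return z;
}

// Copy the b x b block with top-left corner (row0, col0) out of a Z-ordered
// buffer. With b a power of 2 and row0, col0 multiples of b, the block is
// the single range [start, start + b*b). With CUDA, the std::copy would be
// one cudaMemcpy(dev, z.data() + start, b * b * sizeof(float),
// cudaMemcpyHostToDevice). The copied block stays Z-ordered internally:
// its element (r, c) lands at dst[mortonIndex(r, c)].
void copyBlock(const std::vector<float>& z, std::uint32_t row0,
               std::uint32_t col0, std::uint32_t b, float* dst) {
    assert(b != 0 && (b & (b - 1)) == 0);
    assert(row0 % b == 0 && col0 % b == 0);
    std::uint32_t start = mortonIndex(row0, col0);
    std::copy(z.begin() + start, z.begin() + start + b * b, dst);
}
```

The reason this works: for an aligned block, the high bits of every row and column index inside it are the same, so they contribute one fixed offset, and the low bits enumerate exactly the b*b consecutive indices after it. The price is that kernels must index elements through `mortonIndex` (or an equivalent) instead of `r * n + c`.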