Efficient Matrix decomposition into square submatrices in C++
I have implemented a Matrix datatype in C++ by using a 1D array and wrapping it into rows and columns. Now, I want the possibility to create square/blocked sub-matrices from this matrix, and I want to do it in-memory.
The problem is that I want some of these sub-matrices to be transferable to GPU memory so they can be processed there in parallel. This is useful, for example, for matrix multiplication. Since these sub-matrices are not contiguous in main memory, copying each one to device memory as a single unit seems impossible without creating a separate copy. I would like the GPU sub-matrix copies to map directly back to the original CPU matrix, both for updating and for efficiency. I don't know the exact partitioning in advance.
Does anyone have an idea how I can achieve this?
Just a reminder: the matrix needs to be partitioned into blocks, not row-wise, which would be relatively easy in C/C++.
If the required sub-matrices are known at the time the 'master' matrix is created, and if they form a partition of the master, it's possible to create a composite matrix class somewhat like this:
#include <vector>
// supposing an IMatrix<T> interface (pure virtual members only) class
// and a PlainMatrix<T> implementation of it
template< typename T >
struct CompositeMatrix : public IMatrix<T> {
    typedef std::vector<PlainMatrix<T>*> tMatrices;
    tMatrices submatrices;

    // delegate element access to the submatrix that owns the cell
    T& element( size_t row, size_t column ) {
        return findsubmatrix( row, column )->element( row, column );
    }

    // find algorithm implementing a 'chain of responsibility'-like pattern:
    // ask each submatrix in turn whether it contains the requested cell.
    PlainMatrix<T>* findsubmatrix( size_t row, size_t col ) {
        for( typename tMatrices::iterator it = submatrices.begin()
           ; it != submatrices.end()
           ; ++it )
        {
            if( (*it)->contains( row, col ) ) return *it;
        }
        return NULL;
    }
};
The 'PlainMatrix' can be organized in a memory-efficient way.
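For instance, here is a minimal sketch of such a PlainMatrix, assuming each block owns a contiguous buffer; the member names and the contains()/element() signatures are illustrative, chosen only to match the CompositeMatrix above. Contiguous per-block storage is what later allows a whole block to be transferred to the device with a single copy.

#include <vector>
#include <cstddef>

template< typename T >
struct PlainMatrix : public IMatrix<T> {
    size_t rowOffset, colOffset;   // position of this block inside the master matrix
    size_t rows, cols;             // block dimensions
    std::vector<T> data;           // contiguous storage: one host-to-device copy suffices

    PlainMatrix( size_t rOff, size_t cOff, size_t r, size_t c )
        : rowOffset(rOff), colOffset(cOff), rows(r), cols(c), data(r * c) {}

    // does this block contain the given cell of the master matrix?
    bool contains( size_t row, size_t col ) const {
        return row >= rowOffset && row < rowOffset + rows
            && col >= colOffset && col < colOffset + cols;
    }

    // element access in master-matrix coordinates, as CompositeMatrix expects
    T& element( size_t row, size_t column ) {
        return data[ (row - rowOffset) * cols + (column - colOffset) ];
    }
};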
If your matrices' dimensions are powers of 2, you can store them in host memory in z-order (Morton order). This way, you just need the start and end index of a sub-matrix to copy it with one call to cudaMemcpy.
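A minimal sketch of the z-order idea, assuming a square master matrix of dimension N = 2^k stored in Morton order; interleaveBits(), mortonIndex() and copyBlockToDevice() are illustrative helpers, not part of any library. An aligned 2^m x 2^m block then occupies a contiguous index range, so one cudaMemcpy covers it.

#include <cstddef>
#include <cstdint>
#include <cuda_runtime.h>

// Interleave the bits of a 32-bit value with zeros (standard bit-twiddling).
inline std::uint64_t interleaveBits( std::uint32_t x ) {
    std::uint64_t v = x;
    v = (v | (v << 16)) & 0x0000FFFF0000FFFFull;
    v = (v | (v <<  8)) & 0x00FF00FF00FF00FFull;
    v = (v | (v <<  4)) & 0x0F0F0F0F0F0F0F0Full;
    v = (v | (v <<  2)) & 0x3333333333333333ull;
    v = (v | (v <<  1)) & 0x5555555555555555ull;
    return v;
}

// Morton (z-order) index of element (row, col): column bits land in the even
// positions, row bits in the odd positions.
inline std::uint64_t mortonIndex( std::uint32_t row, std::uint32_t col ) {
    return interleaveBits( col ) | ( interleaveBits( row ) << 1 );
}

// Copy the blockSize x blockSize block with top-left corner (r0, c0) to the
// device in one call, where blockSize is a power of two and r0, c0 are
// multiples of blockSize: in z-order such a block is a contiguous range of
// blockSize*blockSize elements starting at mortonIndex(r0, c0).
cudaError_t copyBlockToDevice( const float* hostZOrder, float* deviceBlock,
                               std::uint32_t r0, std::uint32_t c0,
                               std::size_t blockSize ) {
    std::uint64_t start = mortonIndex( r0, c0 );
    std::size_t   count = blockSize * blockSize;
    return cudaMemcpy( deviceBlock, hostZOrder + start,
                       count * sizeof(float),
                       cudaMemcpyHostToDevice );
}

The same index arithmetic works in the other direction for copying results back, which gives the direct sub-matrix-to-master mapping asked about, at the cost of storing the master matrix in z-order instead of row-major order.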