CUBLAS - matrix addition.. how?

2023-02-20 01:39 问答作者：

I am trying to use CUBLAS to sum two big matrices of unknown size. I need a fully optimized code (if possible) so I chose not to rewrite the matrix addition code (simple) but using CUBLAS, in particular the cublasSgemm function which allows to sum A and C (if B is a unit matrix): *C = alpha*op(A)*op(B)+beta*c*

The problem is: C and C++ store开发者_C百科 the matrices in row-major format, cublasSgemm is intended (for fortran compatibility) to work in column-major format. You can specify whether A and B are to be transposed first, but you can NOT indicate to transpose C. So I'm unable to complete my matrix addition..

I can't transpose the C matrix by myself because the matrix is something like 20000x20000 maximum size.

Any idea on how to solve please?

cublasgeam has been added to CUBLAS5.0. It computes the weighted sum of 2 optionally transposed matrices

If you're just adding the matrices, it doesn't actually matter. You give it alpha, Aij, beta, and Cij. It thinks you're giving it alpha, Aji, beta, and Cji, and gives you what it thinks is Cji = beta Cji + alpha Aji. But that's the correct Cij as far as you're concerned. My worry is when you start going to things which do matter -- like matrix products. There, there's likely no working around it.

But more to the point, you don't want to be using GEMM to do matrix addition -- you're doing a completely pointless matrix multiplication (which takes takes ~20,000³ operations and many passes through memory) for an operatinon which should only require ~20,000² operations and a single pass! Treat the matricies as 20,000^2-long vectors and use saxpy.

Matrix multiplication is memory-bandwidth intensive, so there is a huge (factors of 10x or 100x) difference in performance between coding it yourself and a tuned version. Ideally, you'd change structures in your code to match the library. If you can't, in this case you can manage just by using linear algebra identities. The C-vs-Fortran ordering means that when you pass in A, CUBLAS "sees" A^T (A transpose). Which is fine, we can work around it. If what you want is C=A.B, pass in the matricies in the opposite order, B.A . Then the library sees (B^T . A^T), and calculates C^T = (A.B)^T; and then when it passes back C^T, you get (in your ordering) C. Test it and see.

继续阅读：blas cublas matrix

CUBLAS - matrix addition.. how?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？