BLAS: gemm vs. gemv
Why does BLAS have a gemm function for matrix-matrix multiplication and a separate gemv function for matrix-vector multiplication? Isn't matrix-vector multiplication just a special case of matrix-matrix multiplication where one matrix has only one row/column?
Mathematically, matrix-vector multiplication is a special case of matrix-matrix multiplication, but that is not necessarily true of the two operations as realized in a software library.

They support different options. For example, gemv supports strided access to the vectors on which it operates, whereas gemm does not support strided matrix layouts. In the C language bindings, gemm requires that you specify the storage ordering of the matrices, whereas that is unnecessary in gemv for the vector arguments because it would be meaningless.
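To make the difference in the interfaces concrete, here is a minimal C sketch using the CBLAS bindings (the cblas.h header and the link flag, e.g. -lopenblas or -lcblas, depend on which implementation you install): dgemv addresses its vectors only through a starting pointer and an element stride (incX/incY), while dgemm describes every matrix through a layout flag, a transpose flag and a leading dimension.

```c
#include <stdio.h>
#include <cblas.h>   /* provided by Netlib CBLAS, OpenBLAS, ATLAS, ... */

int main(void)
{
    /* A is 2x3, stored row-major. */
    double A[2 * 3] = { 1, 2, 3,
                        4, 5, 6 };

    /* gemv: y = alpha*A*x + beta*y.
     * x is read with stride incX = 2, so only every other element of
     * xbuf is used; a vector has no row-/column-major ordering. */
    double xbuf[6] = { 1, 99, 1, 99, 1, 99 };   /* logical x = (1, 1, 1) */
    double y[2]    = { 0, 0 };
    cblas_dgemv(CblasRowMajor, CblasNoTrans,
                2, 3,            /* M, N           */
                1.0, A, 3,       /* alpha, A, lda  */
                xbuf, 2,         /* x, incX        */
                0.0, y, 1);      /* beta, y, incY  */
    printf("gemv: y = [%g %g]\n", y[0], y[1]);            /* [6 15] */

    /* gemm: C = alpha*A*B + beta*C.
     * Each matrix is described by the shared layout flag, a transpose
     * flag and a leading dimension, not by arbitrary element strides. */
    double B[3 * 2] = { 1, 0,
                        0, 1,
                        1, 1 };
    double C[2 * 2] = { 0, 0, 0, 0 };
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 3,         /* M, N, K        */
                1.0, A, 3,       /* alpha, A, lda  */
                B, 2,            /*        B, ldb  */
                0.0, C, 2);      /* beta,  C, ldc  */
    printf("gemm: C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);   /* [4 5; 10 11] */
    return 0;
}
```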
Besides supporting different options, there are families of optimizations that might be performed on gemm that are not applicable to gemv. If you know that you are doing a matrix-vector product, you don't want the library to waste time figuring out that this is the case before switching into a code path that is optimized for it; you'd rather call gemv directly instead.
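To see the "special case" concretely, the sketch below (again assuming the CBLAS bindings) computes the same product twice: once with dgemv, and once as a degenerate dgemm call whose B operand has a single column. The results match, but only the first call tells the library up front which kernel family applies.

```c
#include <stdio.h>
#include <cblas.h>

int main(void)
{
    enum { M = 2, N = 3 };
    double A[M * N] = { 1, 2, 3,
                        4, 5, 6 };          /* row-major */
    double x[N]  = { 1, 1, 1 };
    double y1[M] = { 0, 0 };
    double y2[M] = { 0, 0 };

    /* Directly as a matrix-vector product... */
    cblas_dgemv(CblasRowMajor, CblasNoTrans, M, N,
                1.0, A, N, x, 1, 0.0, y1, 1);

    /* ...and as a degenerate matrix-matrix product where B is N x 1.
     * The result is identical, but dgemm cannot know at the call site
     * that the best code path for this shape is really a gemv kernel. */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                M, 1, N,                     /* C is M x 1 */
                1.0, A, N, x, 1, 0.0, y2, 1);

    printf("dgemv: [%g %g]\n", y1[0], y1[1]);   /* both print [6 15] */
    printf("dgemm: [%g %g]\n", y2[0], y2[1]);
    return 0;
}
```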
When you optimize gemv and gemm, different techniques apply (see the sketch after this list):
- For the matrix-matrix operation you use blocked algorithms, with block sizes that depend on the cache sizes.
- For optimizing the matrix-vector product you use so-called fused Level 1 operations (e.g. fused dot products or fused axpy operations).
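The following is a rough plain-C sketch of the two styles, not an actual BLAS kernel: the block size NB, the loop order and the helper names are placeholders, and real libraries add packing, vectorized micro-kernels and multithreading on top.

```c
#include <stddef.h>

/* Cache blocking for C += A*B (all matrices n x n, row-major).
 * NB is a placeholder block size; real libraries tune it to the
 * cache hierarchy and pack blocks into contiguous buffers. */
#define NB 64

static void gemm_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ib = 0; ib < n; ib += NB)
        for (size_t kb = 0; kb < n; kb += NB)
            for (size_t jb = 0; jb < n; jb += NB)
                /* multiply one NB x NB block of A by one block of B */
                for (size_t i = ib; i < ib + NB && i < n; ++i)
                    for (size_t k = kb; k < kb + NB && k < n; ++k) {
                        double a = A[i*n + k];
                        for (size_t j = jb; j < jb + NB && j < n; ++j)
                            C[i*n + j] += a * B[k*n + j];
                    }
}

/* y += A*x with A m x n, column-major with leading dimension lda,
 * written as one axpy per column of A. Real gemv kernels fuse a few
 * of these axpy updates so that each piece of y is read and written
 * once for several columns instead of once per column. */
static void gemv_axpy(size_t m, size_t n, const double *A, size_t lda,
                      const double *x, double *y)
{
    for (size_t j = 0; j < n; ++j) {
        double xj = x[j];
        for (size_t i = 0; i < m; ++i)
            y[i] += xj * A[i + j*lda];   /* y += xj * A(:, j) */
    }
}
```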
Let me know if you want more details.
I think it just fits the BLAS hierarchy better with its level 1 (vector-vector), level 2 (matrix-vector) and level 3 (matrix-matrix) routines. And it may be optimizable a bit better if you know one operand is only a vector.