uBLAS Slow Matrix-SparseVector Multiplication
I'm converting some of my own vector algebra code to use the optimized Boost uBLAS library. However, when I tried a SymmetricMatrix-SparseVector multiplication, I found it to be about 4x slower than my own implementation. The vector size is usually between 0 and 500, and about 70-80% of the entries are zero.
Here is my code:
void CRoutines::GetA(double a[], double vectorIn[], int sparseVectorIndexes[], int vectorLength, int sparseLength)
{
    compressed_vector<double> inVec(vectorLength, sparseLength);
    for (int i = 0; i < sparseLength; i++)
    {
        inVec(sparseVectorIndexes[i]) = vectorIn[sparseVectorIndexes[i]];
    }
    vector<double> test = prod(inVec, matrix);
    for (int i = 0; i < vectorLength; i++)
    {
        a[i] = test(i);
    }
}
sparseVectorIndexes stores the indexes of the non-zero values of the input vector, vectorLength is the length of the vector, and sparseLength is the number of non-zeros in the vector. The matrix is stored as a symmetric matrix, symmetric_matrix<double, lower>.
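For reference, storing only the lower triangle means every element access has to map (i, j) onto a packed buffer. The following is a minimal sketch of that idea, assuming a row-major packed layout. PackedLowerSymmetric is a hypothetical name used for illustration, and this is not uBLAS's actual implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper type, not part of uBLAS: a symmetric matrix that keeps
// only the lower triangle (including the diagonal) in one contiguous buffer.
class PackedLowerSymmetric {
public:
    explicit PackedLowerSymmetric(std::size_t n)
        : n_(n), data_(n * (n + 1) / 2, 0.0) {}

    std::size_t size() const { return n_; }

    // Read element (i, j); both (i, j) and (j, i) map to the same slot.
    double get(std::size_t i, std::size_t j) const { return data_[index(i, j)]; }

    // Write element (i, j), which also defines (j, i).
    void set(std::size_t i, std::size_t j, double value) { data_[index(i, j)] = value; }

private:
    // Row-major packed offset of (i, j) after swapping so that j <= i.
    std::size_t index(std::size_t i, std::size_t j) const {
        assert(i < n_ && j < n_);
        if (j > i) std::swap(i, j);
        return i * (i + 1) / 2 + j;
    }

    std::size_t n_;
    std::vector<double> data_;
};
```

The swap inside index() plays the same role as the row <= i test in a hand-written loop over a 2D array: each access first decides which triangle it falls in.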
My own implementation is a simple nested loop iteration where matrix is just a 2D double array:
void CRoutines::GetA(double a[], double vectorIn[], int sparseVectorIndexes[], int vectorLength, int sparseLength)
{
    for (int i = 0; i < vectorLength; i++)
    {
        double temp = 0;
        for (int j = 0; j < sparseLength; j++)
        {
            int row = sparseVectorIndexes[j];
            if (row <= i) // Handle lower triangular sparseness
                temp += matrix[i][row] * vectorIn[row];
            else
                temp += matrix[row][i] * vectorIn[row];
        }
        a[i] = temp;
    }
}
Why is uBLAS 4x slower? Am I not writing the multiplication properly? Or is there another library more suited to this?
EDIT: If I use a dense vector array instead then uBLAS is only 2x slower...
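Ratios like the 4x and 2x above are easiest to compare when both versions are timed the same way. Here is a minimal sketch of such a harness using std::chrono. averageMicroseconds is a hypothetical helper name, and the sketch assumes the work being timed can be wrapped in a callable:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `work` `repetitions` times and returns the
// average wall-clock time per call in microseconds.
template <typename Work>
double averageMicroseconds(Work work, int repetitions) {
    assert(repetitions > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repetitions; ++r) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / repetitions;
}
```

Each GetA variant could be wrapped in a lambda and passed to this function with the same repetition count, ideally in an optimized build so that both versions are measured under the same conditions.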
uBLAS was not designed with performance as its primary goal. There are libraries that are significantly faster than uBLAS; see, for example, http://eigen.tuxfamily.org/index.php?title=Benchmark
This pdf has quite a detailed comparison of various linear algebra libraries. I came across this in this answer from Computational Science Stack Exchange, which is possibly a better place for this sort of question.
I'm not sure whether this is the cause of the slowdown (did you profile to get your 4x number?), but this loop could be slow:
for (int i = 0; i < vectorLength; i++)
{
    a[i] = test(i);
}
If most of the time is spent in the loops in your code, then this extra loop could double the run time (and would have nothing to do with uBLAS). I would recommend using std::copy instead:
std::copy(test.begin(), test.end(), a);
Note that the third argument must be an output iterator (here, a pointer to the first element of a), not the element a[0] itself. Most compilers should see that this is copying doubles and generate an optimal copy, which might fix your problem somewhat.
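A self-contained sketch of the std::copy approach follows. It uses std::vector<double> as a stand-in for the uBLAS result vector (an assumption, so that it compiles without Boost), and copyResult is a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copies every element of `result` into the caller-provided array `a`.
// `a` must have room for at least result.size() doubles.
void copyResult(const std::vector<double>& result, double a[], std::size_t capacity) {
    assert(capacity >= result.size());
    // The destination is an output iterator: a pointer to the first element,
    // not the element itself.
    std::copy(result.begin(), result.end(), a);
}
```

Whether this is measurably faster than the explicit loop depends on the compiler and optimization level, so it is worth timing both.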