Matrix Comparison algorithm
If you have 2 Matrices of dimensions N*M. what is the best way to get the difference Rect?
Example:
2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3
2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3
2 3 4 5 4 3 2 3 <---> 开发者_如何学C 2 3 2 3 2 3 2 3
2 3 4 5 2 3 2 3 2 3 2 3 2 3 2 3
2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3
|
|
\ /
Rect([2,2] , [3,4])
4 5 4
4 5 2-> A (2 x 3 Matrix)
The best I could think of is to scan from Top-Left hit the point where there is difference. Then scan from Bottom Right and hit the point where there is a difference.
But In worst case, this is O(N*M). is there a better efficient algorithm? Or is there something I could do with how I represent them, so that you can apply a more efficient algorithm? And mind you, this Matrix can be very big.
No, there isn't a more efficient algorithm. For identical matrixes, you must scan all elements, so the algorithm is necessarily O(n*m)
.
"But In worst case, this is O(NM). is there a better efficient algorithm?" Probably not due to the dimension of the data being O(NM). Many matrix operations like this are of order MN because in the worst case there are MN elements that will all need to be checked in the case that the matrices are equal.
Looking at the average case is more interesting, if the difference box is necessarily a single rectangle within the whole matrix then I suspect you could get away with scanning less than all the elements on average.
Here's a quick though I had: keep track of the current element, call this XY
Start at top left corner so XY is the top corner now
Check that elements XY in both are equal, if not equal go to 3 else go to 4
If the elements are not then you have an element of the difference matrix. Record this element then search that row and column for the other elements (maybe something like binary search is fastest). Once the row/column is searched you have the coordinates of the edges. If elements are not equal move to 4.
Next step move XY diagonally one element down and one element right, then go to 2 again
once a diagonal is covered then you will need to test along the next diagonal ( I suspect that choosing a new diagonal that is the furthest away from the current diagonal will be the best, though I have no proof that this is the best choice) until all the elements are covered. Worst case this is still O(N*M) but it might be faster in the average case.
Essentially you are trying to one different element as fast as possible, so the aim is choosing the first element in such a way to minimize the expected value of the number of searches to find the first different element.
Like others have pointed out, O(N*M) is optimal.
I would just like to add that, when scanning through your matrix, you should keep its memory layout in mind. If the matrix is laid out in rows, it is best to scan horizontally. If it is laid out in columns, better scan vertically. This will lead to pretty much optimal caching behaviour.
Of course, this assumes that the difference in question is indeed in the form of a rectangle. If it's some other shape, and you want the bounding rectangle, you'll have to scan both rows and columns, no matter what.
I think the proposed algorithm is an optimal one. However I suggest you to try out BLAS library which is very efficient, and compare the performance. There is also Boost uBlas library which provide C++ interface and methods for Matrix expressions.
精彩评论