Optimizing extraction of data from a MATLAB matrix?
Given an n-dimensional matrix of values: what is the most efficient way of retrieving values by arbitrary indices (i.e. coordinates)?
E.g. in a random 5x5 matrix, if I want the values at (1,1) (2,3) an开发者_如何学运维d (4,5) what is the most efficient way of returning just the values at these coordinates?
If I provide these coordinates in a separate matrix for example is there a one line of MATLAB which can do the job? Something like:
x=rand(5,5);
y=[[1,1];[2,3];[4,5]];
z=x(y);
Except that doesn't work.
One caveat however, for various reasons I am unable to use linear indexing - the results must be returned using the original indices. And the size of these matrices is potentially very large so I don't want to use loops either.
If you're against using linear indexing and loops, the only other alternative, AFAIK, is logical indexing. But if y
always comes in the form you've suggested, you'll need to create a logical matrix from the indices specified in y
.
Could you explain why linear indexing is not allowed?
Anyway, if you want a really stupid answer (which is all I can provide with this much information):
z = diag(x(y(:,1),y(:,2)))
Of course, this will needlessly create a huge matrix and extract the diagonal elements (the ones you need) from it - but it gets it done in one line, etc.
EDIT: If the restriction is using linear indexing on the original data, then you can use linear indexing to create a logical matrix and index x
with that. E.g.
% Each element of L is only one byte
L = false(size(x));
% Create the logical mask
L(sub2ind(size(x),y(:,1),y(:,2))) = true;
% Extract the required elements
z = x(L);
Similarly, for a 3-dimensional matrix:
x = rand(3,3,3);
y = [1 1 1;2 2 2;3 3 3];
L = false(size(x));
L(sub2ind(size(x),y(:,1),y(:,2),y(:,3))) = true;
z = x(L);
Also, logical indexing is supposed to be faster than linear indexing, so apart from building the mask, you're in good shape.
why isn't sub2ind alone suitable for this problem? I don't see the need for the logical mask; e.g.
z = x(sub2ind(size(x),y(:,1),y(:,2)))
should work as well.
Coming to the party long after the music has stopped, but I couldn't help myself...
If you need "full" indexing because of a bug in the toolbox, and the toolbox is loading only a part of the matrix at one time, you might consider following along with the behavior of the toolbox. A big efficiency gain with large matrices is gained with two things
1) don't make copies of things that don't need to be copied; this includes, for example, creating a logical array of the size of the original matrix (although it's nominally "efficient", it takes one byte per element. If your matrix is too large to fit in memory all at once, even a matrix that is 1/8 the size is probably significant)
2) preserve memory coherence: access memory "in the same region", or find yourself slowed down by lots of disk swapping; even when everything fits in memory, preserving "cache coherence" can result in significant performance improvements. If you can access matrix elements in the order in which they are stored, things speed up considerably.
To address the first point, you need to look for a method that doesn't require creating a complete copy (so Jacob's answer would be out). To address the second, you need to sort your indices before accessing the matrix - in that way, any elements that can be accessed "from the same block of memory", will be.
The two techniques are combined in the following. I am assuming that numel(y) << numel(x)
- in other words you are only interested in looking at a relatively small number of elements of x
. If that's not the case, sorting the y vector would actually slow you down a lot:
x = rand(5,5);
y = [1 1; 2 3; 4 5];
s = sub2ind(size(x), y(:,1), y(:,2)); % from the linear index we get access order
[ySorted yOrder] = sort(s);
% find the first, second index in the right access order:
y1 = y(yOrder, 1);
y2 = y(yOrder, 2);
% access the array using conventional indexing:
z = arrayfun(@(a,b)x(a,b), y1, y2);
% now put things back in the right order:
[rev revOrder] = sort(yOrder);
z = z(revOrder);
I benchmarked this using a 10000x10000 matrix x
and a 5000x2 random element lookup vector y
. Comparing against Jacob's code, I obtained
my method: 51 ms
his method: 225 ms
Increasing the size of the lookup vector to 50000x2, the values are
my method: 523 ms
his method: 305 ms
In other words - which method will work better depends on the number of elements you want to access. Note also that the use of the logical L
matrix implicitly results in sequential access of the large x
matrix - but that during the creation of that matrix you are randomly accessing the memory...
Note by the way that one question you had was "is there a one liner" - and the answer is "yes". If you have your arrays x
and y
as defined, then
z = arrayfun(@(a,b)x(a,b),y(:,1),y(:,2));
is indeed just one line, and doesn't use linear indexing...
精彩评论