Fast MATLAB method to change column order with sortrows
I have data (numeric M x N, n > 2) that arrives sorted by the first column and then by the second. Does anyone know of an efficient algorithm that converts the data to being sorted by the second column and then the first? Obviously, sortrows(data,[2,1]) does the trick but I am looking for something that exploits the 开发者_C百科existing structure of the input data for greater speed as M is very large.
Additionally, the data in the first two columns is a known set of integers (each much smaller than M).
Based on the help documentation for MATLAB R2010b, the function SORTROWS uses a stable version of quicksort. Since stable sorting algorithms "maintain the relative order of records with equal keys", you can achieve what you want by simply resorting the already sorted data with respect to the second column:
data = sortrows(data,2);
This result will maintain the relative order of elements in the first column, such that the data will be sorted first by the second column and then by the first column.
Since the data in the first column is already sorted, you don't need to sort on it again. It will be slightly faster if you do:
>> d = rand(10000,2); d = round(d*100); d = sortrows(d,1);
>> tic; a1 = sortrows(d, 2); toc;
Elapsed time is 0.006805 seconds.
Versus:
>> tic; a2 = sortrows(d, [2 1]); toc;
Elapsed time is 0.010207 seconds.
>> isequal(a1, a2)
ans =
1
I kept banging away at this, but couldn't get it faster than the sortrows method. This exploits the fact that each pair of keys is unique, which I didn't mention above.
% This gives us unique rows of integers between one and 10000, sorted first
% by column 1 then 2.
x = unique(uint32(ceil(10000*rand(1e6,2))),'rows');
tic;
idx = zeros(size(x,1),1);
% Work out where each group of the second keys will start in the sorted output.
StartingPoints = cumsum([1;accumarray(x(:,2),1)]);
% Work out where each group of the first keys is in the input.
Ends = find([~all(diff(x(:,1),1,1)==0,2);true(1,1)]);
Starts = [1;Ends(1:(end-1))+1];
% Build the index.
for i = 1:size(Starts)
temp = x(Starts(i):Ends(i),2);
idx(StartingPoints(temp)) = Starts(i):Ends(i);
StartingPoints(temp) = StartingPoints(temp) + 1;
end
% Apply the index.
y = x(idx,:);
toc
tic;
z = sortrows(x,2);
toc
isequal(y,z)
Gives 0.21 seconds for my algorithm and 0.18 for the second (stable across different random seeds).
If anyone sees any further speed up (other than mex) please feel free to add.
精彩评论