Fast MATLAB method to change column order with sortrows

2023-02-07 13:57 问答作者：

I have data (numeric M x N, n > 2) that arrives sorted by the first column and then by the second. Does anyone know of an efficient algorithm that converts the data to being sorted by the second column and then the first? Obviously, sortrows(data,[2,1]) does the trick but I am looking for something that exploits the 开发者_C百科existing structure of the input data for greater speed as M is very large.

Additionally, the data in the first two columns is a known set of integers (each much smaller than M).

Based on the help documentation for MATLAB R2010b, the function SORTROWS uses a stable version of quicksort. Since stable sorting algorithms "maintain the relative order of records with equal keys", you can achieve what you want by simply resorting the already sorted data with respect to the second column:

data = sortrows(data,2);

This result will maintain the relative order of elements in the first column, such that the data will be sorted first by the second column and then by the first column.

Since the data in the first column is already sorted, you don't need to sort on it again. It will be slightly faster if you do:

>> d = rand(10000,2);  d = round(d*100);  d = sortrows(d,1);
>> tic; a1 = sortrows(d, 2); toc;
Elapsed time is 0.006805 seconds.

Versus:

>> tic; a2 = sortrows(d, [2 1]); toc;
Elapsed time is 0.010207 seconds.
>> isequal(a1, a2)

ans =

     1

I kept banging away at this, but couldn't get it faster than the sortrows method. This exploits the fact that each pair of keys is unique, which I didn't mention above.

% This gives us unique rows of integers between one and 10000, sorted first
% by column 1 then 2.
x = unique(uint32(ceil(10000*rand(1e6,2))),'rows');

tic;
idx = zeros(size(x,1),1);
% Work out where each group of the second keys will start in the sorted output.
StartingPoints = cumsum([1;accumarray(x(:,2),1)]);
% Work out where each group of the first keys is in the input.
Ends = find([~all(diff(x(:,1),1,1)==0,2);true(1,1)]);
Starts = [1;Ends(1:(end-1))+1];
% Build the index.
for i = 1:size(Starts)
    temp = x(Starts(i):Ends(i),2);
    idx(StartingPoints(temp)) = Starts(i):Ends(i);
    StartingPoints(temp) = StartingPoints(temp) + 1;
end
% Apply the index.
y = x(idx,:);
toc

tic;
z = sortrows(x,2);
toc

isequal(y,z)

Gives 0.21 seconds for my algorithm and 0.18 for the second (stable across different random seeds).

If anyone sees any further speed up (other than mex) please feel free to add.

继续阅读：sorting

Fast MATLAB method to change column order with sortrows

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？