Fastest way to sort files

2022-12-22 21:42

I have a huge text file with lines like:

-568.563626  159   33  -1109.660591  -1231.295129  4.381508
-541.181308  159   28  -1019.279615  -1059.115975  4.632301
-535.370812  155   29  -1033.071786  -1152.907805  4.420473
-533.547101  157   28  -1046.218277  -1063.389677  4.423696

What I want is to sort the file by the 5th column, so that I get:

-568.563626  159   33  -1109.660591  -1231.295129  4.381508
-535.370812  155   29  -1033.071786  -1152.907805  4.420473
-533.547101  157   28  -1046.218277  -1063.389677  4.423696
-541.181308  159   28  -1019.279615  -1059.115975  4.632301

For this I use:

for i in file.txt ; do sort -k5n $i ; done

I wonder whether this is the fastest or most efficient way.

Thanks

Why use for? Why not just:

sort -k5n file.txt
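One refinement worth considering: `-k5n` starts the key at field 5 and runs to the end of the line, while `-k5,5n` stops it at field 5. The numeric comparison only reads the leading number, so the results should normally match, but the explicit form states the intent. A minimal sketch using the four sample lines from the question (`sample.txt` and `sorted.txt` are placeholder names, and `LC_ALL=C` is my addition to pin the decimal point to `.`):

```shell
# Write the four sample lines from the question to a scratch file.
cat > sample.txt <<'EOF'
-568.563626  159   33  -1109.660591  -1231.295129  4.381508
-541.181308  159   28  -1019.279615  -1059.115975  4.632301
-535.370812  155   29  -1033.071786  -1152.907805  4.420473
-533.547101  157   28  -1046.218277  -1063.389677  4.423696
EOF

# Sort numerically on field 5 only; -k5,5 ends the key at field 5.
# LC_ALL=C is an assumption about the data: '.' is the decimal point.
LC_ALL=C sort -k5,5n sample.txt > sorted.txt
cat sorted.txt
```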

Which sort is most efficient depends on a number of factors. You could no doubt write a faster sort for specific data sets (depending on size and other properties); even bubble sort can outperform other sorts on particular inputs, such as data that is already nearly sorted.

However, have you tested the standard sort and established that it's too slow? That's the first thing you should do. My machine (which is by no means the gruntiest on the planet) can do 4 million of those lines in under ten seconds:

real     0m9.023s
user     0m8.689s
sys      0m0.332s
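If you want to run a comparable measurement yourself, something along these lines should do. It is only a sketch: the row count, the value ranges and the file name `big.txt` are arbitrary choices of mine, not the data used for the numbers above.

```shell
# Generate test lines shaped like the sample data.
# The row count and value ranges are arbitrary assumptions.
awk 'BEGIN {
    srand(1)
    for (i = 0; i < 100000; i++)
        printf "%.6f  %d  %d  %.6f  %.6f  %.6f\n",
            -500 - rand() * 100, 150 + int(rand() * 10), 25 + int(rand() * 10),
            -1000 - rand() * 200, -1000 - rand() * 300, 4 + rand()
}' > big.txt

# Time the plain sort; output is discarded so writing results is not measured.
start=$(date +%s)
LC_ALL=C sort -k5n big.txt > /dev/null
end=$(date +%s)
echo "sorted $(wc -l < big.txt) lines in $((end - start)) s"
```

In bash, `time sort -k5n big.txt > /dev/null` gives the finer real/user/sys breakdown instead of whole seconds.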

Having said that, there is at least one trick which may speed it up. Transform the file into fixed-length records with fixed-length fields before sorting it. Sorting fixed-length records on a specific range of character positions can often be much faster than the more flexible sorting on variable-length fields and records that sort supports.

That way, you add an O(n) operation (the transformation) to speed up what is probably at best an O(n log n) operation (the sort).
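A rough sketch of what that transformation could look like with awk and sort. The field widths (14/6/6/14/14/10) and the file names are assumptions sized for the sample data, and whether this actually beats the plain field-based sort depends on your sort implementation and data, so time both.

```shell
# Sample input: the four lines from the question.
cat > sample.txt <<'EOF'
-568.563626  159   33  -1109.660591  -1231.295129  4.381508
-541.181308  159   28  -1019.279615  -1059.115975  4.632301
-535.370812  155   29  -1033.071786  -1152.907805  4.420473
-533.547101  157   28  -1046.218277  -1063.389677  4.423696
EOF

# Step 1 (the O(n) pass): rewrite every line with fixed-width,
# right-aligned fields. The widths are an assumption, not a requirement.
awk '{ printf "%14.6f%6d%6d%14.6f%14.6f%10.6f\n", $1, $2, $3, $4, $5, $6 }' \
    sample.txt > fixed.txt

# Step 2: sort on character columns 41-54, where field 5 now always sits
# (14 + 6 + 6 + 14 = 40 characters precede it).
# -t with a tab (absent from the data) makes the whole line one field,
# so -k1.41,1.54 addresses absolute character positions.
LC_ALL=C sort -t "$(printf '\t')" -k1.41,1.54n fixed.txt > fixed_sorted.txt
cat fixed_sorted.txt
```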

But, as with all optimisations, measure, don't guess!

If you have many different files to sort, you may use a loop. However, since you have only one file, just pass the filename to sort:

$ sort -k5n file
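For the many-files case, a loop might look like the sketch below. The file names `a.txt` and `b.txt` and the `.sorted` suffix are placeholders I made up for illustration.

```shell
# Two small placeholder files standing in for "many different files".
printf '%s\n' '1 1 1 1 3.5 1' '1 1 1 1 -2.0 1' > a.txt
printf '%s\n' '1 1 1 1 10 1' '1 1 1 1 9 1' > b.txt

# Sort each file numerically on column 5 into its own output file.
# Quoting "$f" keeps names with spaces intact.
for f in a.txt b.txt; do
    LC_ALL=C sort -k5,5n "$f" > "${f%.txt}.sorted"
done
```

Note that passing several files to a single sort invocation merges them into one sorted stream, which is a different result from sorting each file separately.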
