How should natural sort work if there're several number sequences identifiable in the string set?

2023-02-26 15:27

So-called natural sort is meant to address the following problem: when users expect

file1.txt
file2.txt
file3.txt
file10.txt
file11.txt

"usual" sort instead produces:

file1.txt
file10.txt
file11.txt
file2.txt
file3.txt

which is inconvenient and isn't "natural".

We recently faced a situation where users complained about this very problem, and we considered employing natural sort. However, the following problem arose. Consider the following set of strings:

file1file100.txt
file2file99.txt
...
file99file2.txt
file100file1.txt

in which there's more than one identifiable number sequence and those sequences run in opposite directions. How should natural sort deal with such sets (I mean what should the result be, not how to implement that)?

The one that comes first wins, surely.

Usual sort lexicographically sorts filenames as sequences of characters (well, perhaps with special treatment of file extensions, although that might be implemented just by ordering . first among characters): 'f', 'i', 'l', 'e', '1', 'f', 'i', 'l', 'e', '1', '0', '0'.

Natural sort lexicographically sorts filenames as sequences of tokens, where each token is either a character or a number: 'f', 'i', 'l', 'e', 1, 'f', 'i', 'l', 'e', 100. Comparison between characters is normal character order, comparison between numbers is normal integer order, and comparison between a character and a number places numbers before any character (except .). Finally you need to break the tie between file1 and file01, so the "numbers" aren't quite just numbers, they do need to "know" their original representation in case it gets that far.
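As a rough illustration of the token rule described above, here is a minimal Python sketch. The function name natural_key and the rank values 0/1/2 are my own choices for illustration, not part of the answer; the final tie-break simply falls back to the original string, which is one possible way to order file1 against file01.

```python
import re

def natural_key(name):
    """Sort key made of tokens: each digit run is one number,
    every other character is a token of its own."""
    tokens = []
    for digits, char in re.findall(r"(\d+)|(.)", name, flags=re.DOTALL):
        if digits:
            # Rank 1: numbers sort before ordinary characters...
            tokens.append((1, int(digits), ""))
        elif char == ".":
            # Rank 0: ...but after '.', which sorts first of all.
            tokens.append((0, 0, char))
        else:
            # Rank 2: ordinary characters, in normal character order.
            tokens.append((2, 0, char))
    # Assumed tie-break: the original string decides between
    # names such as file1 and file01 whose tokens are equal.
    return tokens, name

names = ["file100file1.txt", "file2file99.txt",
         "file1file100.txt", "file99file2.txt"]
print(sorted(names, key=natural_key))
```

With this key the first number sequence decides the order, and the second one only matters when the first ones are equal.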

I'd actually sort of advise against asking the users. If they have a really strong opinion how they want their files sorted then OK, fair enough. Otherwise they might not actually know exactly what they "should" expect, so it makes more sense for an analyst/programmer to figure out what's "normal" than for a user to do so. Of course you can "ask" them indirectly via usability testing, if it's a big enough deal to be worth it. I find that if you ask users the wrong questions, they feel pressured to guess answers, and there's no point coding something arbitrary just because it's what the user representative thought of on the spot.

Whatever users think the rules should be, chances are what they'll actually get on with best is whatever their OS does by default when listing files in its file manager, file dialogs, and that sort of thing. So I'd offer them that (or perhaps the closest to that I can code without wasting a lot of their money on minor edge cases), and if they're still not happy find out why.

I doubt there's a "correct" answer.

To me personally, the "natural" thing to do is to sort by the first embedded number, breaking ties using the second etc.

However, since it's your users' expectations and not mine that matter, it might be worth asking them.

I would expect a strictly left-to-right order, with the numbers sorted as if they were padded with a sufficient prefix of 0's. I would try to convince users who think otherwise by emphasizing the simplicity and generality of the rule.
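The zero-prefix rule can be sketched in a few lines of Python. The helper names zero_padded_key and sort_left_to_right are hypothetical, and taking the longest digit run in the input as the "sufficient" width is my assumption about what that word means here.

```python
import re

def zero_padded_key(name, width):
    # Left-pad every digit run with zeros to a common width,
    # so that plain string comparison orders the numbers correctly.
    return re.sub(r"\d+", lambda m: m.group().zfill(width), name)

def sort_left_to_right(names):
    # Assumed "sufficient" width: the longest digit run in the input.
    width = max((len(run) for n in names for run in re.findall(r"\d+", n)),
                default=0)
    return sorted(names, key=lambda n: zero_padded_key(n, width))

print(sort_left_to_right(["file100file1.txt", "file2file99.txt",
                          "file1file100.txt", "file99file2.txt"]))
```

The padding is applied only to the sort key, so the file names themselves are returned unchanged.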

With your examples, I would think it natural to think of those file names as a sequence of:

<non numeric chars> <numeric chars> <non numeric chars2> <numeric chars2> "." <extension of chars>

Split each file name into those 6 sections and sort the files on all fields, with the leftmost field being most significant.
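A small Python sketch of that field split, assuming every name really has the shape shown above. The pattern FIELDS and the function field_key are names I made up for illustration. The literal "." is the same in every name, so it is left out of the key.

```python
import re

# <non-numeric> <numeric> <non-numeric 2> <numeric 2> "." <extension>
FIELDS = re.compile(r"(\D*)(\d+)(\D*)(\d+)\.(.*)")

def field_key(name):
    match = FIELDS.fullmatch(name)
    if match is None:
        raise ValueError(f"name does not fit the assumed pattern: {name!r}")
    text1, num1, text2, num2, ext = match.groups()
    # Tuples compare left to right, so the leftmost field is most
    # significant; numeric fields are compared as integers.
    return text1, int(num1), text2, int(num2), ext

names = ["file100file1.txt", "file2file99.txt",
         "file1file100.txt", "file99file2.txt"]
print(sorted(names, key=field_key))
```

Names with a different number of fields raise an error here; a more general version would split on every boundary between digit and non-digit runs.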

Note: unlike Steve Jessop's good answer, you should consider sequences of either non-numeric or numeric chars as a whole when sorting.

It seems most natural that the result should be as you show it, with the leftmost numeric field giving the overall order. After all, we are used to the leftmost digit in numerals being the most significant, and the leftmost number in software releases being the most significant.
