How should natural sort work if there are several number sequences identifiable in the string set?
So-called natural sort is meant to address the following problem: when users expect
file1.txt
file2.txt
file3.txt
file10.txt
file11.txt
"usual" sort instead produces:
file1.txt
file10.txt
file11.txt
file2.txt
file3.txt
which is inconvenient and isn't "natural".
We recently faced a situation where users complained about this very problem, and we considered employing natural sort. However, the following problem arose. Consider the following set of strings:
file1file100.txt
file2file99.txt
...
file99file2.txt
file100file1.txt
in which there's more than one identifiable number sequence and those sequences run in opposite directions. How should natural sort deal with such sets (I mean what should the result be, not how to implement that)?
The one that comes first wins, surely.
Usual sort lexicographically sorts filenames as sequences of characters (well, perhaps with special treatment of file extensions, although that might be implemented just by ordering '.' first among characters): 'f', 'i', 'l', 'e', '1', 'f', 'i', 'l', 'e', '1', '0', '0'.
Natural sort lexicographically sorts filenames as sequences of tokens, where each token is either a character or a number: 'f', 'i', 'l', 'e', 1, 'f', 'i', 'l', 'e', 100. Comparison between characters is normal character order, comparison between numbers is normal integer order, and comparison between a character and a number places numbers before any character (except '.'). Finally you need to break the tie between file1 and file01, so the "numbers" aren't quite just numbers, they do need to "know" their original representation in case it gets that far.
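For what it's worth, here is a minimal Python sketch of the token scheme described above. The name natural_key and the rank numbers 0/1/2 are my own illustration rather than anything from a library, and the tie-break simply falls back to comparing the original strings, which is one possible choice among several.

```python
import re

def natural_key(name):
    """Sort key: each token is a single character or a whole number."""
    tokens = []
    # At each position, try a run of digits first, else take one character.
    for digits, char in re.findall(r"(\d+)|(.)", name, flags=re.DOTALL):
        if digits:
            tokens.append((1, int(digits), ""))  # numbers: rank 1, integer order
        elif char == ".":
            tokens.append((0, 0, ""))            # '.' sorts before everything else
        else:
            tokens.append((2, 0, char))          # other characters: rank 2
    # The original string breaks ties such as file1 vs file01 (assumed rule).
    return (tokens, name)

names = ["file100file1.txt", "file2file99.txt", "file99file2.txt", "file1file100.txt"]
print(sorted(names, key=natural_key))
# ['file1file100.txt', 'file2file99.txt', 'file99file2.txt', 'file100file1.txt']
```

With this key the first number decides the order for the set in the question, and the second number would only matter if the first ones were equal.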
I'd actually sort of advise against asking the users. If they have a really strong opinion how they want their files sorted then OK, fair enough. Otherwise they might not actually know exactly what they "should" expect, so it makes more sense for an analyst/programmer to figure out what's "normal" than for a user to do so. Of course you can "ask" them indirectly via usability testing, if it's a big enough deal to be worth it. I find that if you ask users the wrong questions, they feel pressured to guess answers, and there's no point coding something arbitrary just because it's what the user representative thought of on the spot.
Whatever users think the rules should be, chances are what they'll actually get on with best is whatever their OS does by default when listing files in its file manager, file dialogs, and that sort of thing. So I'd offer them that (or perhaps the closest to that I can code without wasting a lot of their money on minor edge cases), and if they're still not happy find out why.
I doubt there's a "correct" answer.
To me personally, the "natural" thing to do is to sort by the first embedded number, breaking ties using the second etc.
However, since it's your users' expectations and not mine that matter, it might be worth asking them.
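As a rough illustration of "first embedded number, then the second, and so on", a Python key could just collect the numbers in order. The name number_key is hypothetical, and falling back to the plain string when all numbers tie is my assumption, not part of the rule above.

```python
import re

def number_key(name):
    """Sort by the embedded numbers, left to right; full name breaks ties."""
    numbers = [int(run) for run in re.findall(r"\d+", name)]
    return (numbers, name)

names = ["file2file99.txt", "file1file100.txt", "file1file5.txt"]
print(sorted(names, key=number_key))
# ['file1file5.txt', 'file1file100.txt', 'file2file99.txt']
```

Note that this ignores the non-numeric text until the numbers tie, so it only makes sense when the names share a common pattern.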
I would expect a strictly left-to-right order, with the numbers sorted as if they were prepended with a sufficient prefix of 0's. I would try to convince users who think otherwise by emphasizing the simplicity and generality of the rule.
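The zero-prefix rule can be sketched quite literally in Python: pad every digit run to the width of the longest run in the set, then sort the padded strings the usual way. The function name sort_zero_padded is made up for this example, and choosing the width from the input set is my assumption about what "sufficient" means.

```python
import re

def sort_zero_padded(names):
    """Sort names as if every number were left-padded with zeros."""
    # Width of the longest digit run anywhere in the input (0 if none).
    width = max(
        (len(run) for name in names for run in re.findall(r"\d+", name)),
        default=0,
    )

    def key(name):
        # Replace each digit run with its zero-padded form, e.g. 2 -> 002.
        return re.sub(r"\d+", lambda m: m.group().zfill(width), name)

    return sorted(names, key=key)

print(sort_zero_padded(["file10.txt", "file2.txt", "file1.txt"]))
# ['file1.txt', 'file2.txt', 'file10.txt']
```

After padding, an ordinary left-to-right string comparison does all the work, which is what makes the rule easy to explain.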
With your examples, I would think it natural to think of those file names as a sequence of:
<non numeric chars> <numeric chars> <non numeric chars2> <numeric chars2> "." <extension of chars>
Split each file into those 6 sections and sort the files on all fields, with the leftmost field being most significant.
Note: unlike Steve Jessop's good answer, you should consider sequences of either non-numeric or numeric chars as a whole when sorting.
It seems most natural that the result should be as you show it - with the leftmost numeric field giving the overall order - after all, we are used to the leftmost digit in numerals being the most significant, and the leftmost number in software releases being the most significant.
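A possible Python sketch of the field-splitting idea: split each name into alternating non-numeric and numeric runs and compare the runs as wholes, leftmost first. The name chunk_key is hypothetical, and this version does not treat the extension as a separate field; it just ends up in the last text run.

```python
import re

def chunk_key(name):
    """Split into alternating text/number runs; leftmost run is most significant."""
    # re.split with a capturing group keeps the digit runs: text lands at even
    # indexes and digits at odd indexes, so the same position in two keys
    # always holds the same type and the lists compare safely.
    parts = re.split(r"(\d+)", name)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]

print(chunk_key("file1file100.txt"))
# ['file', 1, 'file', 100, '.txt']

names = ["file100file1.txt", "file2file99.txt", "file99file2.txt", "file1file100.txt"]
print(sorted(names, key=chunk_key))
# ['file1file100.txt', 'file2file99.txt', 'file99file2.txt', 'file100file1.txt']
```

Because whole runs are compared, the leftmost numeric field decides the order for the set in the question, and the second field only breaks ties.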