inconsistent sort behavior
I have a sample file containing the characters "aA0_- ", one per line. Sorting it with GNU sort gives the following order:
$ cat /tmp/sample | sort
_
-
0
a
A
After appending some other character to each line, we obtain a different order (non-alphanumeric characters seem to have lower priority):
$ cat /tmp/sample | sed 's/$/x/' | sort
0x
ax
Ax
x
_x
-x
When we insert this character at the beginning instead, we get the original sort order back:
$ cat /tmp/sample | sed 's/^/x/' | sort
x
x_
x-
x0
xa
xA
What is the explanation for this behavior?
UPDATE
When 'z' and 'Z' characters are included in the sample, the result seems stranger still:
$ cat /tmp/sample | sed 's/$/x/' | sort
0x
ax
Ax
x
_x
-x
zx
Zx
.. but in the light of the correct answer, this is because ' ', '_' and '-' are all whitespace in the current locale (en_US.UTF-8) and are not ignored in sorting.
Your locale file should contain a definition of LC_COLLATE. This determines the sort order of characters. Also check the definition of LC_CTYPE, and which characters are classified as 'space'.
If '-' and '_' are classified as space, you might see the results you have shown.
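A quick way to test whether the locale is responsible is to repeat the sort under the C locale, which compares raw byte values, and to ask the current locale which of the sample characters it puts in the 'space' class. The following is only a sketch: the temporary file stands in for /tmp/sample, and the variable names are made up for illustration.

```shell
# Recreate the sample: one character per line (the last entry is a single space).
sample=$(mktemp)   # hypothetical temp file standing in for /tmp/sample
printf '%s\n' a A 0 _ - ' ' > "$sample"

# Sort by raw byte value, bypassing the locale's collation rules (LC_COLLATE).
c_sorted=$(sed 's/$/x/' "$sample" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"

# Count the sample lines whose single character falls in the 'space' class
# (LC_CTYPE) of the locale currently in effect.
space_count=$(grep -c '^[[:space:]]$' "$sample")
echo "characters classified as space: $space_count"
```

Under LC_ALL=C the order follows the ASCII byte values (space, '-', '0', 'A', '_', 'a'), so if plain `sort` gives a different order on the same input, the difference comes from the locale's collation rules. Running `locale` shows which LC_COLLATE and LC_CTYPE settings are in effect.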