What roles do minus signs and single quotation marks play in if statements?

In the following code, adding the same letter to both operands of the comparison changes the result: although - is not greater than j, -k is greater than jk.

This only happens if one of the operands is the minus sign (-) or single quotation mark (').

Why does this happen? What are the rules?

if - gtr j (echo - greater than j) else echo - less than j
if "-" gtr "j" (echo "-" greater than "j") else echo "-" less than "j"
echo.
if -k gtr jk (echo -k greater than jk) else echo -k less than jk
if "-k" gtr "jk" (echo "-k" greater than "jk") else echo "-k" less than "jk"
echo.
if ' gtr u (echo ' greater than u) else echo ' less than u
if "'" gtr "u" (echo "'" greater than "u") else echo "'" less than "u"
echo.
if 'v gtr uv (echo 'v greater than uv) else echo 'v less than uv
if "'v" gtr "uv" (echo "'v" greater than "uv") else echo "'v" less than "uv"

The result is:

- less than j
"-" less than "j"

-k greater than jk
"-k" greater than "jk"

' less than u
"'" less than "u"

'v greater than uv
"'v" greater than "uv"


You may be assuming that strings are just compared character by character, taking their ordinal values.

That's not true. Collation is much more complex than that.

In fact, you can see the same in other environments, such as Windows PowerShell:

PS Home:\> '-' -gt 'j'
False
PS Home:\> '-k' -gt 'jk'
True
PS Home:\> '''' -gt 'u'
False
PS Home:\> '''v' -gt 'uv'
True

It could very well be that the order of strings varies with your locale as well.

As for your particular problem here, quoting from the Unicode Collation Algorithm (UTS #10):

Collation order is not preserved under concatenation or substring operations, in general.

For example, the fact that x is less than y does not mean that x + z is less than y + z, because characters may form contractions across the substring or concatenation boundaries. In summary:

x < y does not imply that xz < yz
x < y does not imply that zx < zy
xz < yz does not imply that x < y
zx < zy does not imply that x < y
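
The first of these implications can be reproduced with a toy model. The Python sketch below is not the algorithm that cmd or PowerShell actually uses. It assumes, purely for illustration, that the hyphen and the apostrophe are ignored on a first pass and only used as a tie-breaker afterwards, which happens to be enough to give the same four results as in the question. The names `IGNORABLE`, `toy_key` and `toy_greater` are made up for this example.

```python
# Toy collation key: NOT the real Windows or UCA algorithm.
# Assumption for illustration: '-' and "'" carry no weight on the
# first pass and only act as a tie-breaker afterwards.
IGNORABLE = {"-", "'"}

def toy_key(s):
    # First level: the string with ignorable characters removed.
    primary = "".join(ch for ch in s if ch not in IGNORABLE)
    # Second level: the full string, used only when the first level ties.
    return (primary, s)

def toy_greater(a, b):
    return toy_key(a) > toy_key(b)

pairs = [("-", "j"), ("-k", "jk"), ("'", "u"), ("'v", "uv")]
results = {pair: toy_greater(*pair) for pair in pairs}

for (a, b), greater in results.items():
    print(a, "greater than" if greater else "less than", b)
```

Under this assumption, - compares like an empty string against j, while -k compares like k against jk, so appending the same letter to both sides flips the outcome.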

And to clear up the misconception you are likely under:

Collation is not code point (binary) order.

A simple example of this is the fact that capital Z comes before lowercase a in the code charts. As noted earlier, beginners may complain that a particular Unicode character is “not in the right place in the code chart.” That is a misunderstanding of the role of the character encoding in collation. While the Unicode Standard does not gratuitously place characters such that the binary ordering is odd, the only way to get the linguistically-correct order is to use a language-sensitive collation, not a binary ordering.
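
For contrast, here is a small sketch of what plain code point order looks like. Python's built-in string comparison works on code points, so it shows the binary ordering the quote describes, not a linguistic collation. The word list is an arbitrary example.

```python
# Python compares str values by code point (binary/ordinal order).
code_points = (ord("Z"), ord("a"))   # 'Z' is 90, 'a' is 97

# Capital Z sorts before lowercase a in code point order.
z_before_a = "Z" < "a"

# Unlike collation, ordinal order is unaffected by a shared suffix:
# '-' is below 'j', and '-k' is still below 'jk'.
hyphen_gt_j = "-" > "j"
hyphen_k_gt_jk = "-k" > "jk"

# Sorting by code point puts punctuation and capitals first.
words = ["apple", "Zebra", "-k", "jk"]
ordinal_sorted = sorted(words)

print(code_points, z_before_a, hyphen_gt_j, hyphen_k_gt_jk)
print(ordinal_sorted)
```

With ordinal comparison both - gtr j and -k gtr jk would be false, which is the behaviour the question expected but not what a collation-based comparison gives.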
