Is it a good idea to use unicode symbols as Java identifiers?
I have a snippet of code that looks like this开发者_C百科:
double Δt = lastPollTime - pollTime;
double α = 1 - Math.exp(-Δt / τ);
average += α * (x - average);
Just how bad an idea is it to use unicode characters in Java identifiers? Or is this perfectly acceptable?
It's a bad idea, for various reasons.
Many people's keyboards do not support these characters. If I were to maintain that code on a qwerty keyboard (or any other without Greek letters), I'd have to copy and paste those characters all the time.
Some people's editors or terminals might not display these characters properly. For example, some editors (unfortunately) still default to some ISO-8859 (Latin) variant. The main reason why ASCII is still so prevalent is that it nearly always works.
Even if the characters can be rendered properly, they may cause confusion. Straight from Sun (emphasis mine):
Identifiers that have the same external appearance may yet be different. For example, the identifiers consisting of the single letters LATIN CAPITAL LETTER A (A, \u0041), LATIN SMALL LETTER A (a, \u0061), GREEK CAPITAL LETTER ALPHA (A, \u0391), CYRILLIC SMALL LETTER A (a, \u0430) and MATHEMATICAL BOLD ITALIC SMALL A (a, \ud835\udc82) are all different.
...
Unicode composite characters are different from the decomposed characters. For example, a LATIN CAPITAL LETTER A ACUTE (Á, \u00c1) could be considered to be the same as a LATIN CAPITAL LETTER A (A, \u0041) immediately followed by a NON-SPACING ACUTE (´, \u0301) when sorting, but these are different in identifiers.
This is in no way an imaginary problem: α (U+03b1 GREEK SMALL LETTER ALPHA) and ⍺ (U+237a APL FUNCTIONAL SYMBOL ALPHA) are different characters!
There is no way to tell which characters are valid. The characters from your code work, but when I use the FUNCTIONAL SYMBOL ALPHA my Java compiler complains about "illegal character: \9082". Even though the functional symbol would be more appropriate in this code. There seems to be no solid rule about which characters are acceptable, except asking
Character.isJavaIdentifierPart()
.Even though you may get it to compile, it seems doubtful that all Java virtual machine implementations have been rigorously tested with Unicode identifiers. If these characters are only used for variables in method scope, they should get compiled away, but if they are class members, they will end up in the
.class
file as well, possibly breaking your program on buggy JVM implementations.
looks good as it uses the correct symbols, but how many of your team will know the keystrokes for those symbols?
I would use an english representation just to make it easier to type. And others might not have a character set that supports those symbols set up on their pc.
That code is fine to read, but horrible to maintain - I suggest use plain English identifiers like so:
double deltaTime = lastPollTime - pollTime;
double alpha = 1 - Math.exp(-delta....
It is perfectly acceptable if it is acceptable in your working group. A lot of the answers here operate on the arrogant assumption that everybody programs in English. Non-English programmers are by no means rare these days and they're getting less rare at an accelerating rate. Why should they restrict themselves to English versions when they have a perfectly good language at their disposal?
Anglophone arrogance aside, there are other legitimate reasons for using non-English identifiers. If you're writing mathematics packages, for example, using Greek is fine if your target is fellow mathematicians. Why should people type out "delta" in your workgroup when everybody can understand "Δ" and likely type it more quickly? Almost any problem domain will have its own jargon and sometimes that jargon is expressed in something other than the Latin alphabet. Why on Earth would you want to try and jam everything into ASCII?
It's an excellent idea. Honest. It's just not easily practicable at the time. Let's keep a reference to it for the future. I would love to see triangles, circles, squares, etc... as part of program code. But for now, please do try to re-write it, the way Crozin suggests.
Why not? If the people working on that code can type those easily, it's acceptable.
But god help those who can't display unicode, or who can't type them.
In a perfect world, this would be the recommended way.
Unfortunately you run into character encodings when moving outside of plain 7-bit ASCII characters (UTF-8 is different from ISO-Latin-1 is different from UTF-16 etc), meaning that you eventually will run into problems. This has happened to me when moving from Windows to Linux. Our national scandinavian characters broke in the process, but fortunately was only in strings. We then used the \u encoding for all those.
If you can be absolutely certain that you will never, ever run into such a thing - for instance if your files contain a proper BOM - then by all means, do this. It will make your code more readable. If at least the smallest amount of doubt, then don't.
(Please note that the "use non-English languages" is a different matter. I'm just thinking in using symbols instead of letters).
精彩评论