Who determines the ordering of characters
I have a question about the program below:
#include <stdio.h>

int main(void)
{
    char ch = 'z';
    while (ch >= 'a')
    {
        printf("char is %c and the value is %d\n", ch, ch);
        ch = ch - 1;
    }
    return 0;
}
Why is printing the whole set of lowercase letters not guaranteed by the program above? If C doesn't make many guarantees about the ordering of characters in their internal form, then who actually does, and how?
The compiler implementor chooses the underlying character set. About the only things the standard says are that a certain minimum set of characters must be available and that the digit characters are contiguous.
The required characters for a C99 execution environment are A through Z, a through z, 0 through 9 (which must be together and in order), all of !"#%&'()*+,-./:;<=>?[\]^_{|}~, space, horizontal tab, vertical tab, form feed, alert, backspace, carriage return and new line. This remains unchanged in the current draft of C1x, the next iteration of that standard.
Everything else depends on the implementation.
For example, code like:
int isUpperAlpha(char c) {
return (c >= 'A') && (c <= 'Z');
}
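will break on mainframes that use EBCDIC, where the upper-case letters are split into three separate ranges (A to I, J to R and S to Z), so the test also accepts the non-letter codes that sit in the gaps.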
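To make the failure concrete, here is a small sketch that hard-codes the upper-case letter codes of EBCDIC code page 037 and counts how many codes a plain 'A'..'Z' range test would accept. The table is written out by hand for illustration, and the function names naive_range_width and non_letters_in_range are hypothetical, not part of any library.

```c
#include <stddef.h>

/* Upper-case letter codes in EBCDIC code page 037 (hand-written table,
   an assumption for illustration). The letters sit in three runs. */
static const unsigned char ebcdic037_upper[26] = {
    0xC1, 0xC2, 0xC3, 0xC4, 0xC5, 0xC6, 0xC7, 0xC8, 0xC9, /* A-I */
    0xD1, 0xD2, 0xD3, 0xD4, 0xD5, 0xD6, 0xD7, 0xD8, 0xD9, /* J-R */
    0xE2, 0xE3, 0xE4, 0xE5, 0xE6, 0xE7, 0xE8, 0xE9        /* S-Z */
};

/* How many codes a naive "c >= 'A' && c <= 'Z'" test accepts. */
static int naive_range_width(void)
{
    return ebcdic037_upper[25] - ebcdic037_upper[0] + 1;
}

/* How many of those accepted codes are not letters at all. */
static int non_letters_in_range(void)
{
    int count = 0;
    for (int code = ebcdic037_upper[0]; code <= ebcdic037_upper[25]; code++) {
        int is_letter = 0;
        for (size_t i = 0; i < 26; i++)
            if (ebcdic037_upper[i] == code)
                is_letter = 1;
        if (!is_letter)
            count++;
    }
    return count;
}
```

On such a code page the range spans 41 codes, 15 of which are not letters, so isUpperAlpha would return true for all 15 of them.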
Truly portable code will take that into account. All other code should document its dependencies.
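One way to act on both sentences, sketched under the assumption of a C11 compiler: a portable version that defers to isupper() from <ctype.h>, and a range-based version whose dependency is documented with a compile-time check. Both function names are hypothetical. Note that isupper() follows the current locale; in the default "C" locale it is true only for A through Z.

```c
#include <ctype.h>

/* Portable: asks the library instead of assuming a code layout.
   The cast avoids undefined behaviour when char is signed and c < 0. */
int isUpperAlphaPortable(char c)
{
    return isupper((unsigned char)c) != 0;
}

/* Non-portable but documented: the range test below is only valid when
   'A'..'Z' are contiguous. This C11 check fails to compile on EBCDIC,
   where 'Z' - 'A' is 40. It is a coarse check, not a proof that every
   letter in between is in alphabetical order. */
_Static_assert('Z' - 'A' == 25, "isUpperAlphaRange assumes contiguous A-Z");

int isUpperAlphaRange(char c)
{
    return (c >= 'A') && (c <= 'Z');
}
```

On an EBCDIC implementation the _Static_assert stops the build, instead of letting the range test silently accept non-letters.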
A more portable implementation of your example would be something along the lines of:
static char chrs[] = "zyxwvutsrqponmlkjihgfedcba";
char *pCh = chrs;
while (*pCh != 0) {
printf ("char is %c and the value is %d\n", *pCh, *pCh);
pCh++;
}
If you want a truly portable solution, you should probably use islower(), since code that checks only the Latin letters won't be portable to, for example, Greek text on an implementation that uses Unicode as its underlying character set.
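A minimal sketch of the islower() approach, assuming the default "C" locale (where islower() is true only for a through z) and single-byte characters. The function name print_lowercase_desc is hypothetical. It walks every unsigned char value from UCHAR_MAX down to 0, so the output order follows the code values rather than a hard-coded alphabet; on both ASCII and EBCDIC that happens to be z down to a.

```c
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Prints every character the current locale classifies as lower case,
   from the highest code value to the lowest, mirroring the original
   loop's descending order. Returns how many characters were printed. */
int print_lowercase_desc(void)
{
    int count = 0;
    for (int code = UCHAR_MAX; code >= 0; code--) {
        /* islower() accepts any value representable as unsigned char */
        if (islower(code)) {
            printf("char is %c and the value is %d\n", code, code);
            count++;
        }
    }
    return count;
}
```

This still handles only single-byte characters; multibyte text such as UTF-8 Greek would need the wide-character functions in <wctype.h>, such as iswlower().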
Why is printing the whole set of lowercase letters not guaranteed by the program above?
Because it's possible to use C with an EBCDIC character encoding, in which the letters aren't consecutive.
Obviously it's determined by the implementation of C you're using, but more than likely, for you, it's the American Standard Code for Information Interchange (ASCII).
It is determined by whatever the execution character set is.
In most cases nowadays, that is the ASCII character set, but C has no requirement that a specific character set be used.
Note that there are some guarantees about the ordering of characters in the execution character set. For example, the digits '0' through '9' are guaranteed each to have a value one greater than the value of the previous digit.
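The digit guarantee is what makes the common c - '0' idiom portable. A small sketch; the function names digit_value and parse_decimal are illustrative, and overflow is deliberately not handled.

```c
/* Converts a decimal digit character to its numeric value. Portable to
   every conforming implementation, because '0'..'9' are guaranteed to
   be contiguous and increasing. Returns -1 for anything else. */
int digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return -1;
}

/* Parses a non-negative decimal number from the start of s, stopping at
   the first non-digit. Overflow is not checked in this sketch. */
long parse_decimal(const char *s)
{
    long value = 0;
    int d;
    while ((d = digit_value(*s)) >= 0) {
        value = value * 10 + d;
        s++;
    }
    return value;
}
```

The analogous c - 'a' trick for letters has no such guarantee, which is exactly the problem in the question.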
These days, people going around calling your code non-portable are engaging in useless pedantry. Support for ASCII-incompatible encodings only remains in the C standard because of legacy EBCDIC mainframes that refuse to die. You will never encounter an ASCII-incompatible char encoding on any modern computer, now or in the future. Give it a few decades, and you'll never encounter anything but UTF-8.
To answer your question about who decides the character encoding: while it's nominally at the discretion of your implementation (the C compiler, library, and OS), it was ultimately decided by the internet, through both existing practice and IETF standards. Presumably modern systems are intended to communicate and interoperate with one another, and it would be a huge headache to have to convert every protocol header, HTML file, JavaScript source, username, etc. back and forth between ASCII-compatible encodings and EBCDIC or some other local mess.
In recent times, it's become clear that a universal encoding is highly desirable not just for machine-parsed text but also for natural-language text. (Natural-language text interchange is not as fundamental as machine-parsed text, but it is still very common and important.) Unicode provided the character set, and as the dominant ASCII-compatible Unicode encoding, UTF-8 is pretty much the successor to ASCII as the universal character encoding.
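The ASCII compatibility can be seen directly in how UTF-8 encodes code points: anything below 0x80 becomes one byte with the same value, and everything else uses only bytes with the high bit set, so an ASCII-only parser never mistakes part of a multibyte character for an ASCII one. A minimal encoder sketch; the name utf8_encode is illustrative, and it rejects surrogates and values above 0x10FFFF.

```c
#include <stddef.h>

/* Encodes one Unicode code point as UTF-8 into out (4 bytes of space).
   Returns the number of bytes written, or 0 if cp is not a valid
   Unicode scalar value (a surrogate or a value above 0x10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {                   /* ASCII: one byte, same value */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                  /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)  /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {              /* 11110xxx then 3 continuations */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, 'A' (U+0041) encodes as the single byte 0x41, Greek small alpha (U+03B1) as 0xCE 0xB1, and the euro sign (U+20AC) as 0xE2 0x82 0xAC.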