Numeric value of digit characters in C
I have just started reading through The C Programming Language and I am having trouble understanding one part. Here is an excerpt from page 24:
#include <stdio.h>

/* count digits, white space, others */
main()
{
    int c, i, nwhite, nother;
    int ndigit[10];

    nwhite = nother = 0;
    for (i = 0; i < 10; ++i)
        ndigit[i] = 0;

    while ((c = getchar()) != EOF)
        if (c >= '0' && c <= '9')
            ++ndigit[c-'0'];   // THIS IS THE LINE I AM WONDERING ABOUT
        else if (c == ' ' || c == '\n' || c == '\t')
            ++nwhite;
        else
            ++nother;

    printf("digits =");
    for (i = 0; i < 10; ++i)
        printf(" %d", ndigit[i]);
    printf(", white space = %d, other = %d\n",
        nwhite, nother);
}

The output of this program run on itself is
digits = 9 3 0 0 0 0 0 0 0 1, white space = 123, other = 345
The declaration
int ndigit[10];
declares ndigit to be an array of 10 integers. Array subscripts always start at zero in C, so the elements are
ndigit[0], ndigit[1], ..., ndigit[9]
This is reflected in the for loops that initialize and print the array. A subscript can be any integer expression, which includes integer variables like i, and integer constants. This particular program relies on the properties of the character representation of the digits. For example, the test
if (c >= '0' && c <= '9')
determines whether the character in c is a digit. If it is, the numeric value of that digit is
c-'0'
This works only if '0', '1', ..., '9' have consecutive increasing values. Fortunately, this is true for all character sets. By definition, chars are just small integers, so char variables and constants are identical to ints in arithmetic expressions. This is natural and convenient; for example
c-'0'
is an integer expression with a value between 0 and 9 corresponding to the character '0' to '9' stored in c, and thus a valid subscript for the array ndigit.
The part I am having trouble understanding is why the -'0' part is necessary in the expression c-'0'. If a character is a small integer as the author says, and the digit characters correspond to their numeric values, then what is -'0' doing?
Digit characters don't correspond to their numeric values. They correspond to their encoding values (in this case, ASCII).
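IIRC, ASCII '0' is the value 48. And, luckily for this example, the values of '0' through '9' are stored in order in the character set; the C standard in fact requires the ten digits to be consecutive and increasing in any character set an implementation uses.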
So, subtracting the ASCII value for '0' from any ASCII digit returns its "true" value of 0-9.
The numeric value of a character is (on most systems) its ASCII value. The ASCII value of '0' is 48, '1' is 49, etc.
By subtracting 48 from the value of the character, '0' becomes 0, '1' becomes 1, etc. By writing it as c - '0' you don't actually need to know what the ASCII value of '0' is (or that the system is using ASCII; it could be using EBCDIC). The only thing that matters is that the values are consecutive increasing integers.
It converts from the ASCII code of the '0' key on your keyboard to the value zero.
If you did int x = '0' + '0', the result would not be zero; in ASCII it would be 96 (48 + 48).
In most character encodings, all of the digits are placed consecutively in the character set. In ASCII for example, they start with '0' at 0x30 ('1' is 0x31, '2' is 0x32, etc.). If you want the numeric value of a given digit, you can just subtract '0' from it and get the right value. The advantage of using '0' instead of the specific value is that your code can be portable to other character sets with much less effort.
If you access a string character by character, you get the ASCII values back, even if the characters happen to be digits.
Fortunately the guys who designed that character table made sure that the characters for 0 to 9 are sequential, so you can simply convert from ASCII to a number by subtracting the ASCII value of '0'.
That's what the code does. I have to admit that it is confusing when you see it the first time, but it's not rocket science.
The ASCII value of '0' is 48, '1' is 49, '2' is 50 and so on.
For reference, here is a nice ASCII chart:
http://www.sciencelobby.com/ascii-table/images/ascii-table1.gif