What's the deal with char.GetNumericValue?
I was working on Project Euler 40, and was a bit bothered that there was no int.Parse(char). Not a big deal, but I did some asking around and someone suggested char.GetNumericValue. GetNumericValue seems like a very odd method to me:
- Takes in a char as a parameter and returns... a double?
- Returns -1.0 if the char is not '0' through '9'
So what's the reasoning behind this method, and what purpose does returning a double serve? I even fired up Reflector and looked at InternalGetNumericValue, but it's just like watching Lost: every answer just leads to another question.
Remember that it takes a Unicode character and returns a value. '0' through '9' are the standard decimal digits; however, there are other Unicode characters that represent numbers, and some of them represent fractions rather than whole numbers.
Like this character: ¼
Console.WriteLine( char.GetNumericValue( '¼' ) );
Outputs 0.25 in the console window.
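To make the point about return values concrete, here is a small sketch (assuming a .NET Core 3.0 or later runtime) that prints char.IsDigit and char.GetNumericValue for a few characters, including a fraction, a negative value and a non-numeric letter. ParseAsciiDigit is a hypothetical helper, not a framework method; it is roughly the int.Parse(char) the question was looking for, restricted to '0' through '9'.

```csharp
using System;
using System.Globalization;

public static class NumericValueDemo
{
    // Hypothetical helper (not part of the BCL): an ASCII-only stand-in for int.Parse(char).
    public static int ParseAsciiDigit(char c)
    {
        if (c < '0' || c > '9')
            throw new FormatException($"'{c}' is not an ASCII digit.");
        return c - '0';
    }

    public static void Main()
    {
        // Print doubles with '.' as the decimal separator regardless of the machine's culture
        CultureInfo.CurrentCulture = CultureInfo.InvariantCulture;

        // '5' ASCII digit, U+0663 ARABIC-INDIC DIGIT THREE, U+00BC VULGAR FRACTION ONE QUARTER,
        // U+0F33 TIBETAN DIGIT HALF ZERO, and 'x', which has no numeric value
        char[] samples = { '5', '\u0663', '\u00BC', '\u0F33', 'x' };
        foreach (char c in samples)
        {
            Console.WriteLine("U+{0:X4} IsDigit={1,-5} GetNumericValue={2}",
                (int)c, char.IsDigit(c), char.GetNumericValue(c));
        }

        Console.WriteLine(ParseAsciiDigit('7'));
    }
}
```

For something like Project Euler 40, where every digit is ASCII, c - '0' is all you need; GetNumericValue only earns its double return type when the input can contain fractions or digits from other scripts.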
Here is a comprehensive list of actual numeric values that are returned:
0 - 0 1 - 1 2 - 2 3 - 3 4 - 4 5 - 5 6 - 6 7 - 7 8 - 8 9 - 9 ² - 2 ³ - 3 ¹ - 1 ¼ - 0.25 ½ - 0.5 ¾ - 0.75 ٠ - 0 ١ - 1 ٢ - 2 ٣ - 3 ٤ - 4 ٥ - 5 ٦ - 6 ٧ - 7 ٨ - 8 ٩ - 9 ۰ - 0 ۱ - 1 ۲ - 2 ۳ - 3 ۴ - 4 ۵ - 5 ۶ - 6 ۷ - 7 ۸ - 8 ۹ - 9 ߀ - 0 ߁ - 1 ߂ - 2 ߃ - 3 ߄ - 4 ߅ - 5 ߆ - 6 ߇ - 7 ߈ - 8 ߉ - 9 ० - 0 १ - 1 २ - 2 ३ - 3 ४ - 4 ५ - 5 ६ - 6 ७ - 7 ८ - 8 ९ - 9 ০ - 0 ১ - 1 ২ - 2 ৩ - 3 ৪ - 4 ৫ - 5 ৬ - 6 ৭ - 7 ৮ - 8 ৯ - 9 ৴ - 1 ৵ - 2 ৶ - 3 ৷ - 4 ৹ - 16 ੦ - 0 ੧ - 1 ੨ - 2 ੩ - 3 ੪ - 4 ੫ - 5 ੬ - 6 ੭ - 7 ੮ - 8 ੯ - 9 ૦ - 0 ૧ - 1 ૨ - 2 ૩ - 3 ૪ - 4 ૫ - 5 ૬ - 6 ૭ - 7 ૮ - 8 ૯ - 9 ୦ - 0 ୧ - 1 ୨ - 2 ୩ - 3 ୪ - 4 ୫ - 5 ୬ - 6 ୭ - 7 ୮ - 8 ୯ - 9 ௦ - 0 ௧ - 1 ௨ - 2 ௩ - 3 ௪ - 4 ௫ - 5 ௬ - 6 ௭ - 7 ௮ - 8 ௯ - 9 ௰ - 10 ௱ - 100 ௲ - 1000 ౦ - 0 ౧ - 1 ౨ - 2 ౩ - 3 ౪ - 4 ౫ - 5 ౬ - 6 ౭ - 7 ౮ - 8 ౯ - 9 ೦ - 0 ೧ - 1 ೨ - 2 ೩ - 3 ೪ - 4 ೫ - 5 ೬ - 6 ೭ - 7 ೮ - 8 ೯ - 9 ൦ - 0 ൧ - 1 ൨ - 2 ൩ - 3 ൪ - 4 ൫ - 5 ൬ - 6 ൭ - 7 ൮ - 8 ൯ - 9 ๐ - 0 ๑ - 1 ๒ - 2 ๓ - 3 ๔ - 4 ๕ - 5 ๖ - 6 ๗ - 7 ๘ - 8 ๙ - 9 ໐ - 0 ໑ - 1 ໒ - 2 ໓ - 3 ໔ - 4 ໕ - 5 ໖ - 6 ໗ - 7 ໘ - 8 ໙ - 9 ༠ - 0 ༡ - 1 ༢ - 2 ༣ - 3 ༤ - 4 ༥ - 5 ༦ - 6 ༧ - 7 ༨ - 8 ༩ - 9 ༪ - 0.5 ༫ - 1.5 ༬ - 2.5 ༭ - 3.5 ༮ - 4.5 ༯ - 5.5 ༰ - 6.5 ༱ - 7.5 ༲ - 8.5 ༳ - -0.5 ၀ - 0 ၁ - 1 ၂ - 2 ၃ - 3 ၄ - 4 ၅ - 5 ၆ - 6 ၇ - 7 ၈ - 8 ၉ - 9 ፩ - 1 ፪ - 2 ፫ - 3 ፬ - 4 ፭ - 5 ፮ - 6 ፯ - 7 ፰ - 8 ፱ - 9 ፲ - 10 ፳ - 20 ፴ - 30 ፵ - 40 ፶ - 50 ፷ - 60 ፸ - 70 ፹ - 80 ፺ - 90 ፻ - 100 ፼ - 10000 ᛮ - 17 ᛯ - 18 ᛰ - 19 ០ - 0 ១ - 1 ២ - 2 ៣ - 3 ៤ - 4 ៥ - 5 ៦ - 6 ៧ - 7 ៨ - 8 ៩ - 9 ៰ - 0 ៱ - 1 ៲ - 2 ៳ - 3 ៴ - 4 ៵ - 5 ៶ - 6 ៷ - 7 ៸ - 8 ៹ - 9 ᠐ - 0 ᠑ - 1 ᠒ - 2 ᠓ - 3 ᠔ - 4 ᠕ - 5 ᠖ - 6 ᠗ - 7 ᠘ - 8 ᠙ - 9 ᥆ - 0 ᥇ - 1 ᥈ - 2 ᥉ - 3 ᥊ - 4 ᥋ - 5 ᥌ - 6 ᥍ - 7 ᥎ - 8 ᥏ - 9 ᧐ - 0 ᧑ - 1 ᧒ - 2 ᧓ - 3 ᧔ - 4 ᧕ - 5 ᧖ - 6 ᧗ - 7 ᧘ - 8 ᧙ - 9 ᭐ - 0 ᭑ - 1 ᭒ - 2 ᭓ - 3 ᭔ - 4 ᭕ - 5 ᭖ - 6 ᭗ - 7 ᭘ - 8 ᭙ - 9 ⁰ - 0 ⁴ - 4 ⁵ - 5 ⁶ - 6 ⁷ - 7 ⁸ - 8 ⁹ - 9 ₀ - 0 ₁ - 1 ₂ - 2 ₃ - 3 ₄ - 4 ₅ - 5 ₆ - 6 ₇ - 7 ₈ - 8 ₉ - 9 ⅓ - 0.333333333333333 ⅔ - 0.666666666666667 ⅕ - 0.2 ⅖ - 0.4 ⅗ - 0.6 ⅘ - 0.8 ⅙ - 0.166666666666667 ⅚ - 0.833333333333333 ⅛ - 0.125 ⅜ - 0.375 ⅝ - 0.625 ⅞ - 0.875 ⅟ - 1 Ⅰ - 1 Ⅱ - 2
Ⅲ - 3 Ⅳ - 4 Ⅴ - 5 Ⅵ - 6 Ⅶ - 7 Ⅷ - 8 Ⅸ - 9 Ⅹ - 10 Ⅺ - 11 Ⅻ - 12 Ⅼ - 50 Ⅽ - 100 Ⅾ - 500 Ⅿ - 1000 ⅰ - 1 ⅱ - 2 ⅲ - 3 ⅳ - 4 ⅴ - 5 ⅵ - 6 ⅶ - 7 ⅷ - 8 ⅸ - 9 ⅹ - 10 ⅺ - 11 ⅻ - 12 ⅼ - 50 ⅽ - 100 ⅾ - 500 ⅿ - 1000 ↀ - 1000 ↁ - 5000 ↂ - 10000 ① - 1 ② - 2 ③ - 3 ④ - 4 ⑤ - 5 ⑥ - 6 ⑦ - 7 ⑧ - 8 ⑨ - 9 ⑩ - 10 ⑪ - 11 ⑫ - 12 ⑬ - 13 ⑭ - 14 ⑮ - 15 ⑯ - 16 ⑰ - 17 ⑱ - 18 ⑲ - 19 ⑳ - 20 ⑴ - 1 ⑵ - 2 ⑶ - 3 ⑷ - 4 ⑸ - 5 ⑹ - 6 ⑺ - 7 ⑻ - 8 ⑼ - 9 ⑽ - 10 ⑾ - 11 ⑿ - 12 ⒀ - 13 ⒁ - 14 ⒂ - 15 ⒃ - 16 ⒄ - 17 ⒅ - 18 ⒆ - 19 ⒇ - 20 ⒈ - 1 ⒉ - 2 ⒊ - 3 ⒋ - 4 ⒌ - 5 ⒍ - 6 ⒎ - 7 ⒏ - 8 ⒐ - 9 ⒑ - 10 ⒒ - 11 ⒓ - 12 ⒔ - 13 ⒕ - 14 ⒖ - 15 ⒗ - 16 ⒘ - 17 ⒙ - 18 ⒚ - 19 ⒛ - 20 ⓪ - 0 ⓫ - 11 ⓬ - 12 ⓭ - 13 ⓮ - 14 ⓯ - 15 ⓰ - 16 ⓱ - 17 ⓲ - 18 ⓳ - 19 ⓴ - 20 ⓵ - 1 ⓶ - 2 ⓷ - 3 ⓸ - 4 ⓹ - 5 ⓺ - 6 ⓻ - 7 ⓼ - 8 ⓽ - 9 ⓾ - 10 ⓿ - 0 ❶ - 1 ❷ - 2 ❸ - 3 ❹ - 4 ❺ - 5 ❻ - 6 ❼ - 7 ❽ - 8 ❾ - 9 ❿ - 10 ➀ - 1 ➁ - 2 ➂ - 3 ➃ - 4 ➄ - 5 ➅ - 6 ➆ - 7 ➇ - 8 ➈ - 9 ➉ - 10 ➊ - 1 ➋ - 2 ➌ - 3 ➍ - 4 ➎ - 5 ➏ - 6 ➐ - 7 ➑ - 8 ➒ - 9 ➓ - 10 ⳽ - 0.5 〇 - 0 〡 - 1 〢 - 2 〣 - 3 〤 - 4 〥 - 5 〦 - 6 〧 - 7 〨 - 8 〩 - 9 〸 - 10 〹 - 20 〺 - 30 ㆒ - 1 ㆓ - 2 ㆔ - 3 ㆕ - 4 ㈠ - 1 ㈡ - 2 ㈢ - 3 ㈣ - 4 ㈤ - 5 ㈥ - 6 ㈦ - 7 ㈧ - 8 ㈨ - 9 ㈩ - 10 ㉑ - 21 ㉒ - 22 ㉓ - 23 ㉔ - 24 ㉕ - 25 ㉖ - 26 ㉗ - 27 ㉘ - 28 ㉙ - 29 ㉚ - 30 ㉛ - 31 ㉜ - 32 ㉝ - 33 ㉞ - 34 ㉟ - 35 ㊀ - 1 ㊁ - 2 ㊂ - 3 ㊃ - 4 ㊄ - 5 ㊅ - 6 ㊆ - 7 ㊇ - 8 ㊈ - 9 ㊉ - 10 ㊱ - 36 ㊲ - 37 ㊳ - 38 ㊴ - 39 ㊵ - 40 ㊶ - 41 ㊷ - 42 ㊸ - 43 ㊹ - 44 ㊺ - 45 ㊻ - 46 ㊼ - 47 ㊽ - 48 ㊾ - 49 ㊿ - 50 0 - 0 1 - 1 2 - 2 3 - 3 4 - 4 5 - 5 6 - 6 7 - 7 8 - 8 9 - 9
The Unicode Consortium maintains the Unicode Character Database (UCD), whose main file describing characters is UnicodeData.txt. Each character belongs to a general category and has a number of properties. If a character belongs to one of the Number categories and has a Numeric_Type property, it has a numeric value, and char.GetNumericValue() will return that value. You can check that value in UnicodeData.txt: if a code point is numeric, the numeric fields of its line (the 7th to 9th fields) are filled in. For example, the line for U+109BC looks like this, which means it represents ¹¹⁄₁₂:
109BC;MEROITIC CURSIVE FRACTION ELEVEN TWELFTHS;No;0;R;;;;11/12;N;;;;;
Notice the 11/12 field near the end. A more detailed description of numeric values can be found in DerivedNumericType.txt. Here is an example line from it:
11FC0..11FD4 ; Numeric # No [21] TAMIL FRACTION ONE THREE-HUNDRED-AND-TWENTIETH..TAMIL FRACTION DOWNSCALING FACTOR KIIZH
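The category and numeric value of a character can also be queried from code. The sketch below (Describe is a made-up helper name) uses the string overloads of CharUnicodeInfo, which read a whole surrogate pair, so it also works for code points outside the BMP such as U+109BC. Whether U+109BC actually reports 11/12 depends on how recent the runtime's Unicode data is, so treat that line of output as version-dependent.

```csharp
using System;
using System.Globalization;

public static class UnicodeNumericDemo
{
    // Describes the code point starting at s[index]: its general category and numeric value.
    // The string overloads read a full surrogate pair, so code points outside the BMP work too.
    public static string Describe(string s, int index)
    {
        int codePoint = char.ConvertToUtf32(s, index);
        UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(s, index);
        double value = CharUnicodeInfo.GetNumericValue(s, index);
        return string.Format(CultureInfo.InvariantCulture, "U+{0:X5} {1} {2}", codePoint, category, value);
    }

    public static void Main()
    {
        Console.OutputEncoding = System.Text.Encoding.UTF8;
        Console.WriteLine(Describe("\u00BC", 0));      // VULGAR FRACTION ONE QUARTER
        Console.WriteLine(Describe("\U0001D7D7", 0));  // MATHEMATICAL BOLD DIGIT NINE
        Console.WriteLine(Describe("\U000109BC", 0));  // the 11/12 fraction; needs recent Unicode data

        // The char overload only sees one UTF-16 code unit; a lone high surrogate is not numeric
        Console.WriteLine(char.GetNumericValue("\U0001D7D7"[0]));
    }
}
```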
For more detailed information, read:
- Numeric Property Definitions
- Unicode® Standard Annex #44 - Numeric_Type
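UAX #44 distinguishes three Numeric_Type values: Decimal, Digit and Numeric. CharUnicodeInfo has one lookup that roughly corresponds to each numeric field of UnicodeData.txt (GetDecimalDigitValue, GetDigitValue, GetNumericValue), so an approximation of Numeric_Type can be built from them. ApproxNumericType and Classify are hypothetical names, and the mapping is an assumption: .NET's tables come from its own snapshot of the UCD and may report None for characters whose numeric value comes from the Unihan database, such as CJK ideographs.

```csharp
using System;
using System.Globalization;

// Hypothetical enum mirroring the Numeric_Type values described in UAX #44
public enum ApproxNumericType { None, Decimal, Digit, Numeric }

public static class NumericTypeSketch
{
    // Approximates Numeric_Type: a decimal digit value implies Decimal,
    // otherwise a digit value implies Digit, otherwise any numeric value implies Numeric.
    public static ApproxNumericType Classify(string s, int index)
    {
        if (CharUnicodeInfo.GetDecimalDigitValue(s, index) != -1)
            return ApproxNumericType.Decimal;
        if (CharUnicodeInfo.GetDigitValue(s, index) != -1)
            return ApproxNumericType.Digit;
        // -1 is the "not numeric" sentinel; other negative values such as -0.5 are real values
        if (CharUnicodeInfo.GetNumericValue(s, index) != -1)
            return ApproxNumericType.Numeric;
        return ApproxNumericType.None;
    }

    public static void Main()
    {
        Console.OutputEncoding = System.Text.Encoding.UTF8;
        // '9', CIRCLED DIGIT NINE, VULGAR FRACTION ONE QUARTER, ROMAN NUMERAL TWELVE, 'a'
        foreach (string s in new[] { "9", "\u2468", "\u00BC", "\u216B", "a" })
        {
            Console.WriteLine("{0} -> {1}", s, Classify(s, 0));
        }
    }
}
```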
As you can see, the list in John Rasch's answer is far from comprehensive, because it only covers characters in the BMP. So I wrote the following small program to print all numeric code points, including the ones outside the BMP:
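using System;
using System.Globalization;
using System.Threading;

public class Program
{
    public static void Main()
    {
        // Use a fixed culture so fractional values are printed with '.'
        Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
        Console.OutputEncoding = System.Text.Encoding.UTF8;

        int c = 0;
        // Start from i = 0 to print every numeric code point,
        // or from i = 0x10000 to print only those outside the BMP
        for (int i = 0x10000; i <= 0x10FFFF; i++)
        {
            // Surrogate code points (U+D800..U+DFFF) are not valid scalar values,
            // and char.ConvertFromUtf32 throws for them
            if (i >= 0xD800 && i <= 0xDFFF)
                continue;

            string ch = char.ConvertFromUtf32(i);
            // The string overload handles surrogate pairs, unlike the char overload
            double numericValue = char.GetNumericValue(ch, 0);
            if (numericValue != -1)
            {
                Console.Write("|U+{0:X5} ({1}): {2}", i, ch, numericValue);
                // Print five entries per output line
                if (c == 4)
                {
                    Console.WriteLine("|");
                    c = 0;
                }
                else
                {
                    c++;
                }
            }
        }
    }
}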
Change the initial value of the loop to 0 to get all numeric values, including those in the BMP.
I'm using .NET Core 3.1, so Unicode 11.0 data is used. If you run it on .NET 5 you'll get Unicode 13.0 output, and later versions may include more numeric characters.
Here is the program's output in Unicode 11.0: