How do I get the decimal value of a Unicode character in Java?
I need a programmatic way to get the decimal value of each character in a String, so that I can encode them as HTML entities, for example:
UTF-8:
著者名
Decimal:
&#33879;&#32773;&#21517;
I suspect you're just interested in a conversion from char to int, which is implicit:
for (int i = 0; i < text.length(); i++)
{
    char c = text.charAt(i);
    int value = c;
    System.out.println(value);
}
EDIT: If you want to handle surrogate pairs, you can use something like:
for (int i = 0; i < text.length(); i++)
{
    int codePoint = text.codePointAt(i);
    // Skip over the second char in a surrogate pair
    if (codePoint > 0xffff)
    {
        i++;
    }
    System.out.println(codePoint);
}
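As an alternative to advancing the index by hand, Java 8 and later offer String.codePoints(), which yields one int per code point and combines valid surrogate pairs for you. A minimal sketch, assuming Java 8+; the class and method names here (CodePointDemo, codePointsOf) are made up for illustration:

```java
import java.util.Arrays;

class CodePointDemo {
    // Hypothetical helper: returns the code points of the text, one int each.
    // Valid surrogate pairs are combined into a single supplementary code point.
    static int[] codePointsOf(String text) {
        return text.codePoints().toArray();
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, which is stored as the surrogate pair D83D DE00
        int[] values = codePointsOf("a\uD83D\uDE00");
        System.out.println(Arrays.toString(values)); // prints [97, 128512]
    }
}
```

Note that an unpaired surrogate is passed through as its own value rather than rejected, so this does not validate the input.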
OK, after reading Jon's post and still musing about surrogates in Java, I decided to be a bit less lazy and look it up. There is actually support for surrogates in the Character class; it's just a bit roundabout.
So here's code that will work correctly, assuming valid input (every high surrogate is followed by a low surrogate):
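for (int i = 0; i < str.length(); i++) {
    char ch = str.charAt(i);
    if (Character.isHighSurrogate(ch)) {
        // Combine the high surrogate with the low surrogate that follows it
        System.out.println("Codepoint: " +
                Character.toCodePoint(ch, str.charAt(i + 1)));
        i++; // skip the low surrogate
    } else {
        // Not part of a pair, so the char value is the code point
        System.out.println("Codepoint: " + (int) ch);
    }
}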
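For the goal stated in the question, the same code point loop can build the HTML entities directly. This is only a sketch under a few assumptions: the class and method names (HtmlEntityEncoder, toDecimalEntities) are hypothetical, ASCII characters are copied through unchanged, and markup characters such as & and < are not escaped, which real HTML output would also need.

```java
class HtmlEntityEncoder {
    // Hypothetical helper: replaces every non-ASCII code point with a
    // decimal numeric character reference such as &#33879;
    static String toDecimalEntities(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (codePoint < 128) {
                out.append((char) codePoint);   // plain ASCII, copied as-is
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            // Advance by 1 char, or by 2 for a surrogate pair
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The three characters from the question, written as Unicode escapes
        // so the result does not depend on the source file encoding
        System.out.println(toDecimalEntities("\u8457\u8005\u540D"));
        // prints &#33879;&#32773;&#21517;
    }
}
```

Character.charCount is used here instead of comparing against 0xffff; both approaches step over the second char of a surrogate pair.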