How do I get the decimal value of a unicode character in Java?

I need a programmatic way to get the decimal value of each character in a String, so that I can encode them as HTML entities, for example:

UTF-8:

著者名

Decimal:

&#33879;&#32773;&#21517;


I suspect you're just interested in a conversion from char to int, which is implicit:

for (int i = 0; i < text.length(); i++)
{
    char c = text.charAt(i);
    int value = c;
    System.out.println(value);
}

EDIT: If you want to handle surrogate pairs, you can use something like:

for (int i = 0; i < text.length(); i++)
{
    int codePoint = text.codePointAt(i);
    // Skip over the second char in a surrogate pair
    if (codePoint > 0xffff)
    {
        i++;
    }
    System.out.println(codePoint);
}
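To tie this back to the HTML entity goal in the question, here is a minimal sketch that builds the entity string directly. It assumes Java 8 or later, where `String.codePoints()` already combines surrogate pairs, and the class and method names (`HtmlEntityEncoder`, `toDecimalEntities`) are made up for illustration:

```java
import java.util.stream.Collectors;

class HtmlEntityEncoder {
    // Hypothetical helper, not part of the JDK: encodes every code point
    // as a decimal numeric character reference.
    static String toDecimalEntities(String text) {
        return text.codePoints()                      // one int per code point, surrogate pairs combined
                   .mapToObj(cp -> "&#" + cp + ";")   // e.g. 33879 becomes "&#33879;"
                   .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        // The three characters from the question, written as Unicode escapes
        System.out.println(toDecimalEntities("\u8457\u8005\u540D"));
        // prints &#33879;&#32773;&#21517;
    }
}
```

This encodes every character, including plain ASCII. If you only want to escape non-ASCII characters, you would need to add a check on the code point value before formatting it.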


OK, after reading Jon's post and still musing about surrogates in Java, I decided to be a bit less lazy and look it up. There's actually support for surrogates in the Character class; it's just a bit roundabout.

So here's the code that'll work correctly, assuming valid input:

    for (int i = 0; i < str.length(); i++) {
        char ch = str.charAt(i);
        if (Character.isHighSurrogate(ch)) {
            // Combine the high surrogate with the low surrogate that follows it
            System.out.println("Codepoint: " + 
                   Character.toCodePoint(ch, str.charAt(i + 1)));
            i++; // skip the low surrogate
        } else {
            System.out.println("Codepoint: " + (int)ch);
        }
    }
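If you can't assume valid input, a high surrogate at the very end of the string would make `str.charAt(i + 1)` throw. Here is a rough sketch of the same idea with that case guarded; the class and method names (`SafeCodePoints`, `codePointsOf`) are hypothetical, and passing an unpaired surrogate through as its raw char value is just one possible policy:

```java
import java.util.ArrayList;
import java.util.List;

class SafeCodePoints {
    // Hypothetical helper: collects code points, pairing surrogates only when
    // a high surrogate is actually followed by a low surrogate.
    static List<Integer> codePointsOf(String str) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < str.length()) {
            char ch = str.charAt(i);
            if (Character.isHighSurrogate(ch)
                    && i + 1 < str.length()
                    && Character.isLowSurrogate(str.charAt(i + 1))) {
                result.add(Character.toCodePoint(ch, str.charAt(i + 1)));
                i += 2; // consumed both halves of the pair
            } else {
                // BMP character, or an unpaired surrogate kept as-is (assumed policy)
                result.add((int) ch);
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 'A', a valid surrogate pair (U+1F600), then a dangling high surrogate
        System.out.println(codePointsOf("A\uD83D\uDE00\uD83D"));
        // prints [65, 128512, 55357]
    }
}
```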