C# UTF-32 ToLower

I'm looking for a way to convert a Unicode UTF-32 code point (an int) to lower case. In Java, something like this would do the trick:

Character.toChars(Character.toLowerCase(Character.codePointAt(text, i)))

I have UTF-32 from Char.ConvertToUtf32, but there doesn't seem to be a way to lower case that value.

UPDATE: I'm dealing with a stream/array of chars, and I've found the code points by looking for the high surrogate, somewhat similar to the Java snippet above. Converting back and forth to String is going to be too inefficient.


The only built-in way to do this is to convert the UTF-32 value to a String. Something like the following should work:

static Int32 ToLower(Int32 c)
{
    // Convert UTF-32 character to a UTF-16 String.
    var strC = Char.ConvertFromUtf32(c);

    // Casing rules depend on the current culture.
    // Consider using ToLowerInvariant() for culture-independent results.
    var lower = strC.ToLower();

    // Convert the UTF-16 String back to UTF-32 character and return it.
    return Char.ConvertToUtf32(lower, 0);
}
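
For the char stream/array case from the question, the same idea can be applied per code point. The following is only a sketch under some assumptions: the class and method names (CodePointCasing, LowerAll) are made up for illustration, it uses ToLowerInvariant() rather than the culture-sensitive ToLower(), and it copies unpaired surrogates through unchanged because Char.ConvertFromUtf32 rejects them.

```csharp
using System;
using System.Text;

static class CodePointCasing
{
    // Same approach as above, but culture-independent (an assumption of this sketch).
    static int ToLower(int codePoint)
    {
        var s = char.ConvertFromUtf32(codePoint);
        return char.ConvertToUtf32(s.ToLowerInvariant(), 0);
    }

    // Walks a UTF-16 buffer, lowercasing one code point at a time.
    public static string LowerAll(char[] buffer)
    {
        var result = new StringBuilder(buffer.Length);
        for (int i = 0; i < buffer.Length; i++)
        {
            char c = buffer[i];
            if (char.IsHighSurrogate(c) && i + 1 < buffer.Length
                && char.IsLowSurrogate(buffer[i + 1]))
            {
                // A valid surrogate pair: combine, lowercase, and re-encode.
                int cp = char.ConvertToUtf32(c, buffer[i + 1]);
                result.Append(char.ConvertFromUtf32(ToLower(cp)));
                i++; // skip the low surrogate that was just consumed
            }
            else if (char.IsSurrogate(c))
            {
                // Unpaired surrogate: not a valid code point, so copy it as-is.
                result.Append(c);
            }
            else
            {
                result.Append(char.ConvertFromUtf32(ToLower(c)));
            }
        }
        return result.ToString();
    }
}
```

Whether supplementary-plane letters are actually lowercased this way depends on the runtime's casing data, so it is worth checking on the target framework.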

You indicate that this is inefficient for your needs. Have you benchmarked it?

If you still insist on doing casing on UTF-32, then you will need to roll your own. Luckily, the Unicode Consortium has done most of the hard work. Take a look at the Unicode case folding file (CaseFolding.txt). Parse this file and store the data in an appropriate structure; the casing can then be done directly against that structure, with your data in whatever format you prefer.
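
A rough sketch of that parsing step might look like the following. It assumes the usual CaseFolding.txt line layout ("code; status; mapping; # name") and keeps only the C and S entries, which map one code point to exactly one code point. The names SimpleCaseFolder, Load and Fold are hypothetical. Note also that case folding is designed for caseless comparison, so it is not identical to lowercasing for every character.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class SimpleCaseFolder
{
    // Builds a code point -> folded code point map from CaseFolding.txt-style lines.
    public static Dictionary<int, int> Load(IEnumerable<string> lines)
    {
        var map = new Dictionary<int, int>();
        foreach (var raw in lines)
        {
            // Strip the trailing "# ..." comment and skip blank lines.
            int hash = raw.IndexOf('#');
            string line = (hash >= 0 ? raw.Substring(0, hash) : raw).Trim();
            if (line.Length == 0) continue;

            string[] fields = line.Split(';');
            if (fields.Length < 3) continue;

            // C (common) and S (simple) are single code point mappings;
            // F (full) and T (Turkic) entries are ignored in this sketch.
            string status = fields[1].Trim();
            if (status != "C" && status != "S") continue;

            int from = int.Parse(fields[0].Trim(), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            int to = int.Parse(fields[2].Trim(), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            map[from] = to;
        }
        return map;
    }

    // Returns the folded code point, or the input unchanged if it has no mapping.
    public static int Fold(Dictionary<int, int> map, int codePoint)
    {
        int folded;
        return map.TryGetValue(codePoint, out folded) ? folded : codePoint;
    }
}
```

The map could be loaded once, for example with SimpleCaseFolder.Load(File.ReadLines(path)) where path points at a local copy of the file, and then Fold can be called on each UTF-32 value without creating any strings. A Dictionary is just one option; a sorted array or a paged table may be faster if lookups dominate.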
