ISO Latin 1 byte to char

If I have a byte b encoded as ISO Latin 1 (ISO 8859-1), is it enough to do char output = (char)b;? This seems to work, but I don't know if there is another way.


A direct cast seems to work for this particular encoding. However, best practice would be to use the Encoding.GetChars method for proper conversion.

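using System;
using System.Text;

public static class Program {
    // ISO 8859-1 (Latin 1) decoder, created once and reused
    private static readonly Encoding Iso88591 = Encoding.GetEncoding("ISO-8859-1");

    public static void Main() {
        var bytes = new byte[] { 65 };           // 0x41
        var chars = Iso88591.GetChars(bytes);    // one char per input byte
        Console.WriteLine(chars[0]);
    }
}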


Yes, this should work fine. If you look at the Unicode chart for 8859-1, there is a one-to-one mapping between 8859-1 and Unicode: byte values 0x00 through 0xFF correspond to code points U+0000 through U+00FF. That means you can just cast it to char.

However, this is not the case with all code pages, so a more robust solution might be a good idea.
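
To check the one-to-one mapping rather than take it on trust, a minimal sketch along these lines compares the cast against the framework's decoder for every possible byte value. The class and method names (`Latin1CastCheck`, `CastMatchesDecoder`) are made up for this illustration.

```csharp
using System;
using System.Text;

public static class Latin1CastCheck
{
    // Hypothetical helper: returns true if (char)b equals the decoder's
    // result for every one of the 256 possible byte values.
    public static bool CastMatchesDecoder()
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        for (int i = 0; i <= 255; i++)
        {
            byte b = (byte)i;
            char decoded = latin1.GetChars(new[] { b })[0];
            if (decoded != (char)b)
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(CastMatchesDecoder());
    }
}
```

If the two ever disagreed for a given runtime, the loop would return false at the first mismatching byte.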


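You can use the Encoding class to get chars from byte arrays. Note that the built-in Encoding.ASCII only covers byte values 0 to 127 and decodes anything above that as '?', so for Latin 1 data use Encoding.GetEncoding("ISO-8859-1") instead.

One of the GetChars overloads does the job.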
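
The choice of Encoding instance matters for bytes above 127. As a small sketch (the helper name `DecodeWith` is invented here), decoding 0xE9 with Encoding.ASCII and with the ISO 8859-1 encoding gives different characters:

```csharp
using System;
using System.Text;

public static class AsciiVersusLatin1
{
    // Hypothetical helper: decodes a single byte with the given encoding.
    public static char DecodeWith(Encoding encoding, byte value)
    {
        return encoding.GetChars(new[] { value })[0];
    }

    public static void Main()
    {
        byte eAcute = 0xE9; // 'é' in ISO 8859-1

        // ASCII has no character for 0xE9, so the default fallback yields '?'.
        Console.WriteLine(DecodeWith(Encoding.ASCII, eAcute) == '?');

        // The Latin 1 decoder maps 0xE9 to U+00E9.
        Console.WriteLine(DecodeWith(Encoding.GetEncoding("ISO-8859-1"), eAcute) == '\u00E9');
    }
}
```

Both comparisons should print True with the default decoder fallback.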


I would use BitConverter's ToChar. Remember that a char in .NET is a 2-byte (UTF-16) value, so ToChar reads two bytes from the array, and a single Latin 1 byte has to be padded with a zero byte first. Simple casting like that (even if it works, which it might) is not really the best idea.
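
For what it's worth, here is a rough sketch of what the BitConverter route would involve. BitConverter.ToChar(byte[], int) needs two bytes, so passing a one-element array throws. The helper name `FromLatin1Byte` is invented for this example.

```csharp
using System;

public static class BitConverterSketch
{
    // Hypothetical helper: widens one Latin 1 byte to a char via BitConverter.
    public static char FromLatin1Byte(byte b)
    {
        // ToChar reads two bytes in the machine's byte order, so the zero
        // padding byte goes after b on little-endian systems and before it
        // on big-endian ones.
        byte[] pair = BitConverter.IsLittleEndian
            ? new byte[] { b, 0 }
            : new byte[] { 0, b };
        return BitConverter.ToChar(pair, 0);
    }

    public static void Main()
    {
        byte b = 0xE9;
        Console.WriteLine(FromLatin1Byte(b) == (char)b);
    }
}
```

The result is the same char the cast produces, just with an extra array allocation.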


If the value of the byte is < 128, you're fine. If it's >=128, just casting probably won't get you the right character.

The ISO code pages are basically all ASCII in the lower half (0-127). The key difference between them is the upper half (128-255), which each code page fills with characters useful to its language. (IIRC on the original IBM PC code page that upper half includes the line-art characters useful in console apps.)

HOWEVER, a quick look at the Unicode code charts says that the Latin-1 Supplement block occupies U+0080 to U+00FF (128-255), matching ISO 8859-1. So IN THIS PARTICULAR INSTANCE, you're probably fine, but if something comes in with, for example, the Cyrillic ISO code page (ISO 8859-5), you'll have to explicitly transform to Unicode characters.
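
To illustrate the Cyrillic case, here is a sketch that assumes .NET Core 3.0 or later (including .NET 5+), where the ISO 8859-5 encoding is only available after registering CodePagesEncodingProvider. On .NET Framework the encoding is built in and the registration line should not be needed. The class and method names are invented for this example.

```csharp
using System;
using System.Text;

public static class CyrillicExample
{
    // Hypothetical helper: decodes one ISO 8859-5 byte to a char.
    public static char DecodeIso88595(byte value)
    {
        // Assumption: running on .NET Core 3.0+ / .NET 5+, where code page
        // encodings must be registered before GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cyrillic = Encoding.GetEncoding("ISO-8859-5");
        return cyrillic.GetChars(new[] { value })[0];
    }

    public static void Main()
    {
        byte b = 0xB0; // Cyrillic capital letter A in ISO 8859-5

        // The cast gives U+00B0 (degree sign), which is the wrong character.
        Console.WriteLine((int)(char)b == 0x00B0);

        // The decoder gives U+0410 (Cyrillic capital letter A).
        Console.WriteLine((int)DecodeIso88595(b) == 0x0410);
    }
}
```

The comparisons print booleans rather than the characters themselves so the output does not depend on the console's own encoding.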


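You can use Encoding.Convert. Pick a target encoding that can represent every Latin 1 character, such as UTF-8; converting to Encoding.ASCII would replace every byte above 127 with '?'.

        byte[] latin1 = new byte[]{}; // Your data goes here, obviously
        byte[] converted = Encoding.Convert(Encoding.GetEncoding("latin1"), Encoding.UTF8, latin1);

You can then work with the new byte array (or decode it with Encoding.UTF8.GetString) without worrying about whether Latin 1 will cause you problems.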
