String to binary and vice versa: extended ASCII

I want to convert a String to binary by putting it into a byte array (String.getBytes()) and then storing the binary string for each byte (Integer.toBinaryString(bytearray[i])) in a String[]. Then I want to convert back to a normal String via Byte.parseByte(stringarray[i], 2). This works great for the standard ASCII table, but not for the extended one. For example, an A gives me 1000001, but an Ä returns

11111111111111111111111111000011
11111111111111111111111110000100

Any ideas how to manage this?

public class BinString {
    public static void main(String args[]) {
        String s = "ä";
        System.out.println(binToString(stringToBin(s)));

    }

    public static String[] stringToBin(String s) {
        System.out.println("Converting: " + s);
        byte[] b = s.getBytes();
        String[] sa = new String[b.length];
        for (int i = 0; i < b.length; i++) {
            sa[i] = Integer.toBinaryString(b[i] & 0xFF);
        }
        return sa;
    }

    public static String binToString(String[] strar) {
        byte[] bar = new byte[strar.length];
        for (int i = 0; i < strar.length; i++) {
            bar[i] = Byte.parseByte(strar[i], 2);
            System.out.println(Byte.parseByte(strar[i], 2));

        }
        String s = new String(bar);
        return s;
    }

}


First off: "extended ASCII" is a very misleading term that's used to refer to a ton of different encodings.

Second: byte in Java is signed, while bytes in encodings are usually handled as unsigned. Integer.toBinaryString() takes an int, so the byte is first widened to an int using sign extension, and byte values above 127 are represented by negative values in Java. The byte 0xC3, for example, becomes the int 0xFFFFFFC3, which is why you see 32 digits instead of 8.

To avoid this, simply use & 0xFF to mask off everything but the lowest 8 bits, like this:

String binary = Integer.toBinaryString(byteArray[i] & 0xFF);
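
A minimal sketch of the difference, plus the way back. The names ByteBits, toBits and fromBits are made up for illustration, and the padding to eight digits is an extra that the line above does not do. One assumption worth flagging: Byte.parseByte only accepts values from -128 to 127, so a string like "11000011" (195) makes it throw a NumberFormatException; the sketch parses as an int and casts instead.

```java
class ByteBits {
    // Render one byte as exactly eight binary digits.
    static String toBits(byte b) {
        // Mask away the 24 sign-extended upper bits before formatting.
        String bits = Integer.toBinaryString(b & 0xFF);
        // Left-pad with zeros so every byte has the same width.
        return String.format("%8s", bits).replace(' ', '0');
    }

    // Parse up to eight binary digits back into a (signed) Java byte.
    static byte fromBits(String bits) {
        // Integer.parseInt accepts 0..255 here; the cast wraps 128..255 to negative bytes.
        return (byte) Integer.parseInt(bits, 2);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xC3; // first UTF-8 byte of "Ä"; as a Java byte this is -61
        System.out.println(Integer.toBinaryString(b));        // 11111111111111111111111111000011
        System.out.println(Integer.toBinaryString(b & 0xFF)); // 11000011
        System.out.println(toBits((byte) 'A'));               // 01000001
        System.out.println(fromBits("11000011") == b);        // true
    }
}
```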


To expand on Joachim's point about "extended ASCII" I'd add...

Note that getBytes() is a transcoding operation that converts data from UTF-16 to the platform default encoding. That encoding varies from system to system, and sometimes even between users on the same PC. This means the results are not consistent across platforms, and if the default is a legacy encoding (as it is on Windows), data can be lost.
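
As a rough illustration of why the charset matters, the sketch below encodes the same character with three explicitly named charsets. CharsetDemo and byteCount are hypothetical names; the character is written as a Unicode escape so the encoding of the source file itself does not affect the result.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Number of bytes the string occupies in the given charset.
    static int byteCount(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "\u00C4"; // "Ä"
        // Whatever getBytes() without arguments would use on this machine.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println(byteCount(s, StandardCharsets.ISO_8859_1)); // 1 (0xC4)
        System.out.println(byteCount(s, StandardCharsets.UTF_8));      // 2 (0xC3 0x84)
        // US-ASCII cannot represent the character, so it is replaced by '?'.
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // ?
    }
}
```

The two UTF-8 bytes, 0xC3 and 0x84, are the 11000011 and 10000100 visible in the low eight bits of the output shown in the question.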

To make the operation symmetrical, you need to provide an encoding explicitly, preferably a Unicode encoding such as UTF-8 or UTF-16:

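import java.nio.charset.Charset;

// s1 is the String to convert
Charset encoding = Charset.forName("UTF-16");
byte[] b = s1.getBytes(encoding);      // encode with an explicit charset
String s2 = new String(b, encoding);   // decode with the same charset
assert s1.equals(s2);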
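
Putting the pieces together, a possible round trip from String to binary strings and back might look like the sketch below. It assumes UTF-8 (via StandardCharsets.UTF_8) rather than the UTF-16 used above, and the class name BinRoundTrip is invented for the example; the method names mirror the ones in the question.

```java
import java.nio.charset.StandardCharsets;

class BinRoundTrip {
    // Encode with an explicit charset, then format each byte as unsigned binary.
    static String[] stringToBin(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        String[] out = new String[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = Integer.toBinaryString(bytes[i] & 0xFF);
        }
        return out;
    }

    // Parse each binary string as an int (0..255), cast to byte, decode with the same charset.
    static String binToString(String[] bits) {
        byte[] bytes = new byte[bits.length];
        for (int i = 0; i < bits.length; i++) {
            bytes[i] = (byte) Integer.parseInt(bits[i], 2);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "A\u00C4"; // "AÄ"
        String[] bits = stringToBin(original);
        System.out.println(String.join(" ", bits));             // 1000001 11000011 10000100
        System.out.println(binToString(bits).equals(original)); // true
    }
}
```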