String to binary and vice versa: extended ASCII
I want to convert a String to binary by putting it in a byte array (String.getBytes()) and then storing the binary string for each byte (Integer.toBinaryString(bytearray)) in a String[]. Then I want to convert back to a normal String via Byte.parseByte(stringarray[i], 2). This works great for the standard ASCII table, but not for the extended one. For example, an A gives me 1000001, but an Ä returns
11111111111111111111111111000011
11111111111111111111111110000100
Any ideas how to manage this?
public class BinString {
    public static void main(String args[]) {
        String s = "ä";
        System.out.println(binToString(stringToBin(s)));
    }

    public static String[] stringToBin(String s) {
        System.out.println("Converting: " + s);
        byte[] b = s.getBytes();
        String[] sa = new String[s.getBytes().length];
        for (int i = 0; i < b.length; i++) {
            sa[i] = Integer.toBinaryString(b[i] & 0xFF);
        }
        return sa;
    }

    public static String binToString(String[] strar) {
        byte[] bar = new byte[strar.length];
        for (int i = 0; i < strar.length; i++) {
            bar[i] = Byte.parseByte(strar[i], 2);
            System.out.println(Byte.parseByte(strar[i], 2));
        }
        String s = new String(bar);
        return s;
    }
}
First off: "extended ASCII" is a very misleading term that's used to refer to a ton of different encodings.
Second: byte in Java is signed, while bytes in encodings are usually handled as unsigned. Since you use Integer.toBinaryString(), the byte will be converted to an int using sign extension (because byte values > 127 are represented by negative values in Java).
To avoid this, simply use & 0xFF to mask all but the lower 8 bits, like this:
String binary = Integer.toBinaryString(byteArray[i] & 0xFF);
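To make the sign-extension point concrete, here is a minimal sketch. The class and helper names (SignedByteDemo, toBits, fromBits) are made up for illustration and are not from the question. Going the other way has the same signedness catch: Byte.parseByte only accepts values from -128 to 127, so an 8-digit string such as 11000011 (195) is rejected with a NumberFormatException. Parsing as an int and casting down is one way around that.

```java
public class SignedByteDemo {
    // Binary digits of a byte treated as unsigned (0..255), left-padded to 8 digits.
    public static String toBits(byte b) {
        String bits = Integer.toBinaryString(b & 0xFF);
        return String.format("%8s", bits).replace(' ', '0');
    }

    // Inverse of toBits: parse as an int (0..255), then narrow to a signed byte.
    public static byte fromBits(String bits) {
        return (byte) Integer.parseInt(bits, 2);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xC3; // -61 as a signed Java byte

        // Without the mask, the byte is sign-extended to a 32-bit int first.
        System.out.println(Integer.toBinaryString(b));        // 11111111111111111111111111000011
        // With the mask, only the low 8 bits survive.
        System.out.println(Integer.toBinaryString(b & 0xFF)); // 11000011

        System.out.println(toBits((byte) 'A'));               // 01000001
        System.out.println(fromBits("11000011") == b);        // true
    }
}
```

Padding to 8 digits is optional for the round trip, but it keeps every byte the same width if the strings are ever concatenated.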
To expand on Joachim's point about "extended ASCII" I'd add...
Note that getBytes() is a transcoding operation that converts data from UTF-16 to the platform default encoding. The encoding varies from system to system and sometimes even between users on the same PC. This means that results are not consistent across platforms, and if a legacy encoding is the default (as it is on Windows), data can be lost.
To make the operation symmetrical, you need to provide an encoding explicitly (preferably a Unicode encoding such as UTF-8 or UTF-16).
Charset encoding = Charset.forName("UTF-16");
byte[] b = s1.getBytes(encoding);
String s2 = new String(b, encoding);
assert s1.equals(s2);
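Putting both answers together, the full round trip from the question might look something like the sketch below. It assumes UTF-8 as the explicit encoding. The class name BinaryRoundTrip and the extra Charset parameter are illustrative choices, not part of the original code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BinaryRoundTrip {
    // Encode the string with an explicit charset, then render each byte as unsigned binary.
    public static String[] stringToBin(String s, Charset cs) {
        byte[] bytes = s.getBytes(cs);
        String[] out = new String[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = Integer.toBinaryString(bytes[i] & 0xFF);
        }
        return out;
    }

    // Parse each binary string as an int (0..255), narrow to byte, decode with the same charset.
    public static String binToString(String[] bits, Charset cs) {
        byte[] bytes = new byte[bits.length];
        for (int i = 0; i < bits.length; i++) {
            bytes[i] = (byte) Integer.parseInt(bits[i], 2);
        }
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String s = "\u00C4"; // Ä, written as an escape so the source file encoding does not matter
        String[] bits = stringToBin(s, StandardCharsets.UTF_8);
        System.out.println(String.join(" ", bits));                              // 11000011 10000100
        System.out.println(s.equals(binToString(bits, StandardCharsets.UTF_8))); // true
    }
}
```

The same charset has to be used in both directions. With StandardCharsets.ISO_8859_1, for instance, Ä becomes the single byte 11000100 instead of two bytes.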