Is a character 1 byte or 2 bytes in Java?
I thought characters in Java are 16 bits, as suggested in the Java docs. Isn't that the case for strings? I have code that stores an object into a file:
public static void storeNormalObj(File outFile, Object obj) {
    FileOutputStream fos = null;
    ObjectOutputStream oos = null;
    try {
        fos = new FileOutputStream(outFile);
        oos = new ObjectOutputStream(fos);
        oos.writeObject(obj);
        oos.flush();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            if (oos != null) {
                oos.close();
            }
            try {
                if (fos != null) {
                    fos.close();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Basically, I tried to store the string "abcd" into the file "output". When I opened output with an editor and deleted the non-string part, what was left was just the string "abcd", which is 4 bytes in total. Does anyone know why? Does Java automatically save space by using ASCII instead of Unicode for strings that ASCII can represent? Thanks
(I think by "non-string part" you are referring to the bytes that ObjectOutputStream emits when you create it. It is possible you don't want to use ObjectOutputStream at all, but I don't know your requirements.)
Just FYI, Unicode and UTF-8 are not the same thing. Unicode is a standard that specifies, amongst other things, what characters are available. UTF-8 is a character encoding that specifies how these characters shall be physically encoded in 1s and 0s. UTF-8 can use 1 byte for ASCII (<= 127) and up to 4 bytes to represent other Unicode characters.
UTF-8 is a strict superset of ASCII. So even if you specify a UTF-8 encoding for a file and you write "abcd" to it, it will contain just those four bytes: they have the same physical encoding in ASCII as they do in UTF-8.
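As a quick sanity check (this snippet is my own illustration, not part of the original answer; the class name is arbitrary), you can compare the encoded sizes directly:

import java.nio.charset.StandardCharsets;

public class EncodingSizeDemo {
    public static void main(String[] args) {
        String s = "abcd";
        // ASCII characters take one byte each in both ASCII and UTF-8.
        System.out.println(s.getBytes(StandardCharsets.US_ASCII).length); // 4
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);    // 4
        // In UTF-16BE each char takes two bytes; the UTF-16 charset also
        // prepends a 2-byte byte-order mark when encoding.
        System.out.println(s.getBytes(StandardCharsets.UTF_16BE).length); // 8
        System.out.println(s.getBytes(StandardCharsets.UTF_16).length);   // 10
    }
}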
Your method uses ObjectOutputStream, which actually has a significantly different encoding than either ASCII or UTF-8! If you read the Javadoc carefully: if obj is a string that has already occurred in the stream, subsequent calls to writeObject cause a back-reference to the previous string to be emitted, potentially causing many fewer bytes to be written in the case of repeated strings.
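Here is a small sketch of that back-referencing (my own illustration; the class name and the use of ByteArrayOutputStream to count bytes are not from the original answer):

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;

public class BackReferenceDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(buf);

        oos.writeObject("abcd");
        oos.flush();
        int afterFirst = buf.size();   // stream header plus the full string record

        oos.writeObject("abcd");       // same String object: only a handle is written
        oos.flush();
        int afterSecond = buf.size();

        System.out.println("first write:  " + afterFirst + " bytes");
        System.out.println("second write: " + (afterSecond - afterFirst) + " bytes");
        oos.close();
    }
}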
If you're serious about understanding this, you really should spend a good amount of time reading about Unicode and character encoding systems. Wikipedia has an excellent article on Unicode as a start.
Yea, the char is only a 16-bit value within the context of the Java runtime environment. If you wish to write characters rather than raw bytes, use a Writer such as FileWriter (note that FileWriter uses the platform's default encoding; for an explicit 16-bit encoding you would wrap the stream in an OutputStreamWriter with a UTF-16 charset). The basic copy idiom with a FileWriter looks like this:
FileReader inputStream = null;   // "input.dat" stands in for whatever character source you are copying from
FileWriter outputStream = null;
try {
    inputStream = new FileReader("input.dat");
    outputStream = new FileWriter("myfilename.dat");
    int c;
    while ((c = inputStream.read()) != -1) {
        outputStream.write(c);
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (outputStream != null) {
        outputStream.close();
    }
}
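If the goal really is two bytes per character on disk, something along these lines should work (a rough sketch; the file name and the UTF-16BE choice, which avoids the byte-order mark that the plain UTF-16 charset writes, are mine):

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class SixteenBitWriteDemo {
    public static void main(String[] args) throws Exception {
        File f = new File("utf16.dat");
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(f), StandardCharsets.UTF_16BE)) {
            out.write("abcd"); // each char is written as two bytes
        }
        System.out.println(f.length()); // 8
    }
}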
If you look at how String serialization works, you will find that it ends up calling DataOutput.writeUTF to write Strings. And if you read that, you'll find they are written as "modified UTF-8". The details are lengthy, but if you stick to 7-bit ASCII, yes, each character will take one byte. If you want the gory details, look at the EXTREMELY long Javadoc for DataOutput.writeUTF().
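A quick illustration of that framing (my own snippet, writing to an in-memory buffer rather than a file):

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;

public class WriteUtfDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF("abcd");
        // 2-byte length prefix + 4 one-byte (modified UTF-8) characters = 6 bytes
        System.out.println(buf.size()); // 6
    }
}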
You may be interested to know there is a -XX:+UseCompressedStrings option in the Java 6 Update 21 performance release and later. This allows String to use a byte[] for strings which do not need a char[].
Despite the Java Hotspot VM Options guide suggesting it may be on by default, this may only be for performance releases. It only appears to work for me if I turn it on explicitly.
So were you expecting a 16 × 4 = 64 bits = 8 byte file? That is more than either UTF-8 or ASCII encoding needs. And once the data has been written to a file, how that space is managed is up to the operating system; your code has no control over it.