
Please help me clarify some concepts with Java IO and maybe fall in love with it!

I'm trying to familiarize myself with the different types of stream IOs Java has to offer, so I wrote this little piece of code here.

public static void main(String[] args) throws IOException {
    String str = "English is being IOed!\nLine 2 has a number.\n中文字體(Chinese)";

    FileOutputStream fos = new FileOutputStream("ByteIO.txt");
    Scanner fis = new Scanner(new FileInputStream("ByteIO.txt"));
    FileWriter fw = new FileWriter("CharIO.txt");
    Scanner fr = new Scanner(new FileReader("CharIO.txt"));

    BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream("BufferedByteIO.txt"));
    Scanner bis = new Scanner(new BufferedInputStream(new FileInputStream("BufferedByteIO.txt")));
    BufferedWriter bw = new BufferedWriter(new FileWriter("BufferedCharIO.txt"));
    Scanner br = new Scanner(new BufferedReader(new FileReader("BufferedCharIO.txt")));

    DataOutputStream dos = new DataOutputStream(new BufferedOutputStream((new FileOutputStream("DataBufferedByteIO.txt"))));
    Scanner dis = new Scanner(new DataInputStream(new BufferedInputStream((new FileInputStream("DataBufferedByteIO.txt")))));

    try {
        System.out.printf("ByteIO:\n");
        fos.write(str.getBytes());
        while (fis.hasNext())
            System.out.print(fis.next());// in the form of a String

        System.out.printf("\nCharIO:\n");
        fw.write(str);
        while (fr.hasNext())
            System.out.print(fr.next());

        System.out.printf("\nBufferedByteIO:\n");
        bos.write(str.getBytes());
        bos.flush();// buffer is not full, so you'll need to flush it
        while (bis.hasNext())
            System.out.print(bis.next());

        System.out.printf("\nBufferedCharIO:\n");
        bw.write(str);
        bw.flush();// buffer is not full, so you'll need to flush it
        while (br.hasNext())
            System.out.print(br.next());

        System.out.printf("\nDataBufferedByteIO:\n");
        dos.write(str.getBytes());
        //dos.flush();// dos doesn't seem to need this...
        while (dis.hasNext())
            System.out.print(dis.next());
    } finally {
        fos.close();
        fis.close();
        fw.close();
        fr.close();
        bos.close();
        br.close();
        dos.close();
        dis.close();
    }

}

All it does is write a pre-defined string into each file and then read it back. The problem arises when I run the code. I get this:

ByteIO:
EnglishisbeingIOed!Line2hasanumber.中文字體(Chinese)
CharIO:
                        //<--Empty line here
BufferedByteIO:
EnglishisbeingIOed!Line2hasanumber.中文字體(Chinese)
BufferedCharIO:
EnglishisbeingIOed!Line2hasanumber.中文字體(Chinese)
DataBufferedByteIO:
                        //<--Empty line here
  1. The files are all populated with the correct data, so I suppose something is wrong with the scanner, but I just don't know what went wrong, and I hope somebody can point the mistake out for me.

  2. The files are all populated with the same data. That's weird: according to Java I/O Streams, Byte Streams can only process single bytes, and only Character Streams can process Unicode. So shouldn't Byte Streams spit out gibberish when processing Chinese characters, which are UTF-16 (I think)? What exactly is the difference between a Byte Stream and a Character Stream (fos vs fw)?

  3. On a partially unrelated topic, I thought Byte Streams were used to work with binary data such as music and images. I also thought that the data Byte Streams spit out should be illegible, but I seem to be wrong. Am I? Exactly which I/O Stream class(es) should I work with if I'm dealing with binary data?


An important concept to understand here is that of encoding.

String/char[]/Writer/Reader are used to deal with textual data of any kind.

byte[]/OutputStream/InputStream are used to deal with binary data. Also, a file on your disk only ever stores binary data (yes, that's true; it will hopefully be a bit clearer in a minute).

Whenever you convert between those two worlds, some kind of encoding is in play. In Java, there are several ways to convert between those worlds without specifying an encoding. In those cases, the platform default encoding is used (which one that is depends on your platform and configuration/locale). [*]
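As a minimal sketch of crossing between the two worlds with the encoding spelled out (the class and method names here are made up for illustration, and UTF-8 is just one reasonable choice), something like this avoids depending on the platform default:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class ExplicitCharsetDemo {
    // Text -> bytes, with the charset named explicitly instead of relying on the default.
    public static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Bytes -> text, decoded with the same charset that was used for encoding.
    public static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587"; // the first two Chinese characters from the question
        byte[] encoded = toUtf8(text);
        // Two characters become six bytes in UTF-8.
        System.out.println(Arrays.toString(encoded));
        // Decoding with the same charset restores the original text.
        System.out.println(fromUtf8(encoded).equals(text));
    }
}
```

The no-argument str.getBytes() in the question does the same conversion, only with whatever charset the platform happens to default to.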

The task of an encoding is to convert some given binary data (usually from a byte[]/ByteBuffer/InputStream) to textual data (usually into char[]/CharBuffer/Writer) or the other way around.

How exactly this happens depends on the encoding used. Some encodings (such as the ISO-8859-* family) are a simple mapping from byte values to corresponding Unicode code points; others (such as UTF-8) are more complex, and a single Unicode code point can take anything from 1 to 4 bytes.
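To make the 1-to-4-byte range concrete, here is a small sketch (hypothetical class and helper names) that counts the bytes a string occupies in UTF-8 and in ISO-8859-1. Characters that ISO-8859-1 cannot represent are replaced by '?' when encoding with String.getBytes(Charset):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class EncodingLengthDemo {
    // Number of bytes the string occupies when encoded as UTF-8.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the string occupies when encoded as ISO-8859-1 (Latin-1).
    public static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041, ASCII
            "\u00E9",       // U+00E9, Latin small letter e with acute
            "\u4E2D",       // U+4E2D, a CJK ideograph
            "\uD83D\uDE00"  // U+1F600, outside the Basic Multilingual Plane
        };
        for (String s : samples) {
            System.out.println("U+" + Integer.toHexString(s.codePointAt(0))
                    + " -> " + utf8Length(s) + " byte(s) in UTF-8");
        }
        // Latin-1 has no byte for U+4E2D, so the encoder substitutes '?'.
        byte[] lossy = "\u4E2D".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println((char) lossy[0]);
    }
}
```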

There's a quite nice article that gives a basic overview of the whole encoding issue, titled: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

[*] Using the platform default encoding is usually not desired, because it makes your program un-portable and hard to use, but that's beside the point for this post.


Using BufferedInputStream and DataInputStream does not alter the content of the data.
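A rough way to check this on the output side, using an in-memory ByteArrayOutputStream in place of a file (the class and method names are made up for this sketch). It also shows that, for a payload smaller than the internal buffer, nothing reaches the underlying stream until the wrapper is flushed or closed:

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class WrapperDemo {
    // Writes the payload through Data + Buffered wrappers and returns what reached the sink.
    public static byte[] writeThroughWrappers(byte[] payload) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(sink))) {
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // try-with-resources closed the stream, which also flushed the buffer.
        return sink.toByteArray();
    }

    // Returns how many bytes the sink holds after write() but before any flush or close.
    public static int bytesVisibleBeforeFlush(byte[] payload) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(sink));
        try {
            out.write(payload);
            int visible = sink.size(); // a small payload is still sitting in the buffer
            out.close();
            return visible;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = "Line 2 has a number.".getBytes(StandardCharsets.UTF_8);
        // The bytes that arrive are identical to the bytes that were written.
        System.out.println(Arrays.equals(payload, writeThroughWrappers(payload)));
        // Nothing has reached the sink before the buffer is flushed.
        System.out.println(bytesVisibleBeforeFlush(payload));
    }
}
```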

A byte stream is for reading binary data. It is not suitable here.

A character stream is for reading text. Note that Scanner.next() returns whitespace-delimited tokens rather than whole lines, which is why the spaces and line breaks are missing from your output.
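As a small illustration of how Scanner splits its input (a sketch with made-up class and method names, reading from a String rather than a file): next() yields whitespace-delimited tokens, while nextLine() yields whole lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class; the names are illustrative only.
class ScannerTokensDemo {
    // Collects tokens as returned by next(), using Scanner's default whitespace delimiter.
    public static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    // Collects whole lines as returned by nextLine(); spaces inside a line are kept.
    public static List<String> lines(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                result.add(scanner.nextLine());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "English is being IOed!\nLine 2 has a number.";
        System.out.println(tokens(text)); // one entry per word
        System.out.println(lines(text));  // one entry per line
    }
}
```

Printing the tokens back to back with System.out.print, as the question does, gives the run-together output shown there.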

If I run

String str = "English is being IOed!\nLine 2 has a number.\n\u4E2D\u6587\u5b57\u9ad4(Chinese)\n";
Writer fw = new OutputStreamWriter(new FileOutputStream("ReaderWriter.txt"), "UTF-8");
fw.write(str);
fw.close();
Reader fr = new InputStreamReader(new FileInputStream("ReaderWriter.txt"), "UTF-8");
Scanner scanner = new Scanner(fr);
String next = "";
while (scanner.hasNext()) {
    next = scanner.next();
    System.out.println(next);
}
for (int i = 0; i < next.length(); i++)
    System.out.println(Integer.toHexString((int) next.charAt(i)));
fr.close();

I get

English
is
being
IOed!
Line
2
has
a
number.
????(Chinese)
4e2d
6587
5b57
9ad4
28
43
68
69
6e
65
73
65
29

You can see that the original characters are preserved. The '?' means the character could not be displayed by my terminal or in its character encoding. (I don't know why.)
