Java stream misconceptions... some clarification?

Posted 2023-03-27 02:38

I understand that byte streams deal with bytes and character streams deal with characters... if I use a byte stream to read in characters, could this limit me to the sorts of characters I might read? For instance, bytes are read in as 8 bit bytes, characters are read in as 16 bit characters... does this mean that more characters can be represented using character streams rather than byte streams?

The last thing I'm confused about is how a byte stream writes out to a file for reading. If I was receiving bytes from a network socket, I would wrap them in an InputStreamReader for writing, this way I would get the character transformation logic the character stream provides. If I read from a file using a FileInputStream and write out using a FileOutputStream, why is this file readable when I open it with a text editor? How is the FileOutputStream treating the bytes?

The key concept here is character encoding: each human-readable character is encoded into one or more bytes. There are plenty of character encodings. The most popular ones are:

ASCII: 7 bit (the remaining bit is unused), so each character is exactly one byte
UTF-8: the most common characters are represented as a single byte, less common ones as two to four bytes

These encodings are readable even when you open a file in a hex editor. However, there are many character encodings that do not have this feature, such as UTF-16 and UTF-32.
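To make the size difference concrete, here is a small sketch that counts how many bytes the same text takes in UTF-8 and in UTF-16. The class and method names are made up for illustration, and UTF-16BE is used only so that no byte order mark is added to the count.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names and sample characters are illustrative only.
class EncodingSizes {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes when encoded as UTF-16 big-endian (no byte order mark).
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // U+0041 (A), U+00E9 (e with acute accent), U+20AC (euro sign)
        String[] samples = {"A", "\u00e9", "\u20ac"};
        for (String s : samples) {
            System.out.println("U+" + Integer.toHexString(s.charAt(0))
                    + ": UTF-8 bytes = " + utf8Length(s)
                    + ", UTF-16 bytes = " + utf16Length(s));
        }
    }
}
```

The three samples take 1, 2 and 3 bytes in UTF-8 but always 2 bytes in UTF-16. An "A" in UTF-16BE is the byte pair 00 41, which is why plain English text in UTF-16 looks padded with zero bytes in a hex editor.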

Now back to your question: an InputStream only gives you a stream of bytes. If your bytes represent characters encoded with ASCII or UTF-8, most of the time you are fine. But if these bytes represent something more sophisticated like UTF-16, you absolutely need a Reader. Of course, the Reader has to know which character encoding the underlying InputStream uses. This is a common beginner mistake: a Reader that is not explicitly initialized with a character encoding will often fall back to the system default.
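As a rough illustration of why the Reader needs the right encoding, the sketch below decodes the same UTF-16 bytes twice, once with the matching charset and once with a wrong one. The class name, the helper `readAll`, and the sample text are assumptions for the demo, and an in-memory stream stands in for a file or socket.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of any library.
class ExplicitCharsetRead {
    // Reads every character from the stream, decoding with the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "hello" with an e-acute (U+00E9) as the second letter, encoded as UTF-16.
        byte[] utf16 = "h\u00e9llo".getBytes(StandardCharsets.UTF_16BE);

        String right = readAll(new ByteArrayInputStream(utf16), StandardCharsets.UTF_16BE);
        String wrong = readAll(new ByteArrayInputStream(utf16), StandardCharsets.ISO_8859_1);

        System.out.println("decoded with UTF-16BE: " + right.length() + " chars");
        System.out.println("decoded with ISO-8859-1: " + wrong.length() + " chars");
    }
}
```

With the correct charset the five characters come back intact. With ISO-8859-1 every byte is turned into its own character, so the result has ten characters, half of them NUL.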

The other direction (with writers) is similar. If you simply cast your chars to bytes, most of the time you will be fine. But if your characters contain less common national letters, your output will be malformed or truncated. So you create a Writer that converts each given character to a series of one or more bytes. Once again, you have to provide the character encoding.
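The truncation can be seen in a few lines. This is only a sketch with made-up names: `castToBytes` is the naive cast, and `encodeWithWriter` goes through an OutputStreamWriter with UTF-8 chosen as an example encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class comparing a naive cast with a real encoder.
class CastVersusWriter {
    // Naive approach: keeps only the low 8 bits of each 16-bit char.
    static byte[] castToBytes(String text) {
        byte[] out = new byte[text.length()];
        for (int i = 0; i < text.length(); i++) {
            out[i] = (byte) text.charAt(i);
        }
        return out;
    }

    // Proper approach: a Writer encodes each character into one or more bytes.
    static byte[] encodeWithWriter(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and therefore flushed) before the bytes are read.
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String polishL = "\u0142"; // U+0142, Latin small letter l with stroke
        System.out.println("cast gives " + castToBytes(polishL).length + " byte(s)");
        System.out.println("writer gives " + encodeWithWriter(polishL).length + " byte(s)");
    }
}
```

Casting U+0142 drops the high byte and leaves 0x42, which is the letter "B", so the original character is silently lost. The Writer produces the two UTF-8 bytes C5 82, which decode back to the original letter.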

Important rules:

always use InputStream when dealing with binary data (multimedia, ZIP and PDF files, etc.)
always use Reader when reading text (txt, HTML, XML...)
always know and specify the character encoding when reading characters from a byte stream, and always consciously choose the character encoding you use to write the data.

A char is a 16-bit string that represents a UTF-16 code unit: one char covers any Unicode character in the Basic Multilingual Plane, and characters outside it take two chars (a surrogate pair).

A byte is an 8-bit string that represents a two's complement number.

The important thing here is that they are both bit strings. Technically speaking, a char is simply 2 bytes. Nothing more, nothing less, aside from some minor semantics in how Java treats the two. As far as the computer (or an Input/OutputStream) is concerned, the only difference is the number of bits they hold.
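A quick sketch of the "a char is simply 2 bytes" point: splitting a char into its high and low byte and joining them again. The class and method names are invented for the example.

```java
// Hypothetical demo class; shows a char as two 8-bit halves.
class CharBits {
    // Upper 8 bits of the 16-bit char.
    static byte highByte(char c) {
        return (byte) (c >>> 8);
    }

    // Lower 8 bits of the 16-bit char.
    static byte lowByte(char c) {
        return (byte) c;
    }

    // Rebuilds the char; the masks undo the sign extension of Java's signed bytes.
    static char join(byte high, byte low) {
        return (char) (((high & 0xFF) << 8) | (low & 0xFF));
    }

    public static void main(String[] args) {
        char euro = '\u20ac'; // U+20AC, euro sign
        System.out.println("bits in a char: " + Character.SIZE);
        System.out.println("bits in a byte: " + Byte.SIZE);
        System.out.println("round trip ok: " + (join(highByte(euro), lowByte(euro)) == euro));
    }
}
```

The `& 0xFF` masks are the "minor semantics" in practice: a Java byte is signed, so 0xAC is stored as a negative number and has to be masked before it is shifted back into place.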

I think you need to grasp the relation between a byte and a character in order to get your clarification.

The accepted answer to this question is quite clear IMHO: Why does a byte in Java I/O can represent a character?

I'd also check out byte stream and character stream

And if you don't want Joel to catch you and make you peel onions for 6 months in a submarine, just read http://www.joelonsoftware.com/articles/Unicode.html

All I/O streams in Java are just byte streams underneath. Byte-to-character (and vice versa) conversions are done using an encoding. But underneath it all, they are all bytes.

To answer your questions:

I understand that byte streams deal with bytes and character streams deal with characters... if I use a byte stream to read in characters, could this limit me to the sorts of characters I might read?

Characters are not bytes. A character is stored in one or more bytes according to the selected encoding scheme. The encoding scheme determines which characters can be represented, so choosing a richer encoding removes or extends that limit.

For instance, bytes are read in as 8 bit bytes, characters are read in as 16 bit characters... does this mean that more characters can be represented using character streams rather than byte streams?

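In a way, yes. A single 16-bit char can take 65,536 distinct values while a single byte can take only 256, but a byte stream can still carry any character as a sequence of several bytes in a suitable encoding.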

The last thing I'm confused about is how a byte stream writes out to a file for reading. If I was receiving bytes from a network socket, I would wrap them in an InputStreamReader for writing, this way I would get the character transformation logic the character stream provides. If I read from a file using a FileInputStream and write out using a FileOutputStream, why is this file readable when I open it with a text editor? How is the FileOutputStream treating the bytes?

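For bytes that correspond to characters, you should use an OutputStreamWriter to write to the file so that it is readable in a text editor. You can specify the encoding when you create it, and the writer will encode your text data for you. A FileOutputStream by itself does not interpret the bytes at all: it writes them exactly as given, so a file copied byte for byte from a readable text file stays readable.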
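Here is a minimal sketch of both cases from the question: writing text through an OutputStreamWriter with an explicit encoding, and then copying the file with plain FileInputStream/FileOutputStream. The class name, helper names, temp-file prefixes, and the choice of UTF-8 are all assumptions for the example.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; writes text, copies it as raw bytes, reads it back.
class TextFileRoundTrip {
    // Encodes the text as UTF-8 while writing it to the file.
    static void writeText(Path file, String text) {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Copies bytes unchanged; no decoding or encoding happens here.
    static void copyBytes(Path source, Path target) {
        try (InputStream in = new FileInputStream(source.toFile());
             OutputStream out = new FileOutputStream(target.toFile())) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text, copies the file byte for byte, and decodes the copy.
    static String roundTrip(String text) {
        try {
            Path original = Files.createTempFile("original", ".txt");
            Path copy = Files.createTempFile("copy", ".txt");
            try {
                writeText(original, text);
                copyBytes(original, copy);
                return new String(Files.readAllBytes(copy), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(original);
                Files.deleteIfExists(copy);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 \u20ac"; // contains U+00E9 and U+20AC
        System.out.println("copy matches original: " + roundTrip(text).equals(text));
    }
}
```

The copy is readable because the byte streams never touch the encoding: whatever bytes the writer produced end up in the copy unchanged, and a text editor that assumes the same encoding decodes them the same way.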
