Convert byte-stream to character-stream in Java
Is there a class that I can create by specifying an encoding, feed bytes into, and get characters out of? The main point is that I want to conserve memory by not holding both the entire byte data and the entire character data in memory at the same time.
Something like:
Something s = new Something("utf-8");
s.write(buffer, 0, buffer.length); // it converts the bytes directly to characters internally, so we don't store both
// ... several more s.write() calls
s.close(); // or not needed
String text = s.getString();
// or
char[] text = s.getCharArray();
What is that Something?
Are you looking for ByteArrayInputStream? You could then wrap that in an InputStreamReader and read characters out of the original byte array.
A ByteArrayInputStream lets you "stream" from a byte array. If you wrap that in an InputStreamReader you can read characters. The InputStreamReader lets you stipulate the character encoding.
If you want to go directly from an input source of bytes, then you can just construct the appropriate sort of InputStream class (FileInputStream for example) and then wrap that in an InputStreamReader.
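To make the wrapping concrete, here is a minimal sketch of reading characters from any InputStream through an InputStreamReader. The class name ReaderDemo, the method readAll, the fixed UTF-8 encoding and the 1024-char chunk size are all assumptions made for illustration, not anything from the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, named for this example only.
final class ReaderDemo {
    // Decodes the byte source as UTF-8 chunk by chunk and collects the characters.
    static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        char[] chunk = new char[1024]; // arbitrary chunk size
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

Called as ReaderDemo.readAll(new FileInputStream(path)), only a small window of bytes is buffered at a time. Called with a ByteArrayInputStream, the whole byte array is of course still in memory alongside the characters.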
You can probably mock it up using CharsetDecoder. Something along the lines of
CharsetDecoder decoder = Charset.forName(encoding).newDecoder();
CharBuffer cb = CharBuffer.allocate(100);
decoder.decode(ByteBuffer.wrap(buffer1), cb, false);
decoder.decode(ByteBuffer.wrap(buffer2), cb, false);
...
decoder.decode(ByteBuffer.wrap(bufferN), cb, true);
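decoder.flush(cb); // emit anything the decoder is still holding internally
cb.flip(); // limit = chars written, position = 0, so toString() returns only the decoded text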
return cb.toString();
(Yes, I know this will overflow your CharBuffer; you may want to copy the contents into a StringBuilder as you go.)
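A slightly fuller sketch of the same idea, shaped like the Something from the question: it drains the CharBuffer into a StringBuilder on overflow and carries over the bytes of a multi-byte sequence that was split between two write() calls. The class name IncrementalDecoder, the method names and the 1024-char buffer are assumptions for illustration. Malformed input is simply reported as an exception here.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;

// Hypothetical class, named for this example only.
final class IncrementalDecoder {
    private final CharsetDecoder decoder;
    private final CharBuffer chars = CharBuffer.allocate(1024); // small reusable scratch buffer
    private final StringBuilder out = new StringBuilder();
    private ByteBuffer leftover = ByteBuffer.allocate(0); // tail of an incomplete sequence

    IncrementalDecoder(String encoding) {
        decoder = Charset.forName(encoding).newDecoder();
    }

    // Decodes one chunk of bytes; only this chunk is held as bytes.
    void write(byte[] buf, int off, int len) {
        ByteBuffer in = ByteBuffer.allocate(leftover.remaining() + len);
        in.put(leftover).put(buf, off, len).flip();
        drain(in, false);
        // Keep undecoded bytes (an incomplete multi-byte sequence) for the next call.
        ByteBuffer rest = ByteBuffer.allocate(in.remaining());
        rest.put(in).flip();
        leftover = rest;
    }

    // Signals end of input and returns everything decoded so far.
    String finish() {
        drain(leftover, true);
        while (decoder.flush(chars).isOverflow()) {
            moveCharsToOutput();
        }
        moveCharsToOutput();
        return out.toString();
    }

    private void drain(ByteBuffer in, boolean endOfInput) {
        while (true) {
            CoderResult result = decoder.decode(in, chars, endOfInput);
            moveCharsToOutput();
            if (result.isUnderflow()) {
                return; // all decodable input consumed
            }
            if (result.isError()) {
                throw new IllegalArgumentException("cannot decode input: " + result);
            }
            // Otherwise the CharBuffer overflowed; it has been emptied, so decode again.
        }
    }

    private void moveCharsToOutput() {
        chars.flip();
        out.append(chars);
        chars.clear();
    }
}
```

Usage mirrors the pseudocode in the question: new IncrementalDecoder("utf-8"), several write(buffer, 0, n) calls, then finish(). Note that out.toString() still copies the characters once at the end, so if that matters you may prefer to expose the StringBuilder as a CharSequence instead.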
Your example code didn't seem to indicate that a character stream was needed. If so, String can already handle all that you want. Assuming String s contains the data,
char[] chars = s.toCharArray();
byte[] bytes = s.getBytes("utf-8");
The question then reduces to how to get bytes from a byte stream into String, for which you can use ByteArrayOutputStream, like so:
ByteArrayOutputStream os = new ByteArrayOutputStream();
os.write(buffer, 0, buffer.length); // it just stores the bytes, doesn't convert yet.
// several more os.write() calls
s = os.toString("utf-8"); // now it converts the full buffer to a string in the specified encoding.
If you truly want something that has a byte input stream and a character output stream, there isn't a built-in one.
Actually, the title "Convert byte-stream to character-stream in Java" contradicts your example, which uses no streams at all, only arrays. I'll assume from here on that you want arrays.
You surely can't start with byte[] and end with char[] (or String) without having both somewhere for a while. There are however some possibilities:
In case you really need a char[]: Idea: Write the byte[] into a file and read it using a FileReader into the array. This doesn't really work, since you don't know the proper array length in advance. So generate and write all the characters into a file using DataOutput, then read all of them back using DataInput into an array.
In case you really need a String: Create a char[] as above and use reflection and setAccessible(true) to invoke the package-private constructor String(int offset, int count, char value[]).
In case a CharSequence suffices: Create a class MyCharSequence holding the byte[]. An extremely slow solution would be to implement its method charAt(index) by converting a part of the byte[] starting from the beginning until you obtain index+1 chars, discarding all of them on the fly and keeping the last one. Such a crude method is needed since with UTF-8 you don't know how many bytes correspond to a single char. You could do the conversion once at the beginning and remember for each char the position of its first byte. This is even worse, as you'd need much more memory for those positions. Fortunately, a simple space-time tradeoff exists, e.g., remember the position of the first byte for each 16th char.
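The checkpoint idea for the CharSequence case could be sketched roughly as follows. This is only an illustration under loud assumptions: the input is well-formed UTF-8, and it contains no 4-byte sequences (characters outside the Basic Multilingual Plane), since those would decode to surrogate pairs and complicate the indexing. The STEP constant and the helper seqLen are names made up for this example.

```java
// Sketch of MyCharSequence: keeps only the UTF-8 bytes plus one int per 16 chars.
// Assumes well-formed UTF-8 with 1- to 3-byte sequences only (BMP characters).
final class MyCharSequence implements CharSequence {
    private static final int STEP = 16; // one checkpoint per 16 chars
    private final byte[] bytes;
    private final int length;           // number of chars
    private final int[] checkpoints;    // checkpoints[k] = byte offset of char k * STEP

    MyCharSequence(byte[] utf8) {
        this.bytes = utf8;
        int count = 0;
        for (int pos = 0; pos < utf8.length; pos += seqLen(utf8[pos])) {
            count++;
        }
        this.length = count;
        this.checkpoints = new int[(count + STEP - 1) / STEP];
        int idx = 0;
        for (int pos = 0; pos < utf8.length; pos += seqLen(utf8[pos]), idx++) {
            if (idx % STEP == 0) {
                checkpoints[idx / STEP] = pos;
            }
        }
    }

    // Length in bytes of the sequence that starts with the given lead byte.
    private static int seqLen(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) return 1;
        if (b >= 0xC0 && b < 0xE0) return 2;
        if (b >= 0xE0 && b < 0xF0) return 3;
        throw new IllegalArgumentException("unsupported or malformed lead byte: " + b);
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // Jump to the nearest checkpoint, then skip at most STEP - 1 chars.
        int pos = checkpoints[index / STEP];
        for (int i = index % STEP; i > 0; i--) {
            pos += seqLen(bytes[pos]);
        }
        int b0 = bytes[pos] & 0xFF;
        switch (seqLen(bytes[pos])) {
            case 1:
                return (char) b0;
            case 2:
                return (char) (((b0 & 0x1F) << 6) | (bytes[pos + 1] & 0x3F));
            default:
                return (char) (((b0 & 0x0F) << 12)
                        | ((bytes[pos + 1] & 0x3F) << 6)
                        | (bytes[pos + 2] & 0x3F));
        }
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        StringBuilder sb = new StringBuilder(end - start);
        for (int i = start; i < end; i++) {
            sb.append(charAt(i));
        }
        return sb;
    }

    @Override
    public String toString() {
        return subSequence(0, length).toString();
    }
}
```

The extra memory is one int per 16 chars, and each charAt call decodes at most 16 sequences. A larger step would trade slower lookups for a smaller index.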
All my proposals are a bit strange, but I believe it can't be done much better. It could make a fun homework exercise, but I wouldn't go for it in practice.