Java performance of byte[] vs. char[] for file stream
I'm writing a program that reads a file through a custom 8 KB buffer and then searches that buffer for a keyword. Since Java provides two types of streams, character and byte, I've implemented this with both byte[] and char[] buffers.
I'm wondering which would be faster. A char is 2 bytes, and when a Reader fills a char[], it has to decode the underlying bytes into chars, which I think could make it slower than using only byte[].
Using a byte array will be faster:
You skip the byte-to-character decoding step, which is at least a copy loop, and possibly more depending on the Charset used to do the decoding.
The byte array will take less space, and hence save CPU cycles in GC / initialization.
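As a rough illustration of the byte-only approach, here is a minimal sketch, not a tuned implementation. The class name ByteKeywordSearch and the method indexOf are hypothetical, and the sketch assumes you know the file's charset so the keyword can be encoded once up front. It keeps the last (pattern length - 1) bytes of each chunk so that a keyword split across two reads is still found.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical helper: searches a stream for a keyword without decoding to chars.
class ByteKeywordSearch {
    static final int BUFFER_SIZE = 8 * 1024; // 8 KB, as in the question

    /** Returns the byte offset of the first occurrence of keyword, or -1 if absent. */
    static long indexOf(InputStream in, String keyword, Charset charset) throws IOException {
        // Encode the keyword once. Assumption: charset matches the file's encoding.
        // Note: StandardCharsets.UTF_16 prepends a byte-order mark when encoding,
        // so UTF_16BE or UTF_16LE is the safer choice for a pattern.
        byte[] pattern = keyword.getBytes(charset);
        if (pattern.length == 0) {
            return 0;
        }
        int overlap = pattern.length - 1;
        byte[] buf = new byte[overlap + BUFFER_SIZE];
        int carried = 0; // bytes kept from the previous chunk, stored at buf[0..carried)
        long base = 0;   // stream offset of buf[0]
        int n;
        while ((n = in.read(buf, carried, buf.length - carried)) != -1) {
            int len = carried + n;
            for (int i = 0; i + pattern.length <= len; i++) {
                if (matchesAt(buf, i, pattern)) {
                    return base + i;
                }
            }
            // Keep the tail that could still be the start of a match.
            int keep = Math.min(overlap, len);
            System.arraycopy(buf, len - keep, buf, 0, keep);
            base += len - keep;
            carried = keep;
        }
        return -1;
    }

    private static boolean matchesAt(byte[] buf, int start, byte[] pattern) {
        for (int j = 0; j < pattern.length; j++) {
            if (buf[start + j] != pattern[j]) {
                return false;
            }
        }
        return true;
    }
}
```

You would call it along the lines of ByteKeywordSearch.indexOf(Files.newInputStream(path), "keyword", StandardCharsets.UTF_8). The inner loop is a naive scan; a smarter algorithm could replace it without changing the buffering. For a UTF-16 file, a hit at an odd byte offset would not be aligned to a 16-bit unit and should be treated as a false match.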
However:
Unless you are searching huge files, the difference is unlikely to be significant.
The byte array approach could FAIL if the input file is not encoded in an 8-bit character set. And even if it works (as it does for UTF-8 and UTF-16), there are potential issues with matching characters that span buffer boundaries.
(The reason that byte-wise treatment works for UTF-8 and UTF-16 is that the encoding makes it easy to distinguish between the first unit (byte or short) and subsequent units of an encoded character.)
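To make that parenthetical concrete, here is a small sketch of how the first unit of an encoded character can be told apart from the following ones. The class and method names are hypothetical. In UTF-8, every continuation byte has the bit pattern 10xxxxxx, and in UTF-16 the second unit of a surrogate pair is a low surrogate.

```java
// Hypothetical helper illustrating how encoded units identify themselves.
class EncodedUnits {
    /** True if b is a UTF-8 continuation byte, i.e. its top two bits are 10. */
    static boolean isUtf8Continuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    /** True if c is the second (trailing) 16-bit unit of a UTF-16 surrogate pair. */
    static boolean isUtf16Trailing(char c) {
        return Character.isLowSurrogate(c);
    }

    /** Counts characters in well-formed UTF-8 by counting the bytes that start one. */
    static int countCodePoints(byte[] utf8) {
        int count = 0;
        for (byte b : utf8) {
            if (!isUtf8Continuation(b)) {
                count++;
            }
        }
        return count;
    }
}
```

Because of this property, a byte pattern that starts with a lead byte cannot match in the middle of another character in well-formed UTF-8.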
If it's a binary file you're reading, use a byte array.
If it's a text file and you're going to use the contents as strings later, then you should use a char array.
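A minimal sketch of the two cases, with hypothetical class and method names and an 8 KB buffer assumed for both: raw bytes come from an InputStream, while text goes through a Reader that decodes with an explicit Charset.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper contrasting byte[] and char[] reading.
class ReadChoice {
    /** Binary case: reads raw bytes into a byte[] and returns how many were read. */
    static long countBytes(Path file) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Text case: the Reader decodes bytes into chars, which are collected as a String. */
    static String readText(Path file, Charset charset) throws IOException {
        char[] buf = new char[8192];
        StringBuilder sb = new StringBuilder();
        try (Reader in = Files.newBufferedReader(file, charset)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

Passing the Charset explicitly avoids depending on the platform default encoding. For a file containing "café" in UTF-8, countBytes reports 5 while the decoded string has 4 chars.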
This Stack Overflow question, file-streaming-in-java, talks about streaming files efficiently in Java.
I particularly like this reference article
On large files, working only with bytes quickly pays off in speed, so if you can match the pattern directly on the bytes, you could definitely gain a few precious cycles.
If your files are small, or you don't have many of them, it may not be worth the trouble.