Reading a file using Java Scanner
One of the lines in a Java file I'm trying to understand is shown below.
return new Scanner(file).useDelimiter("\\Z").next();
The call is expected to return everything up to "The end of the input but for the final terminator, if any", as per the java.util.regex.Pattern documentation. What actually happens is that it returns only the first 1024 characters of the file. Is this a limitation imposed by the regex Pattern matcher? Can it be overcome? Currently I'm going ahead with a FileReader, but I would like to know the reason for this behaviour.
I couldn't reproduce this myself, but I think I can shed some light on what is going on.
Internally, the Scanner uses a character buffer of 1024 characters. The Scanner will read from your Readable 1024 characters by default, if possible, and then apply the pattern.
The problem is in your pattern. It will always match the end of the input, but that doesn't mean the end of your input stream or data. When Java applies your pattern to the buffered data, it tries to find the first occurrence of the end of input. Since 1024 characters are in the buffer, the matching engine treats position 1024 as the first match of the delimiter, and everything before it is returned as the first token.
I don't think the end-of-input anchor is valid for use in the Scanner for that reason. It could be reading from an infinite stream, after all.
Try wrapping the file object in a FileInputStream.
Scanner is intended to read multiple primitives from a file. It really isn't intended to read an entire file.
If you don't want to include third-party libraries, you're better off looping over a BufferedReader that wraps a FileReader/InputStreamReader for text, or looping over a FileInputStream for binary data.
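A minimal sketch of the BufferedReader approach for text, assuming you know the file's charset. The class name WholeFileReader and the method name readAll are made up for illustration; only the JDK classes are real.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper, not part of the JDK or of the original code.
class WholeFileReader {

    // Reads the whole file as text by looping over a BufferedReader.
    static String readAll(File file, Charset charset) throws IOException {
        StringBuilder content = new StringBuilder();
        char[] chunk = new char[8192];
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            int read;
            // read() returns -1 only at the real end of the stream,
            // so the loop is not tied to any internal buffer size.
            while ((read = reader.read(chunk)) != -1) {
                content.append(chunk, 0, read);
            }
        }
        return content.toString();
    }
}
```

You would call it as WholeFileReader.readAll(file, StandardCharsets.UTF_8). Unlike the \\Z delimiter, this keeps a trailing line terminator if the file has one.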
If you're OK using a third-party library, Apache commons-io has a FileUtils class that contains the static methods readFileToString and readLines for text and readFileToByteArray for binary data.
You can use the Scanner class; just specify a charset when opening the scanner, e.g.:
Scanner sc = new Scanner(file, "ISO-8859-1");
Java converts the bytes read from the file into characters using the specified charset; if none is given, it uses the platform default (taken from the underlying OS). It is still not clear to me why Scanner reads only 1024 bytes with the default charset but reaches the end of the file with another one. Anyway, it works fine!
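Here is a rough sketch of that approach with one extra check. Scanner does not throw an IOException raised by its underlying source; it keeps it and returns it from ioException(). One possible explanation for the truncation, which nothing in this thread confirms, is that a decoding error under the default charset makes the Scanner stop early, and this check would surface that. The class name ScannerWholeFile and the method name readAll are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

// Hypothetical helper, shown only to illustrate the charset-aware Scanner call.
class ScannerWholeFile {

    // Reads the file up to the final line terminator using the given charset name.
    static String readAll(File file, String charsetName) throws IOException {
        try (Scanner sc = new Scanner(file, charsetName)) {
            sc.useDelimiter("\\Z");
            // An empty file has no token at all, so guard the call to next().
            String content = sc.hasNext() ? sc.next() : "";
            // Scanner keeps the last IOException from its source instead of throwing it.
            IOException hidden = sc.ioException();
            if (hidden != null) {
                throw hidden;
            }
            return content;
        }
    }
}
```

Calling ScannerWholeFile.readAll(file, "ISO-8859-1") mirrors the line above. ISO-8859-1 maps every byte to a character, so decoding cannot fail, but text stored in another encoding will come out with the wrong characters.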