Java RandomAccessFile - dealing with different newline styles?
I'm trying to seek through a RandomAccessFile, and as part of an algorithm I have to read a line and then seek backwards from the end of the line.
E.g.:
String line = raf.readLine();
raf.seek (raf.getFilePointer() - line.length() + m.start() + m.group().length());
//m is a Matcher for regular expressions
I've been getting loads of off-by-one errors and couldn't figure out why. I just discovered it's because some files I'm reading from have Windows-style linefeeds, \r\n, and some have just Unix-style \n.
Is there an easy way to have the RandomAccessFile treat all linefeeds as Windows-style linefeeds?
You could always back the file pointer up two bytes and re-read them to see whether the line ended in \r\n or in a bare \n:
String line = raf.readLine();
raf.seek(raf.getFilePointer()-2);
int offset = raf.read() == '\r' ? 2 : 1;
raf.read(); //discard the second character since you know it is either \n or EOF by definition of readLine
raf.seek (raf.getFilePointer() - (line.length()+offset) + m.start() + m.group().length());
I'm not sure exactly where you are trying to place the file pointer, so adjust the 2/1 constants appropriately. You may also need to add an extra check for blank lines (\n\n) if they occur in your file; without code to step past them you might get stuck in an infinite loop. Note that the check assumes every line ends with a terminator, so a final line with no trailing newline would be measured incorrectly.
No. RandomAccessFile and related abstractions (including the underlying file systems) model files as an indexable sequence of bytes. They neither know nor care about lines or line terminations.
What you need to do is record the actual positions of line starts rather than trying to work out where they are from assumptions about the line termination sequence. Alternatively, use a line reader that captures the line termination sequence for each line that it reads, either as part of the line or in an attribute that can be accessed after reading each input line.
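Recording the line start could look roughly like the sketch below. The class and method names (SeekAfterMatch, seekAfterMatch) are made up for illustration and are not from the question. It relies on the documented behaviour that RandomAccessFile.readLine turns each byte into exactly one char, so an index into the returned String is also a byte offset from the start of the line, and the terminator never enters the arithmetic.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SeekAfterMatch {
    /**
     * Reads the next line and, if the pattern is found in it, moves the file
     * pointer to just after the match. Returns the new position, or -1 if
     * there is no line left or no match (the pointer then stays after the line).
     * Hypothetical helper, written only to illustrate the idea.
     */
    public static long seekAfterMatch(RandomAccessFile raf, Pattern pattern) throws IOException {
        long lineStart = raf.getFilePointer(); // remember where the line begins
        String line = raf.readLine();          // consumes \n, \r or \r\n
        if (line == null) {
            return -1;                         // end of file
        }
        Matcher m = pattern.matcher(line);
        if (!m.find()) {
            return -1;
        }
        // m.end() equals m.start() + m.group().length(); because readLine maps
        // one byte to one char, it is also a byte offset within the line.
        long target = lineStart + m.end();
        raf.seek(target);
        return target;
    }
}
```

Since the offset is measured from the recorded start rather than backwards from the end, it does not matter whether the line ended in \r\n, \n, or nothing at all.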
Alternatively, convert all the files to use DOS line termination sequences before you open them for random access.
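For the conversion route, a minimal sketch of normalising a byte buffer to \r\n might look like this. CrlfNormalizer and toCrlf are hypothetical names, and treating a lone \r as a line terminator is my assumption (it matches what readLine does). Any byte offsets recorded before the conversion will no longer be valid afterwards.

```java
import java.io.ByteArrayOutputStream;

public class CrlfNormalizer {
    /**
     * Returns a copy of the input in which every line terminator
     * (\n, \r\n, or a lone \r) is written as \r\n.
     * Hypothetical helper, written only to illustrate the idea.
     */
    public static byte[] toCrlf(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length + 16);
        for (int i = 0; i < in.length; i++) {
            byte b = in[i];
            if (b == '\r' || b == '\n') {
                out.write('\r');
                out.write('\n');
                // If this was the \r of an existing \r\n pair, skip its \n
                // so the pair is not doubled.
                if (b == '\r' && i + 1 < in.length && in[i + 1] == '\n') {
                    i++;
                }
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }
}
```

This works on raw bytes rather than decoded text, so it is only safe for encodings where the bytes 0x0D and 0x0A never appear inside another character (ASCII, ISO-8859-1, UTF-8).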