Jumping to a line and reading it
I have to work with big files (many GB) and need quick lookups to retrieve specific lines on request.
The idea has been to maintain a mapping:
some_key -> byte_location
Where the byte location represents where in the file the line starts.
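To make the idea concrete, here is a minimal sketch of how such a mapping could be built. It assumes the key is simply the line number (the list index) and that lines end in '\n'; the class name LineOffsetIndex is hypothetical. It counts raw bytes rather than decoded characters, which is safe for UTF-8 because the byte 0x0A never occurs inside a multi-byte sequence.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class LineOffsetIndex {
    // Returns the byte offset at which each line starts (index = line number, 0-based).
    public static List<Long> build(String path) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            long pos = 0;               // number of bytes consumed so far
            boolean atLineStart = true; // true when the next byte begins a new line
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    offsets.add(pos);
                    atLineStart = false;
                }
                pos++;
                if (b == '\n') {
                    atLineStart = true;
                }
            }
        }
        return offsets;
    }
}
```

For a file of many GB the offsets would probably need to live somewhere more compact than a List<Long>, but the counting logic stays the same.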
Edit: the question changed a little bit:
First I used:
FileInputStream stream = new FileInputStream(file);
BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
FileChannel channel = stream.getChannel();
I noticed that FileChannel.position() will not return the exact position where the reader is currently reading because it is a "buffered" reader. It reads chunks of a given size (16k here) so what I get from the FileChannel is a multiple of 16k, and not the exact position where the reader is actually reading.
PS: the file is in UTF-8
Any reason not to create a FileInputStream, call stream.skip(pos) and then create an InputStreamReader around that, and a BufferedReader around the InputStreamReader?
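A rough sketch of that suggestion, assuming pos is a byte offset that points at the start of a line. The class and method names are hypothetical. skip() is allowed to skip fewer bytes than requested, so the sketch loops until the whole offset has been consumed.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class SkipLineReader {
    // Opens the file, skips to the given byte offset, and reads one UTF-8 line.
    public static String readLineAt(String path, long pos) throws IOException {
        try (FileInputStream stream = new FileInputStream(path)) {
            long remaining = pos;
            while (remaining > 0) {
                long skipped = stream.skip(remaining);
                if (skipped <= 0) {
                    throw new IOException("Could not skip to offset " + pos);
                }
                remaining -= skipped;
            }
            // Decoding starts at pos, so the reader's internal buffering no longer matters.
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(stream, StandardCharsets.UTF_8));
            return reader.readLine();
        }
    }
}
```

This reopens the file for every lookup, which keeps the code simple but adds overhead when many lines are requested.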
I would have tried something like this:
RandomAccessFile raf = new RandomAccessFile(file, "r");
...
raf.seek(position);
raf.readLine();
...
The problem is that readLine() turns each byte into a character with the top 8 bits zero. That's fine if your file is ASCII or Latin-1, but problematic for UTF-8.
However, if you are prepared to use RandomAccessFile to write the file as well, you can use readUTF() and writeUTF() to read and write "lines" encoded as modified UTF-8 Strings.
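Something along these lines, as a sketch only: the class and method names are made up for illustration. writeUTF() stores a two-byte length prefix followed by the modified UTF-8 bytes, so each record is limited to 65535 encoded bytes, and the resulting file is no longer a plain text file that other tools can read line by line.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class UtfRecordFile {
    // Appends one record at the end of the file and returns the byte offset where it starts.
    public static long append(RandomAccessFile raf, String line) throws IOException {
        long offset = raf.length();
        raf.seek(offset);
        raf.writeUTF(line); // 2-byte length prefix + modified UTF-8 bytes
        return offset;
    }

    // Reads back the record that starts at the given byte offset.
    public static String readAt(RandomAccessFile raf, long offset) throws IOException {
        raf.seek(offset);
        return raf.readUTF();
    }
}
```

The offset returned by append() is what you would store in the some_key -> byte_location mapping.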
FOLLOWUP
dammit ...utf-8 characters are screwed
Yea ... see above.
Another idea for coping with UTF-8 with RandomAccessFile:
1. seek to the desired position,
2. use the readFully(byte[]) method to read a bunch of bytes into a byte[],
3. locate pos == position of the end of line in the buffer,
4. if not found, read more bytes, append them to the buffer, and repeat step 3,
5. if found, use new String(bytes, 0, pos, "UTF-8") to convert to a Java String.
This is more cumbersome than using readLine(), but it should be faster than using FileInputStream and skip() when reading multiple lines from the files in random order.
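The steps above could be sketched roughly as follows. This is an illustration under a few assumptions: lines end in '\n' (with an optional preceding '\r'), the 4096-byte chunk size is arbitrary, and the class name is hypothetical. It uses read(byte[]) instead of readFully(byte[]), because readFully() throws EOFException when fewer bytes than the buffer size remain, which happens for lines near the end of the file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class Utf8LineReader {
    // Reads the UTF-8 line starting at the given byte offset; returns null at end of file.
    public static String readLineAt(RandomAccessFile raf, long position) throws IOException {
        raf.seek(position);
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = raf.read(chunk)) != -1) {
            for (int i = 0; i < n; i++) {
                if (chunk[i] == '\n') {       // 0x0A never appears inside a multi-byte sequence
                    line.write(chunk, 0, i);
                    return decode(line);
                }
            }
            line.write(chunk, 0, n);          // no newline yet: keep the bytes and read more
        }
        return line.size() == 0 ? null : decode(line);
    }

    // Decodes the collected bytes only once the whole line is known, dropping a trailing '\r'.
    private static String decode(ByteArrayOutputStream line) {
        byte[] bytes = line.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') {
            len--;
        }
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }
}
```

Decoding after the end of line has been found means a multi-byte character split across two chunks is never decoded in halves.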