How to use regexes to parse a file in Java?
I'm trying to use a series of regular expressions to parse tokens from a file. I need to count newlines and be able to separate tokens that don't have a space between them. Unfortunately, java.util.Scanner's findWithinHorizon() method searches the entire rest of the input stream (up to horizon) for the START of the regex match, but I want to match the regex starting at the current file position. Specifically, I have a bunch of regexes and want to loop through them to see which one matches starting at the current position in the file, then advance the file position to right after the regex match, and continue. Is this possible?
Scanner's next() method seems to be useless for this because it enforces delimiters and the regex must match the entire token; I want to match from the current file position, get the matched string, and advance the file seek to after the match.
Options:
1. Read the whole file into memory as a String. Then use Matcher directly at the positions you want.
2. Use a FileChannel acquired from a RandomAccessFile as the input for the Scanner. You can then directly manipulate the position of the channel.
3. Use a FileChannel as above, but use Matcher directly for greater flexibility.
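For the first option, the loop described in the question (try each regex at the current position, take the first that matches, advance past it) can be sketched with Matcher.region() and Matcher.lookingAt(), which only succeeds if the match starts at the beginning of the region. This is a minimal sketch, not a definitive implementation: the class name AnchoredTokenizer, the rule names and the patterns themselves are hypothetical, and UTF-8 is assumed for the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoredTokenizer {
    // Hypothetical token rules, tried in insertion order; the first match wins.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("NEWLINE", Pattern.compile("\\r?\\n"));
        RULES.put("SPACE", Pattern.compile("[ \\t]+"));
        RULES.put("NUMBER", Pattern.compile("\\d+"));
        RULES.put("IDENT", Pattern.compile("[A-Za-z_]\\w*"));
        RULES.put("SYMBOL", Pattern.compile("[-+*/=();]"));
    }

    // Returns tokens formatted as TYPE:text@line; whitespace and newlines are consumed but not emitted.
    static List<String> tokenize(CharSequence input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;   // current position in the input
        int line = 1;  // 1-based line counter, bumped on every NEWLINE match
        while (pos < input.length()) {
            boolean matched = false;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // lookingAt() matches only at the start of the region, i.e. at pos.
                if (m.lookingAt() && m.end() > pos) {
                    String type = rule.getKey();
                    if (type.equals("NEWLINE")) {
                        line++;
                    } else if (!type.equals("SPACE")) {
                        tokens.add(type + ":" + m.group() + "@" + line);
                    }
                    pos = m.end(); // advance to right after the match
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                throw new IllegalArgumentException(
                        "No rule matches at offset " + pos + " (line " + line + ")");
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // Assumes the file is UTF-8 and small enough to hold in memory.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        tokenize(text).forEach(System.out::println);
    }
}
```

Because each rule is anchored at pos, tokens with no space between them (such as x=42;) are separated without any delimiter handling. Creating a new Matcher for every rule at every position keeps the sketch short; reusing one Matcher per rule and calling region() on it would avoid the extra allocations.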
An example of using a Matcher with a RandomAccessFile:
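// Needs: java.nio.ByteBuffer, java.nio.CharBuffer, java.nio.channels.FileChannel,
//        java.nio.charset.StandardCharsets, java.util.regex.Matcher
// Assumes `file` is a RandomAccessFile opened in "rw" mode (an exclusive lock needs a writable channel),
// `pattern` is a compiled Pattern, and BUFFER_SIZE is a constant you define.
FileChannel fc = file.getChannel();
fc.lock(); // so it doesn't change under you
ByteBuffer bb = ByteBuffer.allocate(BUFFER_SIZE);
fc.read(bb); // reads from the channel's current position
bb.flip(); // switch the buffer from being filled to being read
// Decode with the file's actual charset (UTF-8 assumed here); asCharBuffer() would treat the bytes as UTF-16.
CharBuffer cb = StandardCharsets.UTF_8.decode(bb);
Matcher matcher = pattern.matcher(cb);
// etc.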
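A self-contained version of the channel-based approach might look like the following: it moves the channel to a byte offset, reads one buffer, decodes it, and uses lookingAt() so the pattern must match at that position. This is only a sketch under stated assumptions: the names ChannelMatchDemo, matchAt and bufferSize are hypothetical, the file is assumed to be UTF-8, and a match longer than the buffer (or a multi-byte character split at the buffer's end) is not handled.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChannelMatchDemo {
    // Returns the text matched by `pattern` starting exactly at byte `offset`, or null if it does not match there.
    static String matchAt(Path path, long offset, Pattern pattern, int bufferSize) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "r");
             FileChannel fc = file.getChannel()) {
            fc.position(offset); // manipulate the channel position directly
            ByteBuffer bb = ByteBuffer.allocate(bufferSize);
            // Fill the buffer until it is full or the end of the file is reached.
            while (bb.hasRemaining() && fc.read(bb) != -1) {
                // keep reading
            }
            bb.flip(); // switch the buffer from being filled to being read
            // UTF-8 is an assumption; use the file's real charset.
            CharBuffer cb = StandardCharsets.UTF_8.decode(bb);
            Matcher m = pattern.matcher(cb);
            // lookingAt() anchors the match at the start of the decoded buffer.
            return m.lookingAt() ? m.group() : null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ChannelMatchDemo <file> <byteOffset> <regex>
        String result = matchAt(Paths.get(args[0]), Long.parseLong(args[1]),
                Pattern.compile(args[2]), 8192);
        System.out.println(result);
    }
}
```

Note that the channel position is a byte offset while Matcher.end() is a character index, so advancing the file position after a match requires converting the matched text back to its encoded length (for example, with getBytes(StandardCharsets.UTF_8).length) unless the file is pure ASCII.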