How do I read a large file gradually?

I'm having some problems reading a file with Java. It is absolutely huge (2.5 GB) and adjusting my memory settings doesn't help. The data is all on a single line, so I can't read it one line at a time. What I would like to do is read the file until I find a certain string, for example "<|start|>" or "<|end|>", and then print the data in between these strings so that the memory is cleared and I can continue reading the rest of the file. So what I am basically looking for is a type of reader that starts reading at a certain start string and stops reading at a stop string. Can anyone help me?


You need to open up a Reader (e.g. a BufferedReader wrapping an InputStreamReader wrapping a FileInputStream) and read a chunk at a time with read(char[], int, int) or read(char[]). It's up to you to take care of finding the token, including the case where it starts in one chunk and ends in the next. Also be aware that read() may not fill the buffer; you need to use the return value to see how many characters it has actually written to the array.
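As a rough illustration of the chunked approach, here is a minimal sketch. The class name `ChunkedTokenReader`, the method `copyBetween`, and the chunk size are made up for this example; only the `Reader` and `Writer` calls come from the JDK. It keeps a short tail of each chunk so that a marker split across two chunks is still found, and it writes each span on its own line.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class ChunkedTokenReader {

    // Hypothetical helper: copies the text between start and end markers to out.
    // Text that follows an unmatched start marker is written only up to the retained tail.
    static void copyBetween(Reader in, String start, String end, Writer out, int chunkSize)
            throws IOException {
        char[] chunk = new char[chunkSize];
        StringBuilder pending = new StringBuilder(); // unprocessed text carried between reads
        boolean inside = false;
        int n;
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            pending.append(chunk, 0, n); // use only the n chars that were actually read
            while (true) {
                String token = inside ? end : start;
                int hit = pending.indexOf(token);
                if (hit >= 0) {
                    if (inside) {
                        out.append(pending, 0, hit).append('\n');
                    }
                    pending.delete(0, hit + token.length());
                    inside = !inside;
                } else {
                    // Keep a tail that could be the beginning of a split marker.
                    int keep = Math.min(pending.length(), token.length() - 1);
                    int cut = pending.length() - keep;
                    if (inside) {
                        out.append(pending, 0, cut);
                    }
                    pending.delete(0, cut);
                    break;
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        // A tiny chunk size forces markers to straddle chunk boundaries.
        copyBetween(new StringReader("xx<|start|>hello<|end|>yy<|start|>world<|end|>"),
                "<|start|>", "<|end|>", out, 4);
        System.out.print(out);
    }
}
```

For a real file you would pass a file-backed Reader (for example one from Files.newBufferedReader) and a much larger chunk size; memory use then stays close to the chunk size plus one marker length.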


I would have a look to see if Scanner is suitable for your data. You can use the useDelimiter method to change the patterns it uses to tokenize the input.
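A minimal sketch of the Scanner idea, assuming each span between the markers fits comfortably in memory (Scanner buffers a whole token before returning it). The class `ScannerSpans` and the method `forEachSpan` are names invented for this example. The start marker is used as the delimiter, and each token is then cut at the end marker.

```java
import java.io.StringReader;
import java.util.Scanner;
import java.util.function.Consumer;
import java.util.regex.Pattern;

class ScannerSpans {
    static final String START = "<|start|>";
    static final String END = "<|end|>";

    // Hypothetical helper: hands every span between START and END to sink.
    static void forEachSpan(Readable source, Consumer<String> sink) {
        try (Scanner scanner = new Scanner(source)) {
            // Pattern.quote is needed because '|' is a regex metacharacter.
            scanner.useDelimiter(Pattern.quote(START));
            while (scanner.hasNext()) {
                String token = scanner.next();
                int endAt = token.indexOf(END);
                // Tokens without an end marker (e.g. text before the first start) are skipped.
                if (endAt >= 0) {
                    sink.accept(token.substring(0, endAt));
                }
            }
        }
    }

    public static void main(String[] args) {
        forEachSpan(new StringReader("xx<|start|>hello<|end|>yy<|start|>world<|end|>"),
                System.out::println);
    }
}
```

If a single span can itself be very large, this approach is a poor fit, because the whole token has to be held in memory at once.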


Try this pseudocode:

 char[] start = {'<','|','s','t','a','r','t','|','>'};
 char[] end   = {'<','|','e','n','d','|','>'};
 boolean inside = false;

 while( hasMoreChars() ) {
     if( nextCharsAre( start ) ) {        // peek ahead without consuming
         inside = true;
         skip( start.length );            // skip the start token
     } else if( nextCharsAre( end ) ) {
         inside = false;
         skip( end.length );              // skip the end token
     } else {
         char c = readChar();
         if( inside ) {
             print( c );
         }
     }
 }

The idea is to read until you find the start token and then raise a flag. While the flag is on, you print each value; when you find the end token, you clear the flag.

There should be a number of ways to code the previous pseudo-code. I'll update this answer later.
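One possible way to code the flag idea in Java is sketched below. The class `MarkerStateMachine` and the method `printBetween` are invented for this example, not part of any library. It reads one character at a time and holds back a window as long as the marker it is currently looking for, so that no part of a marker is ever printed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class MarkerStateMachine {

    // Hypothetical helper: writes the text between start and end markers, one span per line.
    // Pass a buffered Reader for real files, since this reads a single char per call.
    static void printBetween(Reader in, String start, String end, Writer out) throws IOException {
        StringBuilder window = new StringBuilder(); // the most recent chars, at most one marker long
        boolean inside = false;                     // the flag: true between start and end
        int r;
        while ((r = in.read()) != -1) {
            String token = inside ? end : start;
            window.append((char) r);
            if (window.length() > token.length()) {
                // The oldest char can no longer be part of a marker, so release it.
                char oldest = window.charAt(0);
                window.deleteCharAt(0);
                if (inside) {
                    out.write(oldest);
                }
            }
            if (token.contentEquals(window)) {
                window.setLength(0);
                if (inside) {
                    out.write('\n'); // a span just ended
                }
                inside = !inside;    // raise or clear the flag
            }
        }
        // Chars still held in the window at end of input are not written.
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        printBetween(new StringReader("a<|start|>hi<|end|>b"), "<|start|>", "<|end|>", out);
        System.out.print(out);
    }
}
```

The window costs a comparison per character, which is slower than searching whole chunks, but it keeps memory use constant regardless of how long a span is.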
