
How can I search for a string in a very big file with a specific format in Java? [duplicate]

This question already has an answer here: Closed 11 years ago.

Possible Duplicate:

do searching in a very big ARPA file in a very short time in java

My file's format:

\data\

ngram 1=19

ngram 2=234

ngram 3=1013

\1-grams:

-1.7132 puluh -3.8008

-1.9782 satu -3.8368

\2-grams:

-1.5403 dalam dua -1.0560

-3.1626 dalam ini 0.0000

\3-grams:

-1.8726 itu dan tiga

-1.9654 itu dan untuk

\end\

As you can see, the file states how many lines are in the 1-gram, 2-gram, and 3-gram parts. There is no need to read the whole file: if the input string is a single word, the program only has to search the \1-grams: part; if it is two words, only the \2-grams: part; and so on. If the program finds the input string in the file, it has to return the two numbers located to the left and right of the string. Each part of the file is sorted.

I am sure that I do not have to read the file completely, and using an index file cannot solve my problem. Both of those take a lot of time, and my lecturer said that the search has to finish in less than 1 minute for such a big file. I think the best approach is to find a way to jump to a specific line (not byte) of the file, but I do not know how to do that. It would be great if someone could help me solve this.

My file is almost 800MB. I have found that using BufferedReader is a good way to read a file very fast, but when I read such a big file and put it in an array line by line, it takes more than 30 minutes.


How big is your file? A minute is a very long time. I would suggest using a BufferedReader for efficiency (and also for its readLine method).
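As a rough sketch of what I mean by the BufferedReader approach: scan the file once with readLine, skip everything until the matching \N-grams: header, and stop as soon as the next header appears, without storing any lines in an array. The class and method names here (ArpaScan, find) are made up for illustration, and the code assumes whitespace-separated fields with an optional trailing back-off value.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Arrays;

class ArpaScan {
    /**
     * Streams through the reader and returns {left number, right number}
     * for the given n-gram, or null if it is not in its section.
     * The right number is NaN when the line has no trailing value.
     */
    static double[] find(BufferedReader in, String ngram) throws IOException {
        String query = String.join(" ", ngram.trim().split("\\s+"));
        int n = query.split(" ").length;           // number of words selects the section
        String header = "\\" + n + "-grams:";
        boolean inSection = false;
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;           // ignore blank lines
            if (line.startsWith("\\")) {            // section marker such as \2-grams: or \end\
                if (inSection) return null;         // left the wanted section without a match
                inSection = line.equals(header);
                continue;
            }
            if (!inSection) continue;
            String[] t = line.split("\\s+");
            if (t.length <= n) continue;            // too few fields to be an n-gram line
            String words = String.join(" ", Arrays.copyOfRange(t, 1, n + 1));
            if (words.equals(query)) {
                double right = t.length > n + 1 ? Double.parseDouble(t[n + 1]) : Double.NaN;
                return new double[] { Double.parseDouble(t[0]), right };
            }
        }
        return null;
    }
}
```

You would call it with something like Files.newBufferedReader(Paths.get("model.arpa")), where the file name is just a placeholder. Nothing is kept in memory beyond the current line, which is the main difference from loading every line into an array.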

If that really takes too long, two approaches come to mind that don't use indexes:

  1. Force every line in the file to be the same length. Then you can jump to a specific line by calculating its start. If you don't know the line number you need, then at least you can use this to efficiently do a binary search of the entire file.

  2. Jump to an arbitrary position and read forward until you get to a line that starts with a \. That will tell you whether you've found the right part or whether you need to jump forward from there or backward from the arbitrary position that you jumped to. This can also be used to create a binary search strategy for the data you need. It relies on the \ being a reliable indicator of the start of a part.
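To make the jump-and-read-forward idea concrete, here is a minimal sketch of a binary search over byte offsets using RandomAccessFile. It is not a full solution: it applies the idea to finding a line inside one section rather than to finding the section headers, and it assumes you already know the byte range [start, end) of that section. It also assumes the range contains only data lines (no blank lines), that the text is ASCII, and that the lines are sorted by their words in String.compareTo order. The names ArpaBinarySearch and find are made up for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

class ArpaBinarySearch {
    /**
     * Looks up an n-gram among the data lines in the byte range [start, end).
     * Assumes start is the first byte of a line and end is not past the end of the file.
     * Returns {left number, right number} (right is NaN when absent), or null if not found.
     */
    static double[] find(RandomAccessFile f, long start, long end, String ngram) throws IOException {
        String key = String.join(" ", ngram.trim().split("\\s+"));
        int n = key.split(" ").length;
        long lo = start, hi = end;  // lo is always a line start; a match must begin in [lo, hi)
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2;
            long lineStart = lo;
            if (mid > lo) {
                // Jump to an arbitrary byte and read forward to the next line start.
                // Seeking to mid - 1 means a line that begins exactly at mid is not skipped.
                f.seek(mid - 1);
                f.readLine();
                lineStart = f.getFilePointer();
            }
            if (lineStart >= hi) {  // no line begins in [mid, hi), so look in the left half
                hi = mid;
                continue;
            }
            f.seek(lineStart);
            String line = f.readLine();
            long next = f.getFilePointer();
            String[] t = line.trim().split("\\s+");
            if (t.length <= n) throw new IOException("Malformed line at byte " + lineStart);
            String words = String.join(" ", Arrays.copyOfRange(t, 1, n + 1));
            int cmp = words.compareTo(key);
            if (cmp == 0) {
                double right = t.length > n + 1 ? Double.parseDouble(t[n + 1]) : Double.NaN;
                return new double[] { Double.parseDouble(t[0]), right };
            }
            if (cmp < 0) lo = next;   // match, if any, starts after this line
            else hi = lineStart;      // match, if any, starts before this line
        }
        return null;
    }
}
```

Each probe reads one or two lines, so even an 800MB section needs only a few dozen short reads. If the file is not sorted in plain string order, the comparison would have to be changed to match whatever order the file actually uses.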
