
Searching a very large ARPA file quickly in Java

I have an ARPA file of almost 1 GB, and I need to search it in less than 1 minute. I have searched a lot but have not found a suitable answer yet. I think I do not need to read the whole file: I just have to jump to a specific line and read that whole line. The problem is that the lines of an ARPA file do not all have the same length. I should also mention that ARPA files have a specific format.

File Format

\data\
ngram 1=19
ngram 2=234
ngram 3=1013

\1-grams:
-1.7132 puluh -3.8008
-1.9782 satu -3.8368

\2-grams:
-1.5403 dalam dua -1.0560
-3.1626 dalam ini 0.0000

\3-grams:
-1.8726 itu dan tiga
-1.9654 itu dan untuk

\end\
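The `\data\` header already states how many lines each section has. As a minimal sketch (the names `ArpaHeader` and `parseHeader` are hypothetical, and it assumes header lines look exactly like `ngram N=COUNT` as in the sample), those counts can be read without touching the n-gram sections:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

public class ArpaHeader {
    /** Reads the \data\ block and returns n -> number of n-grams; stops at the first "\N-grams:" line. */
    static Map<Integer, Long> parseHeader(BufferedReader reader) throws IOException {
        Map<Integer, Long> counts = new TreeMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.startsWith("ngram ")) {
                // "ngram 2=234" -> order 2, count 234
                String[] parts = line.substring("ngram ".length()).split("=");
                counts.put(Integer.parseInt(parts[0].trim()), Long.parseLong(parts[1].trim()));
            } else if (line.startsWith("\\") && line.endsWith("-grams:")) {
                break; // the header is over, the first n-gram section begins here
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        String sample = "\\data\\\nngram 1=19\nngram 2=234\nngram 3=1013\n\n\\1-grams:\n-1.7132 puluh -3.8008\n";
        System.out.println(parseHeader(new BufferedReader(new StringReader(sample))));
        // prints {1=19, 2=234, 3=1013}
    }
}
```

The counts tell you how many lines each section holds, but because the lines differ in length they cannot be turned into byte positions directly.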

As you can see in the sample file, there are 19 lines of 1-grams, 234 lines of 2-grams and 1013 lines of 3-grams. I give the program the string part of a line and want to get the numbers to the left and right of that string. The input string tells me which part of the file to search, since the number of words in it is the n-gram order. I have to find a way to avoid reading the whole file, because it is very big and reading all of it takes a lot of time. I think a good approach is to jump directly to the specific line in the file, without using an index file, and read that whole line.
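A simple baseline before any indexing: use the number of words in the query to pick the section, skip everything before its `\N-grams:` header, and stop at the next line starting with `\`. This is a sketch with hypothetical names (`ArpaSectionScan`, `find`, `Entry`) that assumes whitespace-separated fields as in the sample. It still reads the file sequentially up to the matching line, so it avoids parsing the other sections but not the I/O; whether one pass fits the one-minute budget depends on the disk and should be measured.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ArpaSectionScan {
    /** Left number (log10 probability) and right number (backoff weight, NaN if the line has none). */
    static final class Entry {
        final double logProb;
        final double backoff;

        Entry(double logProb, double backoff) {
            this.logProb = logProb;
            this.backoff = backoff;
        }
    }

    /** Scans only the section whose order equals the number of words in the query; null if absent. */
    static Entry find(BufferedReader reader, String query) throws IOException {
        String[] words = query.trim().split("\\s+");
        int order = words.length;                  // "dalam dua" -> 2 -> "\2-grams:"
        String header = "\\" + order + "-grams:";
        boolean inSection = false;
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (!inSection) {
                inSection = line.equals(header);   // skip lines until the right header
                continue;
            }
            if (line.isEmpty()) continue;
            if (line.startsWith("\\")) break;      // next section or \end\: not found
            String[] f = line.split("\\s+");       // logProb w1 .. wN [backoff]
            if (f.length <= order) continue;
            boolean match = true;
            for (int i = 0; i < order && match; i++) {
                match = f[i + 1].equals(words[i]);
            }
            if (match) {
                double backoff = f.length > order + 1 ? Double.parseDouble(f[order + 1]) : Double.NaN;
                return new Entry(Double.parseDouble(f[0]), backoff);
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        String sample = "\\data\\\nngram 2=2\n\n\\2-grams:\n-1.5403 dalam dua -1.0560\n-3.1626 dalam ini 0.0000\n\n\\end\\\n";
        Entry e = find(new BufferedReader(new StringReader(sample)), "dalam dua");
        System.out.println(e.logProb + " " + e.backoff); // -1.5403 -1.056
    }
}
```

For the real file, the reader would come from `Files.newBufferedReader(path, StandardCharsets.UTF_8)`.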

It would be great if you could help me with my assignment.


I don't know what an ARPA file is. I'm assuming it's some sort of file containing text.

What you want to do is first index the file, so that you can map each String to the line in the file where it occurs.
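Line numbers alone do not let you jump into the file, because (as the question notes) the lines differ in length. The sketch below therefore swaps in byte offsets: one sequential pass records, for every n-gram, the byte position where its line starts. The names `ArpaIndexBuilder` and `buildIndex` are hypothetical, and the file is assumed to be UTF-8 with the layout of the sample.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ArpaIndexBuilder {
    /** One sequential pass: maps "w1 ... wN" to the byte offset where its line starts. */
    static Map<String, Long> buildIndex(InputStream raw) throws IOException {
        InputStream in = new BufferedInputStream(raw);
        Map<String, Long> index = new HashMap<>();
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        long offset = 0;      // bytes consumed so far
        long lineStart = 0;   // offset of the first byte of the current line
        int order = 0;        // n of the current "\n-grams:" section, 0 outside any section
        while (true) {
            int b = in.read();
            if (b != '\n' && b != -1) {
                line.write(b);
                offset++;
                continue;
            }
            String text = new String(line.toByteArray(), StandardCharsets.UTF_8).trim();
            line.reset();
            if (text.startsWith("\\")) {
                // "\2-grams:" opens section 2; "\data\" and "\end\" are outside any section
                order = text.endsWith("-grams:") ? Integer.parseInt(text.substring(1, text.indexOf('-'))) : 0;
            } else if (order > 0 && !text.isEmpty()) {
                String[] f = text.split("\\s+");  // logProb w1 .. wN [backoff]
                if (f.length > order) {
                    index.put(String.join(" ", Arrays.copyOfRange(f, 1, order + 1)), lineStart);
                }
            }
            if (b == -1) break;
            offset++;         // count the '\n' itself
            lineStart = offset;
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        String sample = "\\data\\\nngram 1=2\n\n\\1-grams:\n-1.7132 puluh -3.8008\n-1.9782 satu -3.8368\n\n\\end\\\n";
        Map<String, Long> index = buildIndex(new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
        System.out.println(index.get("satu")); // byte offset where the "satu" line starts
    }
}
```

For the real file you would pass a `FileInputStream`. Keep in mind that a `HashMap` with one entry per n-gram of a 1 GB model may need a lot of heap.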

That's a big file, so you'd probably store your index in a separate file.
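Keeping the index in a separate file could look like the following sketch with `DataOutputStream`/`DataInputStream`: write the entry count, then each key with `writeUTF` and its offset with `writeLong`, so the index is built once and reloaded on later runs. The file layout and the names `ArpaIndexFile`, `save` and `load` are assumptions for illustration. Loading still brings the whole map into memory; a sorted on-disk index searched with binary search would avoid that, at the cost of more code.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class ArpaIndexFile {
    /** Layout: int entry count, then (writeUTF key, writeLong offset) pairs. */
    static void save(Map<String, Long> index, OutputStream raw) throws IOException {
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(raw));
        out.writeInt(index.size());
        for (Map.Entry<String, Long> e : index.entrySet()) {
            out.writeUTF(e.getKey());    // modified UTF-8; each key must encode to at most 65535 bytes
            out.writeLong(e.getValue());
        }
        out.flush();                     // flush, but leave closing the stream to the caller
    }

    static Map<String, Long> load(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(new BufferedInputStream(raw));
        int n = in.readInt();
        Map<String, Long> index = new HashMap<>();
        for (int i = 0; i < n; i++) {
            String key = in.readUTF();
            index.put(key, in.readLong());
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Long> index = Map.of("puluh", 28L, "dalam dua", 120L); // made-up offsets, illustration only
        Path file = Files.createTempFile("arpa", ".idx");
        try (OutputStream os = Files.newOutputStream(file)) {
            save(index, os);
        }
        try (InputStream is = Files.newInputStream(file)) {
            System.out.println(load(is).equals(index)); // true
        }
        Files.delete(file);
    }
}
```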

First, before the user searches anything, you'd build the index. Then, for each query, you'd search the index for the line numbers where the user's String is found.
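Once an offset is known, the lookup itself is one seek plus one short read. The sketch below uses `java.io.RandomAccessFile` and assumes an index of type `Map<String, Long>` from n-gram string to the byte offset of its line; the names `ArpaSeekLookup`, `readLineAt` and `lookup` are hypothetical. It reads the line's bytes itself because `RandomAccessFile.readLine()` turns each byte into one character and so does not decode multi-byte UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class ArpaSeekLookup {
    /** Jumps to byteOffset and reads up to the next '\n', decoding the bytes as UTF-8. */
    static String readLineAt(RandomAccessFile raf, long byteOffset) throws IOException {
        raf.seek(byteOffset);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = raf.read()) != -1 && b != '\n') {
            buf.write(b);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8).trim();
    }

    /** Returns {leftNumber, rightNumber}; rightNumber is NaN when the line has no backoff. Null if not indexed. */
    static double[] lookup(RandomAccessFile raf, Map<String, Long> index, String ngram) throws IOException {
        String key = ngram.trim().replaceAll("\\s+", " ");
        Long offset = index.get(key);
        if (offset == null) return null;
        String[] f = readLineAt(raf, offset).split("\\s+"); // logProb w1 .. wN [backoff]
        int order = key.split(" ").length;
        double left = Double.parseDouble(f[0]);
        double right = f.length > order + 1 ? Double.parseDouble(f[order + 1]) : Double.NaN;
        return new double[] {left, right};
    }

    public static void main(String[] args) throws IOException {
        String sample = "\\2-grams:\n-1.5403 dalam dua -1.0560\n-3.1626 dalam ini 0.0000\n";
        Path file = Files.createTempFile("sample", ".arpa");
        Files.write(file, sample.getBytes(StandardCharsets.UTF_8));
        // Stand-in index: the sample is ASCII, so character offsets equal byte offsets.
        Map<String, Long> index = Map.of("dalam ini", (long) sample.indexOf("-3.1626"));
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            double[] r = lookup(raf, index, "dalam ini");
            System.out.println(r[0] + " " + r[1]); // -3.1626 0.0
        } finally {
            Files.delete(file);
        }
    }
}
```

Keeping one `RandomAccessFile` open across queries avoids reopening the 1 GB file for every lookup.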
