Given a FILE * , how to find the offset of the 1st occurence of "abc" efficiently?
How to do this kind of job efficiently in C?
What I can think of is first load the whole fi开发者_如何学Pythonle into memory and then search though it..
But is there a more efficient way?
UPDATE
To load the whole file into memory will be impossible if the file is extremely big.
Loading the whole file into memory is unnecessary and inefficient. Try something like this:
FILE *fl;
int cc = getc(fl);
while (cc != EOF)
{
if (cc=='a')
{
cc = getc(fl);
if (cc=='b')
{
cc = getc(fl);
if (cc=='c')
return "FOUND";
}
}
cc = getc(fl);
}
return "NOT FOUND";
Obviously you would never actually use code like this. You should write a function that takes an arbitrary string to search for, but the algorithm is basically the same. Also the I/O will be buffered by the system so you shouldn't have to worry about efficiency of reading a single character at a time. Also I didn't include any error checking.
you can read in the file block-by-block and search for "abc" in each block. There are algorithms like the Boyer-Moore search to reduce the number of characters you have to explicitly check.
in linux, you can use posix_fadvise
to tell it that you will be slurping the file.
For string searches there are many interesting algorithms. For example in Boyer-Moore you would take advantage of the fact that the 3rd position must be 'c' if you are to match 'abc', and if it is not 'c' then a table would say how far to advance (for example if it's 'd' you can skip ahead 3 because the first 3 letters can't be interesting to you at all).
However, interesting string search methods are not going to matter at all versus the time spent reading the file. You should avoid reading it all in if you want to handle arbitrary files because the extra memory use is wasteful and will slow you down. But you can't avoid reading the entire file up to the point where you find your string.
What OS are you using? If it's Linux you can use a memory map to automatically map a certain portion of memory directly to the file. It's considered much faster.
EDIT
mmap doesn't load the whole file into memory at once. It's just more efficient.
精彩评论