开发者

How do you read a file until you hit a certain string in c?

I wanted to know how, in C, you can read a certain file until the reading hits a certain string, or character array. What I want to be able to do is, once the file hits that string, I want the position to be set at that point. I am going to use fseek for that, and that's not a problem. It's just the reading until a certain strin开发者_如何学运维g is hit that I am not able to do. I've been reading up on some of the functions, but there doesn't seem to be anything that guides with this. Fgets is the closest thing to this, but I don't want to provide a certain number of characters to be read, as I don't know how many. But can you give me some tips on how to do this?

Thanks!


There are many efficient string searching algorithms, each of which can be implemented in C.

http://en.wikipedia.org/wiki/String_searching_algorithm

If you're looking for a string of length N, easiest is to keep a circular buffer of length N and read 1 byte at a time from the file adding it to the circular buffer. At each step you compare your buffer with the string you're searching for. It's highly inefficient but easy to code.


There's no built-in function to do exactly what you want, but there are a few options.

Option one: Read data in chunks. You don't know exactly where your data is, so read in a few kbs of data at a time, and search within these chunks. Make sure you deal with the case where the string you're looking for straddles a chunk boundrary! Once you've located the string, use fseek() to position yourself at the start of it.

Option two: Memory map the file and use memmem() on the entire file (as mapped into memory). This requires unportable calls to set up the memory mapping, so you'll need to know your OS (or use a portability wrapper library like glib). On 32-bit machines, it will also limit the size of files you can search in to a few hundred megabytes. It is, however, a very simple and efficient approach when it's an option.

If you go with option one, the trickiest part will be dealing with the chunk-straddling case. One option is to always keep two chunks in memory, and restart the search so it begins (length of target string) - 1 bytes before the end of the previous block. The actual search could then be done using memmem() or any other string searching algorithm. You could also convert your search into a DFA (since it is a regular language) and keep the current state across blocks.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜