
C/C++: Determining whether files have been completely written

I have a directory (DIR_A) to dump from Server A to Server B, which is expected to take a few weeks. DIR_A has a normal tree structure, i.e. a directory can contain subfolders, files, etc.

Aim: As DIR_A is being dumped to server B, I will have to go through DIR_A and search for certain files within it (I do not know the exact name of each file because server A changes the names of all the files being sent). I cannot wait for weeks to process some files within DIR_A, so I want to start manipulating files as soon as I receive them at server B.

Brief: Server A sends DIR_A to Server B. Expected to take weeks. I have to start processing the files at B before the upload is complete.

Attempt Idea: I decided to write a program that will list the contents of DIR_A. I went on finding out whether files exist within folders and subfolders of DIR_A. I thought that I might look for the EOF of a file within DIR_A. If it is not present then the file has not yet been completely uploaded. I should wait till the EOF is found. So, I keep looping, calculating the size of the file and verifying whether EOF is present. If this is the case, then I start processing that file.

To simulate the above, I wrote a program that writes to a text file and stopped it in the middle without waiting for completion. I then used the program below to determine whether the EOF could be found. I assumed that since I abruptly ended the program writing to the text file, the EOF would not be present and hence the output "EOF FOUND" would not be reached. I was wrong: it was reached. I also tried with feof() and fseek().

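#include <fstream>
#include <iostream>
#include <string>

void look_for_eof(const std::string& name_of_file) {
    std::ifstream file(name_of_file.c_str(), std::ios::binary);
    if (!file) return;  // without this check the loop below never ends
    // go to the end of the file to determine eof
    char character;
    file.seekg(0, std::ios::end);
    while (!file.eof()) {
        // reading past the last byte written so far sets eofbit,
        // whether or not the writer has finished
        file.read(&character, sizeof(char));
    }
    file.close();
    std::cout << "EOF FOUND" << std::endl;
}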

Could anyone suggest a way to determine whether a file has been completely written or not?


EOF is simply C++'s way of telling you there is no more data. There's no EOF "character" that you can use to check if the file is completely written.

The way this is typically accomplished is to transfer the file under a temporary name, e.g. myfile.txt.transferring, and once the transfer is complete, rename the file on the target host to its final name (something like myfile.txt). You could do the same by using separate directories.


Neither C nor C++ has a standard way to determine whether a file is still open for writing by another process. We have a similar situation: a server sends us files, and we have to pick them up and handle them as soon as possible. For that we use Linux's inotify subsystem, with a watch configured for IN_CLOSE_WRITE events (file was closed after having been opened for writing), which is wrapped in boost::asio::posix::stream_descriptor for convenient asynchronicity.

Depending on the OS, you may have a similar facility. Or just lsof as already suggested.


All finite files have an end. If a file is being written by one process and (assuming the OS allows it) simultaneously read by another process faster than it is being written, then the reading process will see an EOF as soon as it has read all the characters that have been written so far.

What would probably work better is, if you can determine a length of time during which you can guarantee that you'll receive a significant number of bytes and write them to a file (beware OS buffering), then you can walk the directory once per period, and any file that has changed its file size can be considered to be unfinished.

Another approach would require OS support: check what files are open by the receiving process, with a tool like lsof. Any file open by the receiver is unfinished.


In C, and I think it's the same in C++, EOF is not a character; it is a condition a file is (or is not) in. Just like media removed or network down is not a character.
