
Cause of corrupted file contents

I'm having a recurring problem with an app in the wild.

It has a fairly simple XML file it dumps out every now and then, something like every 30 minutes.

The data files are often quite small - e.g. < 5KB.

It doesn't hold a lock on the file - it just recreates it from scratch each time.

I was lucky enough to see the problem occur on a test machine, and what I observed was that the file was corrupted and set to "nulls" (i.e. 00 in Hex). What's really weird is that it is exactly the correct length compared to what it should have been.

I've tried to be really careful in my saving process:

  1. I write the xml to a temp file in the same directory as I'm going to really save it
  2. I perform a Win32 MoveFileEx() with MOVEFILE_WRITE_THROUGH set (so it should block until the move is really and truly complete) to move the temp file over the existing data file

I even lock on a Mutex to make sure this isn't a threading issue.

It doesn't happen that often, like maybe 1 in 1000 users.

In the past I have observed data files being corrupted by a power failure or BSOD during writing, and I've seen things like 32 KB of a file being all nulls.

But it seems to be happening more often than I'd expect, given the chances of a power failure during the write, and especially since I'm using MOVEFILE_WRITE_THROUGH.

Any ideas?

John


Answers to some questions:

  • Q: Why not write to the file directly?

  • A: I avoided this to make the software less vulnerable to power failure issues. E.g. if you're halfway through writing the file and you crash/powerfail/BSOD, then you definitely have a corrupted file. Writing a temp file and then moving it is a commonly used and simple way of making the file operation as atomic as possible (well, as close as is reasonable without using NTFS-specific APIs). I should say that the software is an archiving/backup system, so I have to take more care with data consistency than other apps might.

  • Q: Does this happen during normal operation?

  • A: As this issue occurs in the wild, I'm only working with a few clues, so I don't know for sure. I can say that the software works reliably 99.9% of the time. I guess that's the nub of my question: is this just random bad luck caused by a BSOD/power failure, or is it a bug?

  • Q: What environment/OS?

  • A: XP, Vista, 7, Server 200X. Most likely NTFS, but could be FAT32.

  • Q: Am I closing the file before moving?

  • A: Yes. I'm using C++ streams and calling close() before I do the MoveFile.

  • Q: What other processes are accessing the file?

  • A: None managed by me. Obviously I'm not in control of Virus Checker, Folder Syncers, etc. The file is located in the AppData\Local folder of the user's machine.


In my experience, this is possibly caused by the file cache in Windows. You should try saving the file using CreateFile() with FILE_FLAG_WRITE_THROUGH passed in. Saving the file this way should make sure the data actually lands on the hard disk.

I wrote a little program to test this. If the program creates the file with std::ofstream and uses MoveFileEx() with MOVEFILE_WRITE_THROUGH to move it, the file is corrupted almost every time when the VM is powered off (not shut down) immediately after the move finishes. If the program instead uses CreateFile() with FILE_FLAG_WRITE_THROUGH to create the file and then does the same thing, the file is not corrupted (I tested about 10 times and it never happened).

Based on those simple tests, I think you should try CreateFile() with FILE_FLAG_WRITE_THROUGH to solve your problem.

More information:
File Caching (Windows)
Windows Internals, 6th edition, Chapter 11 Cache Manager


Here are some ideas:

  • Flush the stream after critical information or before long periods of no writing.
  • Verify that no other entities are writing to the file.
  • Verify that the buffered data is not overwritten by other code.
  • Close the file between long durations of no writing.
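
As a rough illustration of the first point, here is a sketch that flushes a C++ stream and checks its state at each step, so a failed write is reported rather than silently ignored. The helper name `WriteChecked` is hypothetical. Note that flush() on a std::ofstream only hands the buffered bytes to the operating system; by itself it does not guarantee the data has reached the disk.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes `data` to `path`, flushing and checking the
// stream state so that errors surface to the caller.
bool WriteChecked(const std::string& path, const std::string& data) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;                       // could not open the file
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    out.flush();                                  // push buffered bytes to the OS
    if (!out) return false;                       // write or flush failed
    out.close();                                  // close() sets failbit on error
    return !out.fail();
}
```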


I was facing the same problem, and my code is exactly as you described. This may seem quite unorthodox, but what worked for me was keeping multiple backup files: when reading, if anything goes wrong I assume the file is corrupted and read from a backup file instead.
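
A minimal sketch of that fallback, under stated assumptions: the helper names (`ReadWholeFile`, `LooksCorrupt`, `LoadWithFallback`) are hypothetical, a single backup file is used, and "corrupt" is taken to mean empty or consisting entirely of NUL bytes, which matches the symptom described in the question. A real check would more likely parse the XML or verify a checksum.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>

// Reads a whole file into `out`; returns false if it cannot be opened.
bool ReadWholeFile(const std::string& path, std::string& out) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    out.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
    return true;
}

// Assumed corruption check: empty or all-NUL content counts as corrupt.
bool LooksCorrupt(const std::string& data) {
    return data.empty() ||
           std::all_of(data.begin(), data.end(),
                       [](char c) { return c == '\0'; });
}

// Hypothetical helper: tries the primary file first, then the backup.
// On failure, `out` is cleared and false is returned.
bool LoadWithFallback(const std::string& primary,
                      const std::string& backup,
                      std::string& out) {
    if (ReadWholeFile(primary, out) && !LooksCorrupt(out)) return true;
    if (ReadWholeFile(backup, out) && !LooksCorrupt(out)) return true;
    out.clear();
    return false;
}
```

For this to help, the backup has to be written at a different time than the primary file, for example by copying the last known-good file before each save, so that a single crash cannot damage both.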
