How do you detect if a text file is corrupt before the program crashes?
I'm writing a command-line program in ANSI C to parse a Quake 2 map file and report how many entities and textures are being used. My development machine is a MacBook. I'm testing on OS X Snow Leopard (32-bit), Windows XP (32-bit) and Vista (64-bit), and Ubuntu 9.10 (32-bit).
I had a crash bug on Vista where the program would hang with a certain map file. It took a while to figure out that the problem wasn't the program but the map file itself. I didn't notice anything unusual about the text file. Re-opening and saving the map file fixed the issue.
My code loads the entire map file into memory, uses strtok() to split it into lines on '\n', parses each line, and loads the data into a singly linked list for processing. Is there a way to detect if the map (text) file is corrupt?
The easiest non-programming solution is to add a FAQ file with the problem and solution.
Parse each line as you read it to determine whether it is valid. If parsing fails, you can tell the user that the data is corrupt and still exit gracefully.
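A minimal sketch of that idea, assuming a simplified view of the map format in which a line is blank, a // comment, a lone brace, a "key" "value" pair, or a brush face starting with '('. The names classify_line and first_bad_line are hypothetical helpers, not part of the asker's program, and the rules are deliberately loose. The buffer is walked with strchr() rather than strtok() so that empty lines still count toward the reported line number.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum line_kind {
    LINE_BLANK, LINE_COMMENT, LINE_BRACE, LINE_KEYVALUE, LINE_FACE, LINE_INVALID
};

/* Classify one line (without its trailing '\n'). Hypothetical helper;
   the accepted shapes are an assumption about the map format. */
static enum line_kind classify_line(const char *line)
{
    const char *p;
    size_t len;
    int quotes = 0;

    assert(line != NULL);

    /* Control bytes other than tab and CR do not belong in a text file. */
    for (p = line; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (c < 0x20 && c != '\t' && c != '\r')
            return LINE_INVALID;
        if (c == '"')
            quotes++;
    }

    /* Skip leading blanks, then ignore trailing whitespace (including '\r'). */
    while (*line == ' ' || *line == '\t')
        line++;
    len = strlen(line);
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        len--;

    if (len == 0)
        return LINE_BLANK;
    if (len >= 2 && line[0] == '/' && line[1] == '/')
        return LINE_COMMENT;
    if (len == 1 && (line[0] == '{' || line[0] == '}'))
        return LINE_BRACE;
    if (line[0] == '"' && line[len - 1] == '"' && quotes == 4)
        return LINE_KEYVALUE;
    if (line[0] == '(')
        return LINE_FACE;
    return LINE_INVALID;
}

/* Walk a writable, NUL-terminated buffer line by line.
   Returns 0 if every line is acceptable, else the 1-based number
   of the first line that is not. Newlines are overwritten with '\0'. */
static unsigned long first_bad_line(char *text)
{
    unsigned long lineno = 0;
    char *line = text;

    assert(text != NULL);

    while (line != NULL && *line != '\0') {
        char *next = strchr(line, '\n');
        if (next != NULL)
            *next++ = '\0';
        lineno++;
        if (classify_line(line) == LINE_INVALID)
            return lineno;
        line = next;
    }
    return 0;
}
```

If first_bad_line() returns a non-zero value, the caller can print something like "map file looks corrupt at line N" and exit with EXIT_FAILURE instead of continuing to parse.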
With parser generator tools, you can detect syntactical errors easily.
However, even if the syntax is ok, you should always assume that the contents might not be ok.
For example, if the file format is as follows:
- n : number of entries
- entry 1
- entry 2
- ...
- end condition
your code should not just allocate an n-sized array and read entries into it until the end condition. Instead, you should verify that n entries were actually read (and in this case, never read more than n entries, to avoid overflow).
Thus, design the code so that it does not blindly trust the input.
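To make the count example concrete, here is a rough sketch that assumes a made-up format: a decimal count n, then n integers, then the word end, all separated by whitespace. The function name parse_counted and the MAX_ENTRIES cap are illustrative assumptions, not something from the question. The point is that n is range-checked before allocating, reading stops after exactly n entries, and both a short file and a file with extra entries are rejected.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ENTRIES 100000L /* arbitrary sanity cap (an assumption) */

/* Parse "n e1 e2 ... en end" from a NUL-terminated string.
   On success returns 0 and stores a malloc'ed array in *out (caller frees)
   and the entry count in *out_n. Returns -1 on any inconsistency. */
static int parse_counted(const char *text, long **out, size_t *out_n)
{
    char *end;
    long n, i;
    long *entries;

    assert(text != NULL && out != NULL && out_n != NULL);
    *out = NULL;
    *out_n = 0;

    /* Do not trust the declared count: it must parse and be in range. */
    errno = 0;
    n = strtol(text, &end, 10);
    if (end == text || errno == ERANGE || n < 0 || n > MAX_ENTRIES)
        return -1;
    text = end;

    /* Allocate at least one element so n == 0 does not call malloc(0). */
    entries = malloc((size_t)(n > 0 ? n : 1) * sizeof *entries);
    if (entries == NULL)
        return -1;

    /* Read exactly n entries, never more. */
    for (i = 0; i < n; i++) {
        errno = 0;
        entries[i] = strtol(text, &end, 10);
        if (end == text || errno == ERANGE) { /* fewer than n entries */
            free(entries);
            return -1;
        }
        text = end;
    }

    /* The end marker must come next; extra entries land here and fail. */
    while (isspace((unsigned char)*text))
        text++;
    if (strncmp(text, "end", 3) != 0) {
        free(entries);
        return -1;
    }
    text += 3;
    while (isspace((unsigned char)*text))
        text++;
    if (*text != '\0') { /* trailing junk after the end marker */
        free(entries);
        return -1;
    }

    *out = entries;
    *out_n = (size_t)n;
    return 0;
}
```

A caller would treat a -1 return as "file is corrupt or truncated" and report it rather than working with a partially filled array.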
I think I fixed the bug. I took a number of steps to get there and testing went fine.
- Added -Wconversion to my debug mix for GCC. This reported some useful warnings and some not so useful ones. For the most part, the fixes were adding unsigned to the variable types and a few minor (int) casts.
- While my data structures had the correct types (i.e., unsigned long int), the output variables that added everything together were the wrong types (i.e., int). Re-checked all my variable types to make sure they all matched.
- Added a check that halts the program with an error if the file has a zero or negative byte size.
- Added a check that halts the program if the data lists have zero nodes (i.e., parsing returned no valid match), with a message that the file has no usable data.
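The last two checks could look roughly like the sketch below. It is not the released program's code: load_text, count_nodes, the load_status values and the node struct are hypothetical names. It also assumes that fseek() to SEEK_END followed by ftell() yields the byte size of a file opened in binary mode, which holds on the platforms listed in the question but is not strictly guaranteed by the C standard.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum load_status {
    LOAD_OK, LOAD_OPEN_FAILED, LOAD_EMPTY, LOAD_NO_MEMORY, LOAD_READ_FAILED
};

/* Read a whole file into a NUL-terminated, malloc'ed buffer (caller frees).
   Refuses files whose size is zero or cannot be determined. */
static enum load_status load_text(const char *path, char **out)
{
    FILE *fp;
    long size;
    char *buf;
    size_t got;

    assert(path != NULL && out != NULL);
    *out = NULL;

    fp = fopen(path, "rb");
    if (fp == NULL)
        return LOAD_OPEN_FAILED;

    /* ftell() returns -1L on failure, which covers the "negative size" case. */
    if (fseek(fp, 0L, SEEK_END) != 0 || (size = ftell(fp)) < 0L ||
        fseek(fp, 0L, SEEK_SET) != 0) {
        fclose(fp);
        return LOAD_READ_FAILED;
    }
    if (size == 0L) {
        fclose(fp);
        return LOAD_EMPTY;
    }

    buf = malloc((size_t)size + 1);
    if (buf == NULL) {
        fclose(fp);
        return LOAD_NO_MEMORY;
    }

    /* A short read means the file changed or the read failed. */
    got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);
    if (got != (size_t)size) {
        free(buf);
        return LOAD_READ_FAILED;
    }
    buf[got] = '\0';
    *out = buf;
    return LOAD_OK;
}

/* Minimal singly linked list node, standing in for the real data lists. */
struct node {
    struct node *next;
};

/* Count nodes; a result of 0 after parsing means no usable data. */
static unsigned long count_nodes(const struct node *head)
{
    unsigned long n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

In main(), any status other than LOAD_OK, or a count_nodes() result of 0 after parsing, would print a message to stderr and return EXIT_FAILURE.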
I left the parsing functions alone for now. If a corrupt or mangled map file produces a valid match, that "data" will eventually be output. Garbage In/Garbage Out (GIGO) is still a factor. Something to revisit later. The released version of my program can be found here.