Why does the MD5 of a certain file always change?
I have been asked to investigate why the MD5 value of a file keeps changing.
Example:
1. I generate the diagnostic file for a certain machine.
2. This produces a .zip file, say Diag.zip, which contains all the information and files for that machine.
3. Diag.zip contains an .xls file, say Data.xls, which summarizes every file on that machine: the directory of the file, file version, file size, creation time and MD5.
4. I save all the information from Data.xls in a database.
5. After a day or so, I repeat steps 1-4.
When I then query the data saved from Data.xls in the database over a two-week range, it shows that almost all files on that machine have had their MD5 value change.
The question is: why does the MD5 value change every time I generate a new diagnostic file?
There seems to be an issue with Excel files, in particular Excel 2003 .xls files. Whenever one is opened in Excel, even if it is not changed and not saved, Excel automatically updates some of the file's metadata, such as the "Document Properties and Personal Information" and "Last Accessed Statistics". So the file changes slightly every time it is opened, and its MD5 changes with it.
One way to avoid this problem is to remove the "Document Properties and Personal Information":
Excel 2007: Remove Hidden Data and Personal Information from Office Documents
Excel 2013, Excel 2010: Remove Hidden Data and Personal Information by Inspecting Workbooks
Another way to avoid this would be to use .xlsx files. I have tried to reproduce this behavior with .xlsx files, but it seems to happen only with .xls (2003) files.
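The MD5 hash is computed from the bytes of the file's contents only. The file name and the creation date stored by the file system are not part of it, although the file size matters indirectly because adding or removing bytes changes the content. If even one byte of the content changes, the MD5 hash changes. The exact same content will always return the exact same MD5 hash, so a new file with identical bytes gets the same hash as the old one.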
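One way to check what the MD5 actually depends on is to hash the same bytes under two file names and then again after a one-byte change. The following is only a minimal sketch in Python using the standard `hashlib` module; the helper names `md5_of_file` and `demo`, and the file name `Renamed.xls`, are made up for illustration and are not part of the diagnostic tool from the question.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file's bytes, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo():
    """Hash a file, rename it, then append one byte (hypothetical file names)."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "Data.xls")
        renamed = os.path.join(tmp, "Renamed.xls")

        with open(original, "wb") as f:
            f.write(b"same bytes")
        before = md5_of_file(original)

        # Renaming changes the name, not the content.
        os.rename(original, renamed)
        after_rename = md5_of_file(renamed)

        # Appending a single byte changes the content.
        with open(renamed, "ab") as f:
            f.write(b"!")
        after_edit = md5_of_file(renamed)

        return before, after_rename, after_edit


if __name__ == "__main__":
    before, after_rename, after_edit = demo()
    print("rename kept the hash:", before == after_rename)
    print("one-byte edit changed the hash:", before != after_edit)
```

If the hashes stored in the database differ between two runs, this suggests that the bytes of those files really did change in between (for example through rewritten metadata), not that MD5 itself is unstable.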