Self-referencing MD5 file
I'm developing a program that needs to load and save data in external files. After looking at the options, I have chosen to save the data in a binary file.
Since I don't want anyone to be able to edit the file easily, I thought about writing the file's MD5 sum on its first line. That way, if any data in the file is changed, the sum will no longer match the one on the first line.
The problem is that if I calculate the MD5 and then write it into the file, the file's sum will obviously change. How can I solve this?
If you can suggest a better option than a checksum, that would be equally welcome.
Thanks in advance.
What is your threat model?
If you just want to protect against casual fiddling, md5 the main data of the file, then write the md5 sum to the end. To validate, strip off the md5 sum, then md5 only the remaining data and compare the two.
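A minimal Java sketch of that append-and-strip approach, assuming the whole file fits in memory as a byte array. The class and method names (TrailingDigest, seal, open) are made up for illustration and do not come from any library.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class TrailingDigest {
    private static final int MD5_LENGTH = 16; // an MD5 digest is always 16 bytes

    // Returns the payload followed by the MD5 digest of the payload.
    static byte[] seal(byte[] payload) {
        byte[] digest = md5(payload);
        byte[] sealed = Arrays.copyOf(payload, payload.length + MD5_LENGTH);
        System.arraycopy(digest, 0, sealed, payload.length, MD5_LENGTH);
        return sealed;
    }

    // Strips the trailing digest, re-hashes the rest, and returns the
    // payload only if the stored and recomputed digests match.
    static byte[] open(byte[] sealed) {
        if (sealed.length < MD5_LENGTH) {
            throw new IllegalArgumentException("too short to contain a digest");
        }
        int payloadLength = sealed.length - MD5_LENGTH;
        byte[] payload = Arrays.copyOfRange(sealed, 0, payloadLength);
        byte[] stored = Arrays.copyOfRange(sealed, payloadLength, sealed.length);
        if (!MessageDigest.isEqual(stored, md5(payload))) {
            throw new IllegalStateException("checksum mismatch");
        }
        return payload;
    }

    private static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }
}
```

You would write the result of seal(...) to disk and pass the bytes read back to open(...). Because the digest has a fixed length, there is no need for a separator between data and sum.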
If you want to protect against malicious and skilled cracking, you're out of luck; any validation algorithm you use can be replicated, particularly if they have access to the program itself. Even a cryptographic signature could fail if the attacker extracts the key from the program binary.
If it's a big deal, a Unix solution is to run as setuid or setgid to a different user and write to a directory which users cannot modify. I'm not sure what a good general Java solution is, but the point remains: users shouldn't be able to modify your data because they were prevented from doing so, not because they were detected trying to.
While it is theoretically possible to make a self-referencing MD5 file (and I recall some have been found), it's a waste of resources. It is generally necessary to store the hash somewhere outside the hashed file (traditionally named md5sums or sha1sums, respectively).
This said, I'd recommend going for SHA-1 in addition to MD5.
Bill: Ted, while I agree that, in time, our band will be most triumphant. The truth is, Wyld Stallyns will never be a super band until we have Eddie Van Halen on guitar.
Ted: Yes, Bill. But, I do not believe we will get Eddie Van Halen until we have a triumphant video.
Bill: Ted, it's pointless to have a triumphant video before we even have decent instruments.
Ted: Well, how can we have decent instruments when we don't really even know how to play?
Bill: That is why we NEED Eddie Van Halen!
Ted: And THAT is why we need a triumphant video.
Seriously, you can't calculate the MD5 sum (or some other hash) with the calculated hash embedded, so you have to store the hash somewhere else.
If you just don't want people to easily mess with the file, maybe obfuscating it via ROT13 or XOR "encryption" is an option?
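For what it's worth, XOR obfuscation is only a few lines in Java. This is a rough sketch; XorObfuscator and its key parameter are illustrative names, and this is obfuscation only, not real encryption.

```java
final class XorObfuscator {
    // XORs every byte of input with the key, repeating the key as needed.
    // Applying the same key a second time restores the original bytes.
    static byte[] apply(byte[] input, byte[] key) {
        if (key.length == 0) {
            throw new IllegalArgumentException("key must not be empty");
        }
        byte[] output = new byte[input.length];
        for (int i = 0; i < input.length; i++) {
            output[i] = (byte) (input[i] ^ key[i % key.length]);
        }
        return output;
    }
}
```

Call apply(...) on the data before saving and again after loading, with the same key both times.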
What if you create a container for your data? Through a new class with two properties, CheckSum and Data, you could serialize all your data and put it in the Data property. Then, you calculate the checksum for the serialized data, and use the CheckSum property to store the checksum.
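Something along these lines, as a sketch only. The class name CheckedContainer and the isIntact method are invented here; the two fields correspond to the Data and CheckSum properties described above, and the class is Serializable so it can be written out with standard Java serialization.

```java
import java.io.Serializable;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class CheckedContainer implements Serializable {
    private static final long serialVersionUID = 1L;

    private final byte[] data;     // the already-serialized application data
    private final byte[] checkSum; // MD5 of data

    // Used when saving: the checksum is computed from the data.
    CheckedContainer(byte[] data) {
        this(data, md5(data));
    }

    // Used when rebuilding a container from stored parts.
    CheckedContainer(byte[] data, byte[] checkSum) {
        this.data = data.clone();
        this.checkSum = checkSum.clone();
    }

    byte[] getData() { return data.clone(); }

    byte[] getCheckSum() { return checkSum.clone(); }

    // True only if the stored checksum still matches the data.
    boolean isIntact() {
        return MessageDigest.isEqual(checkSum, md5(data));
    }

    private static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }
}
```

Since the checksum covers only the Data field and not the container itself, the self-reference problem from the question never comes up.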
Just ignore the first line when you compute the md5. You should also add a secret salt to make sure it's not too easy to create a new MD5 after editing the content. It depends on your actual need (level of security).
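A sketch of the salted variant in Java, assuming the content passed in is everything after the first line. SaltedChecksum and the salt value are placeholders made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class SaltedChecksum {
    // Hypothetical secret. Anyone who extracts it from the program
    // can compute valid checksums for edited files.
    private static final byte[] SECRET_SALT =
            "example-salt-change-me".getBytes(StandardCharsets.UTF_8);

    // MD5 over the salt followed by the content.
    static byte[] checksum(byte[] content) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            md5.update(SECRET_SALT);
            md5.update(content);
            return md5.digest();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // Compares a stored checksum with one recomputed from the content.
    static boolean matches(byte[] content, byte[] storedChecksum) {
        return MessageDigest.isEqual(storedChecksum, checksum(content));
    }
}
```

Someone who edits the content and recomputes a plain MD5 will no longer produce a matching first line. If you want the standard keyed-hash construction rather than a hand-rolled salt prefix, javax.crypto.Mac provides HMAC.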
You could store the MD5 sum in a database instead; then, when you want to see if a file has been changed, you check it against the MD5 sum in the database. Alternatively, you could store the MD5 sum of a file in another file.
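The second-file variant could look roughly like this in Java. The ".md5" suffix and the class name SidecarChecksum are assumptions for the example, and the data file is read fully into memory, which is fine for small files only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class SidecarChecksum {
    // Hypothetical naming convention: data.bin -> data.bin.md5
    static Path sidecarFor(Path dataFile) {
        return dataFile.resolveSibling(dataFile.getFileName() + ".md5");
    }

    // Writes the hex MD5 of dataFile into its sidecar file.
    static void writeChecksum(Path dataFile) {
        try {
            byte[] hex = hexMd5(dataFile).getBytes(StandardCharsets.US_ASCII);
            Files.write(sidecarFor(dataFile), hex);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Recomputes the MD5 of dataFile and compares it with the stored one.
    static boolean verify(Path dataFile) {
        try {
            byte[] raw = Files.readAllBytes(sidecarFor(dataFile));
            String stored = new String(raw, StandardCharsets.US_ASCII).trim();
            return stored.equals(hexMd5(dataFile));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String hexMd5(Path file) throws IOException {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(Files.readAllBytes(file));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }
}
```

Call writeChecksum(...) after every save and verify(...) before every load. The same idea carries over to a database: store the hex string in a row keyed by file name instead of in a sidecar file.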