Java self-checking program (auto checksum)
I have to analyze a small Java self-checking program. Here is the sample:
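import java.io.File;
import java.io.FileInputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;

public class tamper {
    public static int checksum_self() throws Exception {
        File file = new File("tamper.class");
        byte[] digest;
        // Compute the checksum. The digest is only updated as bytes are read through
        // the stream, so the file is read to the end here (the read loop was not in the sample).
        try (DigestInputStream sha = new DigestInputStream(new FileInputStream(file), MessageDigest.getInstance("SHA"))) {
            byte[] buffer = new byte[4096];
            while (sha.read(buffer) != -1) {
                // reading drives the digest; the data itself is not needed
            }
            digest = sha.getMessageDigest().digest();
        }
        int result = 12; // why???
        for (int i = 0; i < digest.length; i++) {
            result = (result + digest[i]) % 16; // modulus 16 to have the 16 first bytes but why ??
        }
        return result;
    }

    public static boolean check1_for_tampering() throws Exception {
        return checksum_self() != 10;
    }

    public static void main(String args[]) throws Exception {
        if (check1_for_tampering()) {
            System.exit(-1);
        }
    }
}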
But I don't really understand why it does a mod 16 and sets result = 12?
mod 16 doesn't give the last 16 bytes, or even the lowest 4 bits. It gives the remainder of n / 16. Because Java bytes are signed and % keeps the sign of the dividend, this can just as easily be negative as positive, and it is not a good way to accumulate the bytes of the digest.
The result can be anything from -15 to 15, so there is roughly a 1/31 chance that two random files would produce the same result.
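To make the sign issue concrete, here is a minimal sketch. The class and method names are made up for illustration and are not part of the original sample. It runs the same accumulation on a single negative byte, once with `%` and once with `Math.floorMod`, which always stays in the range 0 to 15.

```java
class ModDemo {
    // Same accumulation as in the question: % keeps the sign of the dividend.
    static int foldWithRemainder(byte[] digest) {
        int result = 12;
        for (byte b : digest) {
            result = (result + b) % 16;
        }
        return result;
    }

    // Variant with Math.floorMod, which always returns a value in 0..15.
    static int foldWithFloorMod(byte[] digest) {
        int result = 12;
        for (byte b : digest) {
            result = Math.floorMod(result + b, 16);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] sample = { (byte) 0x9C };                 // 0x9C is -100 as a signed Java byte
        System.out.println(foldWithRemainder(sample));   // (12 - 100) % 16 = -8
        System.out.println(foldWithFloorMod(sample));    // floorMod(-88, 16) = 8
    }
}
```

With `%`, the result can land on any of the 31 values from -15 to 15; with `floorMod` only 16 values are possible.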
A simple approach that would be much more discerning is:
return new String(digest, 0).hashCode();
Two files would have about a one in 4 billion chance of having the same hash code (it is a 32-bit value), and the code is much shorter.
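Note that the `String(byte[], int)` constructor used above is deprecated. A possible alternative in the same spirit, sketched here under the assumption that any 32-bit reduction of the digest will do, is `java.util.Arrays.hashCode`. The class name `DigestHash` and the method `digestHash` are hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class DigestHash {
    // Reduce the SHA-1 digest of the given bytes to a single 32-bit int.
    static int digestHash(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(content);
            return Arrays.hashCode(digest);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

In the self-check, `content` would be the bytes of `tamper.class`, for example read with `java.nio.file.Files.readAllBytes`.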
Maybe the author of this sample code didn't want to check the whole digest, so he decided to make a hash out of the hash; that's what the mod 16 operation is for. It folds the bytes of the digest together modulo 16 (roughly speaking, a 4-bit digest of the digest), and then compares the result with 10. Correct me if I'm wrong, but I think the values 12 and 10 are chosen arbitrarily, so that the actual hash and the value it's checked against match.
As Peter said, it's not quite a perfect solution.
It's probably because, without the modulo, it would be very hard to embed the checksum. Imagine you've written the program. You can write it all at once, but to get the correct checksum you have to experiment.
Let's assume you take the checksum modulo 4. At the beginning you compare the value to 0. You run the program, but it reports that it was tampered with. Why? Because you don't know the checksum until you have written the whole source code. And since the checksum value is embedded in it, every change to the source code changes the checksum.
So it's like a dog chasing its own tail. Or a snake eating its own tail. Technically speaking, this is a dynamic system with a feedback loop in it. OK, that's enough analogies.
The only way to make it work is to experiment. Start out with the expected checksum equal to zero and compile. Most probably it will (incorrectly) report tampering, since you have only about a 1/4 probability of guessing correctly (a value modulo 4 can take 4 values). Next you change the value to 1. If that doesn't work, then to 2, and lastly to 3.
One of them could match, but a low modulo value decreases the likelihood of detecting tampering. So the value 16 is basically a compromise. You want to keep the modulo value as low as possible to keep the number of guesses reasonably low. On the other hand, you want the algorithm to be quite tamper-proof, which calls for a high modulo value.
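The guessing procedure can be sketched as a small search. This is only a model under stated assumptions: a byte array stands in for the compiled class, its last byte plays the role of the embedded expected value, and all names (`FixedPointSearch`, `checksum`, `withEmbedded`, `findSelfConsistent`) are hypothetical. In reality each guess means editing the constant and recompiling.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class FixedPointSearch {
    // Reduce the SHA-1 digest of the data to a value in 0..modulus-1.
    static int checksum(byte[] data, int modulus) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(data);
            int result = 0;
            for (byte b : digest) {
                result = Math.floorMod(result + b, modulus);
            }
            return result;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stand-in for "edit the constant and recompile":
    // the last byte of the image holds the embedded expected value.
    static byte[] withEmbedded(byte[] image, int expected) {
        byte[] copy = image.clone();
        copy[copy.length - 1] = (byte) expected;
        return copy;
    }

    // Try each candidate value; return the first one that equals the
    // checksum of the image containing it, or -1 if none does.
    static int findSelfConsistent(byte[] image, int modulus) {
        for (int guess = 0; guess < modulus; guess++) {
            if (checksum(withEmbedded(image, guess), modulus) == guess) {
                return guess;
            }
        }
        return -1;
    }
}
```

A larger modulus means more candidates to try, and each one matches with probability of only about 1/modulus, so there may be no self-consistent value at all. In that case something else in the source has to change before trying again.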