Alternating SHA hashes for the same file. Why?
Can somebody explain why I get alternating SHA hashes for the same file, with every second hash being identical?
>>> f = open('480p.m4v')
>>> sha1 = str(hashlib.sha224(str(f)).hexdigest())
>>> sha1
'4aa8cf11b849b77f608302fdcdad3703dce54c33ba4bac80fa0ef700'
>>> f.close()
>>> f = open('480p.m4v')
>>> sha2 = str(hashlib.sha224(str(f)).hexdigest())
>>> f.close()
>>> sha2
'ae60e45200c960f79d25049ef0135709ca6edf246b3f9e53cd084e58'
>>> f = open('480p.m4v')
>>> sha3 = str(hashlib.sha224(str(f)).hexdigest())
>>> f.close()
>>> sha3
'4aa8cf11b849b77f608302fdcdad3703dce54c33ba4bac80fa0ef700'
>>> f = open('480p.m4v')
>>> sha4 = str(hashlib.sha224(str(f)).hexdigest())
>>> f.close()
>>> sha4
'ae60e45200c960f79d25049ef0135709ca6edf246b3f9e53cd084e58'
>>> f = open('480p.m4v')
>>> sha5 = str(hashlib.sha224(str(f)).hexdigest())
>>> f.close()
>>> sha5
'4aa8cf11b849b77f608302fdcdad3703dce54c33ba4bac80fa0ef700'
>>> f = open('480p.m4v')
>>> sha6 = str(hashlib.sha224(str(f)).hexdigest())
>>> f.close()
>>> sha6
'ae60e45200c960f79d25049ef0135709ca6edf246b3f9e53cd084e58'
You're getting different hashes because you're not hashing the contents of the file, only the file object's string representation. For example:
>>> f = open('480p.m4v')
>>> print str(f)
<open file '480p.m4v', mode 'r' at 0x0224C9D0>
Note that the representation includes the object's memory address, which changes between instances, so the hash changes too. Apparently the memory location of one file object is reused by every second instance created, causing the hashes to coincide.
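To hash the contents of the file, open it in binary mode (it's a video, not text) and hash what read() returns:
>>> f = open('480p.m4v', 'rb')  # binary mode, so no newline translation or early EOF
>>> sha = hashlib.sha224(f.read()).hexdigest()  # read() slurps the whole file into a string
>>> f.close()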
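For a large video file, slurping everything with read() may use more memory than you'd like. A minimal sketch of hashing in chunks follows; the function name `sha224_of_file` and the 64 KiB chunk size are assumptions chosen for illustration, not anything from the question.

```python
import hashlib


def sha224_of_file(path, chunk_size=65536):
    """Return the SHA-224 hex digest of the file's contents.

    `sha224_of_file` and `chunk_size` are illustrative names, not part
    of hashlib. The file is read in binary mode, one chunk at a time.
    """
    digest = hashlib.sha224()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read() until it
        # returns an empty bytes object, which signals end of file.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha224_of_file('480p.m4v')` repeatedly should give the same digest every time, since only the file's bytes go into the hash.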
str(f) doesn't give you the contents of the file; it returns something like:
"<open file '480p.m4v', mode 'r' at 0xb7855230>"
I'm not sure why this alternates, though.
As the others said, this fails because you're hashing the object's string representation. I expect it alternates because the string representation includes the memory address the file object is stored at. When you do:
f = open(...)
you store that file object in f, pointing to memory X. When you do the same thing again, open() is called and allocates more memory. Since f is still pointing to memory X, that memory remains in use, and the second open() allocates new memory at Y instead. However, as soon as open() returns, the result is assigned to f. Now nothing references the file object at memory X any more, so it is freed. The next call to open() will reuse the memory at X since it's free now (this isn't guaranteed, but is common).
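The effect can be reproduced without a file. This is a rough sketch, assuming CPython; `Handle` is a hypothetical stand-in class whose default repr includes its address, much like a file object's does, and `repr_hash` is an illustrative helper.

```python
import hashlib


class Handle(object):
    """Hypothetical stand-in for a file object (default repr shows the address)."""


def repr_hash(obj):
    # Hash the string representation, which is the mistake made in the question.
    return hashlib.sha224(repr(obj).encode('utf-8')).hexdigest()


a = Handle()
b = Handle()  # a is still alive, so b has to live at a different address
print(repr(a))
print(repr_hash(a) != repr_hash(b))  # two live objects, two different reprs

f = Handle()
seen = []
for _ in range(6):
    # The new object is created while the old one is still bound to f;
    # only after the assignment is the old one released.
    f = Handle()
    seen.append(hex(id(f)))
print(seen)  # often alternates between two addresses, but this is not guaranteed
```

Whether the addresses actually alternate depends on the interpreter and its allocator, so treat the printed list as an observation rather than a contract.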