MD5 hash with different results
I'm trying to hash some strings with MD5, but I have noticed the following.
For the string: "123456çñ"
Some websites, like
http://www.md5.net
www.md5.cz
md5generator.net
return: "66f561bb6b68372213dd9768e55e1002"
And others like:
http://www.adamek.biz/md5-generator.php
7thspace.com/webmaster_tools/online_md5_encoder.html
md5.rednoize.com/
return: "9e6c9a1eeb5e00fbf4a2cd6519e0cfcb"
I need to hash the strings with standard MD5 because my results have to match those of other systems. Which hash is the correct one?
Thanks in advance
The problem, I guess, is different text encodings. The string you show can't be represented in plain ASCII: it needs an encoding that covers 'ç' and 'ñ', such as ISO-8859-1, UTF-8 or UTF-16. Each of these produces a different byte representation of the string, and that produces different hashes.
Remember, MD5 hashes bytes, not characters - it's up to you how to encode those characters as bytes before feeding bytes to MD5. If you want to interoperate with other systems you have to use the same encoding as those systems.
Let us use Python to understand this (Python 2, where a plain string literal is a byte string, run in a UTF-8 terminal).
>>> '123456çñ'
'123456\xc3\xa7\xc3\xb1'
>>> 'ç'
'\xc3\xa7'
>>> 'ñ'
'\xc3\xb1'
In the above output, we see the UTF-8 encoding of 'ç' and 'ñ'.
>>> from hashlib import md5
>>> md5('123456çñ').hexdigest()
'66f561bb6b68372213dd9768e55e1002'
So, when we compute MD5 hash of the UTF-8 encoded data, we get the first result.
>>> u'ç'
u'\xe7'
>>> u'ñ'
u'\xf1'
Here, we see the Unicode code points of 'ç' and 'ñ'.
>>> md5('123456\xe7\xf1').hexdigest()
'9e6c9a1eeb5e00fbf4a2cd6519e0cfcb'
So, when we compute the MD5 hash of the data in which each character is a single byte equal to its Unicode code point (which, for these characters, is the same as ISO-8859-1 encoding), we get the second result.
So, the first group of websites computes the hash of the UTF-8 encoded data, while the second group does not (it apparently uses ISO-8859-1).
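The session above relies on Python 2 string behaviour. For readers on Python 3, where text and bytes are separate types, here is a minimal sketch of the same check. The variable names are only illustrative, and it assumes the source file is saved as UTF-8 (the Python 3 default). If the values quoted in the question are right, it should reproduce both of them.

```python
import hashlib

text = "123456çñ"

# Encode the same characters into bytes in two different ways.
utf8_bytes = text.encode("utf-8")         # 'ç' and 'ñ' take two bytes each
latin1_bytes = text.encode("iso-8859-1")  # 'ç' and 'ñ' take one byte each

# MD5 only ever sees the bytes, so the two digests differ.
utf8_hash = hashlib.md5(utf8_bytes).hexdigest()
latin1_hash = hashlib.md5(latin1_bytes).hexdigest()

print(utf8_bytes, utf8_hash)
print(latin1_bytes, latin1_hash)
```

Note that in Python 3 `hashlib.md5` refuses a `str` outright, so the encoding choice cannot be left implicit.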
If I try this in PHP (with the source file saved as UTF-8):
echo "123456çñ<br />";
echo "utf-8 : ".md5("123456çñ")."<br />";
echo "ISO-8859-1 : ".md5(iconv("UTF-8", "ISO-8859-1","123456çñ"))."<br />";
It gives the result:
123456çñ
utf-8 : 66f561bb6b68372213dd9768e55e1002
ISO-8859-1 : 9e6c9a1eeb5e00fbf4a2cd6519e0cfcb
So the first group of websites encodes the string in UTF-8 and the second group in ISO-8859-1.
I would guess that some of these sites are not handling non-ASCII characters correctly. If you are using a standard MD5 library then you should be OK, as long as you and the system you are connecting to agree on which character encoding to use.
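If the other system does not document its encoding, one way to find out is to take a string whose hash on that system is already known and try a few candidate encodings. The following is only a sketch under that assumption: `md5_hex`, `matching_encodings` and the candidate list are hypothetical names chosen for illustration, not part of any library.

```python
import hashlib

def md5_hex(text, encoding):
    """Return the MD5 hex digest of `text` encoded with `encoding`."""
    return hashlib.md5(text.encode(encoding)).hexdigest()

def matching_encodings(text, expected_hex,
                       candidates=("utf-8", "iso-8859-1", "cp1252", "utf-16-le")):
    """List the candidate encodings whose MD5 digest equals `expected_hex`."""
    matches = []
    for encoding in candidates:
        try:
            if md5_hex(text, encoding) == expected_hex.lower():
                matches.append(encoding)
        except UnicodeEncodeError:
            # This encoding cannot represent the text at all; skip it.
            continue
    return matches

# Second hash quoted in the question.
print(matching_encodings("123456çñ", "9e6c9a1eeb5e00fbf4a2cd6519e0cfcb"))
```

More than one encoding can match, because ISO-8859-1 and Windows-1252 assign the same bytes to 'ç' and 'ñ'. A test string that contains characters where the candidates disagree narrows the answer further.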
By the way, MD5 is no longer recommended. If this is for cryptographic purposes, you should really be moving to SHA-2.
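Switching the hash function does not remove the encoding question: SHA-2 functions also operate on bytes. Here is a minimal sketch using SHA-256 from Python's `hashlib`. The helper name `sha256_hex` and its UTF-8 default are assumptions made for illustration.

```python
import hashlib

def sha256_hex(text, encoding="utf-8"):
    """Return the SHA-256 hex digest of `text` encoded with `encoding`."""
    return hashlib.sha256(text.encode(encoding)).hexdigest()

digest_utf8 = sha256_hex("123456çñ")
digest_latin1 = sha256_hex("123456çñ", "iso-8859-1")

print(digest_utf8)
print(digest_latin1)
# Same characters, different bytes, so the digests differ here too.
print(digest_utf8 != digest_latin1)
```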