MD5 hash with different results

I'm trying to hash some strings with MD5, but I have noticed the following.

For the string: "123456çñ"

Some websites, like

http://www.md5.net

www.md5.cz

md5generator.net

return: "66f561bb6b68372213dd9768e55e1002"

And others like:

http://www.adamek.biz/md5-generator.php

7thspace.com/webmaster_tools/online_md5_encoder.html

md5.rednoize.com/

return: "9e6c9a1eeb5e00fbf4a2cd6519e0cfcb"

I need to hash the strings with standard MD5 because my results have to match those of other systems. Which hash is the correct one?

Thanks in advance


The problem, I guess, is different text encodings. The string you show can't be represented in plain ASCII; it needs an encoding such as ISO-8859-1, UTF-8 or UTF-16. Each of these gives a different byte representation of the string, and that produces different hashes.

Remember, MD5 hashes bytes, not characters - it's up to you how to encode those characters as bytes before feeding bytes to MD5. If you want to interoperate with other systems you have to use the same encoding as those systems.
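As a minimal sketch of that point in Python 3 (the helper name `md5_hex` is made up for illustration, and the list of encodings is just an assumption about what other systems might use), the same characters can be encoded explicitly before hashing:

```python
import hashlib

def md5_hex(text: str, encoding: str) -> str:
    """Encode text to bytes with the given encoding, then return the MD5 hex digest."""
    return hashlib.md5(text.encode(encoding)).hexdigest()

text = "123456çñ"

# Same characters, different bytes, therefore different hashes.
for encoding in ("utf-8", "iso-8859-1", "utf-16-le"):
    print(encoding, md5_hex(text, encoding))
```

With the string from the question, the UTF-8 line should show 66f561bb6b68372213dd9768e55e1002 and the ISO-8859-1 line 9e6c9a1eeb5e00fbf4a2cd6519e0cfcb, which are the two values the websites return.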


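Let us use Python to understand this (this is a Python 2 session in a terminal that uses UTF-8, so plain string literals are byte strings).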

>>> '123456çñ'
'123456\xc3\xa7\xc3\xb1'
>>> 'ç'
'\xc3\xa7'
>>> 'ñ'
'\xc3\xb1'

In the above output, we see the UTF-8 encoding of 'ç' and 'ñ'.

>>> from hashlib import md5
>>> md5('123456çñ').digest().encode('hex')
'66f561bb6b68372213dd9768e55e1002'

So, when we compute the MD5 hash of the UTF-8 encoded data, we get the first result.

>>> u'ç'
u'\xe7'
>>> u'ñ'
u'\xf1'

Here, we see the Unicode code points of 'ç' and 'ñ'.

>>> md5('123456\xe7\xf1').digest().encode('hex')
'9e6c9a1eeb5e00fbf4a2cd6519e0cfcb'

So, when we compute the MD5 hash of bytes whose values equal the Unicode code points of each character (which is what ISO-8859-1 encoding produces for this string), we get the second result.

So, the first group of websites hashes the UTF-8 encoded data, while the second group most likely hashes the ISO-8859-1 encoded data.
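The same reasoning can be turned around in Python 3 to find out which encoding another system uses, given one known string and the hash that system returns for it. This is only a sketch: `guess_encoding` and its default candidate list are hypothetical, not part of any library.

```python
import hashlib

def guess_encoding(text, expected_hex,
                   candidates=("utf-8", "iso-8859-1", "cp1252", "utf-16-le")):
    """Return the candidate encodings whose MD5 of text matches expected_hex."""
    matches = []
    for encoding in candidates:
        try:
            data = text.encode(encoding)
        except UnicodeEncodeError:
            # This encoding cannot represent the text at all; skip it.
            continue
        if hashlib.md5(data).hexdigest() == expected_hex.lower():
            matches.append(encoding)
    return matches

text = "123456çñ"
print(text.encode("utf-8"))       # b'123456\xc3\xa7\xc3\xb1'
print(text.encode("iso-8859-1"))  # b'123456\xe7\xf1'
print(guess_encoding(text, "66f561bb6b68372213dd9768e55e1002"))
print(guess_encoding(text, "9e6c9a1eeb5e00fbf4a2cd6519e0cfcb"))
```

Several encodings can match at once: ISO-8859-1 and Windows-1252 produce the same bytes for 'ç' and 'ñ', so this test string cannot tell them apart. A string containing a character such as '€' would be needed for that.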


If I try this in PHP (with the source file saved as UTF-8):

echo "123456çñ<br />";
echo "utf-8 : ".md5("123456çñ")."<br />";
echo "ISO-8859-1 : ".md5(iconv("UTF-8", "ISO-8859-1","123456çñ"))."<br />";

It gives the result:

123456çñ
utf-8 : 66f561bb6b68372213dd9768e55e1002
ISO-8859-1 : 9e6c9a1eeb5e00fbf4a2cd6519e0cfcb

So the first group of websites encodes the string in UTF-8 and the second group in ISO-8859-1.


I would guess that some of these sites are not handling non-ASCII characters correctly. If you are using a standard MD5 library then you should be OK, as long as you and the system you are connecting to agree on which character encoding to use.

By the way, MD5 is no longer recommended. If this is for cryptographic purposes then you should really be moving to SHA-2.
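If moving to SHA-2 is an option, a rough Python 3 sketch with SHA-256 could look like this. The helper name `sha256_hex` and the UTF-8 default are assumptions for illustration; the other system still has to use the same encoding.

```python
import hashlib

def sha256_hex(text: str, encoding: str = "utf-8") -> str:
    """Encode text with an explicit encoding and return its SHA-256 hex digest."""
    return hashlib.sha256(text.encode(encoding)).hexdigest()

text = "123456çñ"
print(sha256_hex(text))                # hash of the UTF-8 bytes
print(sha256_hex(text, "iso-8859-1"))  # different bytes, different hash
```

Switching the hash function does not remove the encoding question: SHA-256 also works on bytes, so the two lines print different digests.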
