
Why do hashing functions like sha1 use only up to 16 different characters (hexadecimal)?

Sorry, this is just a curiosity of mine.

sha1 uses the chars [a-f0-9] for its output. May I know why it doesn't use all the chars possible, [a-z0-9]? By using all the available chars it could greatly increase the number of possible different hashes, thus lowering the probability of a collision.

If you don't think this is a real question, just leave a comment and I will instantly delete this question.

===

As stated in the answers below, sha1 does NOT use only 16 chars. The correct fact is: sha1 is 160 bits of binary data (cit.). I have added this to prevent confusion.


You're confusing representation with content.

sha1 is 160 bits of binary data. You can just as easily represent it with:

hex: 0xf1d2d2f924e986ac86fdf7b36c94bcdf32beec15
decimal: 1380568310619656533693587816107765069100751973397
binary: 1111000111010010110100101111100100100100111010011000011010101100100001101111110111110111101100110110110010010100101111001101111100110010101111101110110000010101
base 62: yvgL3rk2cAhEsMB0YO0dMw1kAYd
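
For the first three of those, Ruby's built-in Integer#to_s(base) (which handles bases 2 through 36) is a quick way to check; base 62 needs the little script further down:

h = 0xf1d2d2f924e986ac86fdf7b36c94bcdf32beec15  # the same 160 bits
puts h.to_s(16)   # hex
puts h.to_s(10)   # decimal
puts h.to_s(2)    # binary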

There's nothing magical about hexadecimal. It's just a very common mechanism for showing content that breaks easily along 4-bit boundaries.

The base 62 output is generated with this little bit of ruby:

#!/usr/bin/ruby

# Print n in base 62, using the digits 0-9, then a-z (10-35), then A-Z (36-61).
def chars_from_hex(s)
  c = s % 62            # least significant base-62 digit
  s = s / 62
  if ( s > 0 )
    chars_from_hex(s)   # print the more significant digits first
  end
  if (c < 10)
      print c
  elsif (c < 36)
      print "abcdefghijklmnopqrstuvwxyz"[c-10].chr()
  elsif (c < 62)
      print "ABCDEFGHIJKLMNOPQRSTUVWXYZ"[c-36].chr()
  else
      puts "error c", c
  end
end

chars_from_hex(0xf1d2d2f924e986ac86fdf7b36c94bcdf32beec15)

It uses the standard idiom for converting from one base to another and treats 0-9 as 0-9, a-z as 10-35, A-Z as 36-61. It could be trivially extended to support more digits by including e.g. !@#$%^&*()-_=+\|[]{},.<>/?;:'"~` if one so desired. (Or any of the vast array of Unicode codepoints.)
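
Here is a minimal iterative sketch of the same idiom, with the whole digit alphabet in one string so that supporting more digits really is just a matter of appending characters (the name to_base and the DIGITS constant are only illustrative):

DIGITS = "0123456789" +
         "abcdefghijklmnopqrstuvwxyz" +
         "ABCDEFGHIJKLMNOPQRSTUVWXYZ"   # append more symbols here for a larger base

def to_base(n, digits = DIGITS)
  return digits[0] if n == 0
  out = ""
  while n > 0
    out = digits[n % digits.length] + out  # prepend the next least significant digit
    n /= digits.length
  end
  out
end

puts to_base(0xf1d2d2f924e986ac86fdf7b36c94bcdf32beec15)  # same base 62 string as above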

@yes123 asked about the ascii representation of the hash specifically, so here is the result of interpreting the 160-bit hash directly as ascii:

ñÒÒù$é¬ý÷³l¼ß2¾ì

It doesn't look like much because:

  • ascii doesn't have a good printable representation for byte values less than 32
  • ascii itself can't represent byte values greater than 127; bytes between 127 and 255 get interpreted according to iso-8859-1 or some other character encoding scheme
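
That rendering can be reproduced (roughly, depending on your terminal) by packing the hex digest back into its 20 raw bytes and telling Ruby to treat those bytes as iso-8859-1 text; a sketch:

hex = "f1d2d2f924e986ac86fdf7b36c94bcdf32beec15"
raw = [hex].pack("H*")                     # the 20 raw bytes of the hash
latin1 = raw.force_encoding("ISO-8859-1")  # pretend the bytes are iso-8859-1 text
puts latin1.encode("UTF-8")                # the control characters still won't display nicely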

This base conversion can be practically useful, too; the Base64 encoding method uses 64 (instead of my 62) characters to represent 6 bits at a time; it needs two more characters for 'digits' and a character for padding. UUEncoding chose a different set of 'digits'. And a fellow stacker had a problem that was easily solved by changing the base of input numbers to output numbers.
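
For comparison, a short sketch with Ruby's standard base64 library; the 20 raw bytes come out as 28 Base64 characters, padding included:

require 'base64'

raw = ["f1d2d2f924e986ac86fdf7b36c94bcdf32beec15"].pack("H*")
b64 = Base64.strict_encode64(raw)  # alphabet: A-Z, a-z, 0-9, '+', '/', plus '=' padding
puts b64
puts b64.length                    # => 28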


This is false reasoning. sha1 uses 40*4=160 bits.

It just happens to be convenient (and therefore, the convention) to format that as 40 hex digits.

You can use different cryptographic hashes with a larger hash size if you feel you are in a problem domain where collisions start to become likely in 160 bits; a few common digest sizes for comparison:

 sha224: 224 bits
 sha256: 256 bits
 md5: 128 bits
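
If you want to compare digest sizes directly, Ruby's standard digest library is enough for a quick check (the input string is arbitrary; sha1 is included for reference):

require 'digest'

msg = "example input"
puts Digest::MD5.hexdigest(msg).length     # => 32 hex digits (128 bits)
puts Digest::SHA1.hexdigest(msg).length    # => 40 hex digits (160 bits)
puts Digest::SHA256.hexdigest(msg).length  # => 64 hex digits (256 bits)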


Using hex just allows for easier display. SHA1 produces 160 bits. Hex encoding allows the digest to be easily displayed and transported as a string. That's all.
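
With Ruby's digest library, for example, the same digest is available either way: digest gives the 20 raw bytes, hexdigest the 40-character string. A small sketch:

require 'digest'

raw = Digest::SHA1.digest("hello")     # 20 raw bytes; may contain unprintable values
hex = Digest::SHA1.hexdigest("hello")  # the same digest as a 40-character string
puts raw.bytesize                      # => 20
puts hex                               # safe to display, log, or send as text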


The output of the hash algorithm is bits. Representing them in hex is just a representation, and a convenient one: the 160-bit result splits evenly into 4-bit hex digits, whereas a base such as 17, which doesn't line up with the bits at all, would be inconvenient.
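
One way to see the alignment: each hex digit of a 160-bit value covers exactly four bits, so the hex string can be rebuilt nibble by nibble (a sketch; this particular hash starts with 0xf, so there is no leading-zero wrinkle):

h = 0xf1d2d2f924e986ac86fdf7b36c94bcdf32beec15
nibbles = 40.times.map { |i| (h >> (4 * (39 - i))) & 0xf }  # 4-bit slices, high to low
rebuilt = nibbles.map { |n| n.to_s(16) }.join
puts rebuilt == h.to_s(16)                                  # => true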


sha-1 produces a 160-bit hash; that's 20 bytes, which gives 2^160 = 1461501637330902918203684832716283019655932542976 possible values. Because that's how the hash algorithm is defined.

However, it's often useful to encode that hash as readable text, and a convenient way is to simply encode those 20 bytes as hex (which will take up 40 bytes). And hex characters are [a-f0-9].
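
A quick sanity check of that 20-byte/40-character relationship, using pack/unpack with the "H*" template:

hex = "f1d2d2f924e986ac86fdf7b36c94bcdf32beec15"
raw = [hex].pack("H*")        # hex string -> 20 raw bytes
puts raw.bytesize             # => 20
puts raw.unpack("H*").first   # => the original 40-character [a-f0-9] string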
