Why do hashing functions like sha1 use only 16 different characters (hexadecimal)?
Sorry for this curiosity that I have.
sha1 uses [a-f0-9] chars for its output. May I know why it doesn't use all the chars available, [a-z0-9]? By using all the available chars it could greatly increase the number of possible different hashes, thus lowering the probability of a collision.
If you don't think this is a real question, just leave a comment and I will instantly delete this question.
===
As stated in the answers below, sha1 does NOT use only 16 chars. The correct fact is: sha1 is 160 bits of binary data (cit.). I have added this to prevent confusion.
You're confusing representation with content.
sha1 is 160 bits of binary data. You can just as easily represent it with:
hex: 0xf1d2d2f924e986ac86fdf7b36c94bcdf32beec15
decimal: 1380568310619656533693587816107765069100751973397
binary: 1111000111010010110100101111100100100100111010011000011010101100100001101111110111110111101100110110110010010100101111001101111100110010101111101110110000010101
base 62: xufK3qj2bZgDrLA0XN0cLv1jZXc
There's nothing magical about hexadecimal. It's just a very common mechanism for showing content that breaks easily along 4-bit boundaries.
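For instance, ruby's standard digest library can produce most of those representations directly (a rough sketch; the "hello world" input below is arbitrary and is not the input behind the hash shown above):

#!/usr/bin/ruby
require 'digest'

hex = Digest::SHA1.hexdigest("hello world")  # 40 hex characters
n   = hex.to_i(16)                           # the same 160 bits as one big integer

puts hex           # hexadecimal
puts n             # decimal
puts n.to_s(2)     # binary (leading zero bits are not shown)
puts n.to_s(36)    # base 36 -- Integer#to_s only goes up to base 36,
                   # which is why the base 62 version below needs its own code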
The base 62 output is generated with this little bit of ruby:
#!/usr/bin/ruby
# Recursively print a non-negative integer in base 62,
# using 0-9, then a-z, then A-Z, as the 62 digits.
def chars_from_hex(s)
  c = s % 62
  s = s / 62
  if s > 0
    chars_from_hex(s)   # emit the more significant digits first
  end
  if c < 10
    print c                                          # values 0-9 print as 0-9
  elsif c < 36
    print "abcdefghijklmnopqrstuvwxyz"[c - 10].chr   # values 10-35 print as a-z
  elsif c < 62
    print "ABCDEFGHIJKLMNOPQRSTUVWXYZ"[c - 36].chr   # values 36-61 print as A-Z
  else
    puts "error c", c
  end
end

chars_from_hex(0xf1d2d2f924e986ac86fdf7b36c94bcdf32beec15)
It uses the standard idiom for converting from one base to another and treats 0-9 as 0-9, a-z as 10-35, A-Z as 36-61. It could be trivially extended to support more digits by including e.g. !@#$%^&*()-_=+\|[]{},.<>/?;:'"~` if one so desired. (Or any of the vast array of Unicode codepoints.)
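If you prefer an iterative version that returns a string and takes its digit set as an argument, something like the following should work; extending it is then just a matter of appending characters to the digit string (the to_base name and BASE62 constant are my own, not part of the answer above):

#!/usr/bin/ruby
# Hypothetical helper: render a non-negative integer in an arbitrary base,
# using the characters of `digits` as the digit set.
def to_base(n, digits)
  return digits[0] if n == 0
  base = digits.length
  out = ""
  while n > 0
    out = digits[n % base] + out   # prepend, so the most significant digit ends up first
    n /= base
  end
  out
end

BASE62 = ('0'..'9').to_a.join + ('a'..'z').to_a.join + ('A'..'Z').to_a.join
puts to_base(0xf1d2d2f924e986ac86fdf7b36c94bcdf32beec15, BASE62)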
@yes123 asked about the ascii representation of the hash specifically, so here is the result of interpreting the 160-bit hash directly as ascii:
ñÒÒù$é¬ý÷³l¼ß2¾ì
It doesn't look like much because:
- ascii doesn't have a good printable representation for byte values less than 32
- ascii itself can't represent byte values greater than 127; bytes in the 128-255 range get interpreted according to iso-8859-1 or some other character encoding scheme
This base conversion can be practically useful, too; the Base64 encoding method uses 64 (instead of my 62) characters to represent 6 bits at a time; it needs two more characters for 'digits' and a character for padding. UUEncoding chose a different set of 'digits'. And a fellow stacker had a problem that was easily solved by changing the base of input numbers to output numbers.
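As a sketch of that, ruby ships with a base64 module, so the raw 20 bytes can be Base64-encoded without any hand-rolled conversion (the input string is arbitrary again):

#!/usr/bin/ruby
require 'digest'
require 'base64'

raw = Digest::SHA1.digest("hello world")  # 20 raw bytes
puts Base64.strict_encode64(raw)          # 28 characters, '=' padded, drawn from [A-Za-z0-9+/=]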
This is false reasoning. sha1 uses 40*4=160 bits.
It just happens to be convenient (and therefore, the convention) to format that as 40 hex digits.
You can use a different cryptographic hash with a larger output size if you feel you are in a problem domain where collisions start to become likely at 160 bits, for example (see the sketch after this list):
sha224: 224 bits
sha256: 256 bits
md5: 128 bits (smaller than sha1; listed for comparison only)
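Ruby's digest library exposes several of these, so the size differences are easy to check (a quick sketch; the input string is arbitrary):

#!/usr/bin/ruby
require 'digest'

msg = "hello world"
puts Digest::MD5.hexdigest(msg).length * 4      # 128 bits
puts Digest::SHA1.hexdigest(msg).length * 4     # 160 bits
puts Digest::SHA256.hexdigest(msg).length * 4   # 256 bits
puts Digest::SHA512.hexdigest(msg).length * 4   # 512 bits (SHA-224 and SHA-384 exist in the SHA-2 family as well)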
Using hex just allows for easier display. SHA1 produces 160 bits; hex-encoding them lets the digest be easily displayed and transported as a string. That's all.
The output of the hash algorithm is bits; representing them in hex is just a representation. Hex does benefit from the result splitting evenly into 4-bit groups (160 = 40 × 4), which is why a representation in, say, base 17 would be inconvenient.
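A quick way to see that alignment:

#!/usr/bin/ruby
puts 160 % 4        # 0 -- a 160-bit digest is exactly 40 hex digits, nothing left over
puts Math.log2(16)  # 4.0 -- each hex digit carries a whole number of bits
puts Math.log2(17)  # ~4.09 -- a base-17 digit does not, so digits would straddle bit boundaries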
sha-1 produces a 160-bit hash; that's 20 bytes, which have 1461501637330902918203684832716283019655932542976 possible values, because that's how the hash algorithm is defined.
However, it's often useful to encode that hash as readable text, and a convenient way is simply to encode those 20 bytes as hex (which takes up 40 characters). And hex characters are [a-f0-9].
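Or, in ruby terms (a sketch; the input string is arbitrary):

#!/usr/bin/ruby
require 'digest'

raw = Digest::SHA1.digest("hello world")
puts raw.bytesize               # 20 bytes of binary data
puts raw.unpack("H*").first     # the same 20 bytes written as 40 hex characters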