Why are Crypto++ and Ruby generating slightly different SHA-1 hashes?
I'm using two different libraries to generate a SHA-1 hash for use in file validation - an older version of the Crypto++ library and the Digest::SHA1 class implemented by Ruby. While I've seen other instances of mismatched hashes caused by encoding differences, the two libraries are outputting hashes that are almost identical.
For instance, passing a file through each process produces the following results:
Crypto++ 01c15e4f46d8181b984fa2a2c740f8f67130acac
Ruby: eac15e4f46d8181b984fa2a2c740f8f67130acac
As you can see, only the first two characters of the hash string are different, and this behavior repeats itself over many files. I've taken a look at the source code for each implementation, and the only difference I found at first glance was in the data hex that is being used for the 160-bit hashing. I have no idea how that hex is used in the algorithm, and I figured it'd probably be quicker for me to ask the question in case anyone had encountered this issue before.
I've included the data from the respective libraries below. I also included the values from OpenSSL, since each of the three libraries had slightly different values.
Crypto++:
digest[0] = 0x67452301L;
digest[1] = 0xEFCDAB89L;
digest[2] = 0x98BADCFEL;
digest[3] = 0x10325476L;
digest[4] = 0xC3D2E1F0L;
Ruby:
context->state[0] = 0x67452301;
context->state[1] = 0xEFCDAB89;
context->state[2] = 0x98BADCFE;
context->state[3] = 0x10325476;
context->state[4] = 0xC3D2E1F0;
OpenSSL:
#define INIT_DATA_h0 0x67452301UL
#define INIT_DATA_h1 0xefcdab89UL
#define INIT_DATA_h2 0x98badcfeUL
#define INIT_DATA_h3 0x10325476UL
#define INIT_DATA_h4 0xc3d2e1f0UL
By the way, here is 开发者_开发百科the code being used to generate the hash in Ruby. I do not have access to the source code for the Crypto++ implementation.
File.class_eval do
    def self.hash_digest filename, options = {}
        opts = {:buffer_length => 1024, :method => :sha1}.update(options)
        hash_func = (opts[:method].to_s == 'sha1') ? Digest::SHA1.new : Digest::MD5.new
        open(filename, "r") do |f|
            while !f.eof
                b = f.read
                hash_func.update(b)
            end
        end
        hash_func.hexdigest
    end
end
I would guess that you are off by a byte in printing out the SHA-1 hashes. Can we see the code that is printing them? If not, here are a couple of potentially useful diagnostics:
- Make a very short file (say, one word), and put its contents in as a hex string at http://www.fileformat.info/tool/hash.htm. You would need to know exactly the hex contents of the file, though. You can use xxd to that on Unix, but you'll have to watch out for endianness issues. I'm not sure how to do it on other OSs. 
- Does running the same file through the same SHA-1 implementation several times always print out the same value in that first byte? If so, does that value change when you change files? 
This isn't making much sense. If there were something wrong with the SHA1 implementation, such as with those numbers, it would likely produce hashes that are completely different than the real SHA1 hashes, rather than just one byte off. Even if there were something wrong with your file reading loop, that it would drop a newline or something, you would still get a completely different hash by changing one byte in the stream, it wouldn't be one byte off of the real SHA1 hash.
If I do use your method in the following program, I get the correct results.
#!/usr/bin/env ruby
require 'digest/sha1'
require 'digest/md5'
File.class_eval do
  def self.hash_digest filename, options = {}
    opts = {:buffer_length => 1024, :method => :sha1}.update(options)
    hash_func = (opts[:method].to_s == 'sha1') ? Digest::SHA1.new : Digest::MD5.new
    open(filename, "r") do |f|
      while !f.eof
        b = f.read
        hash_func.update(b)
      end
    end
    hash_func.hexdigest
  end
end
puts File.hash_digest(ARGV[0])
And its output compared with that of OpenSSL.
tmp$ dd if=/dev/urandom of=random.bin bs=1MB count=1
1+0 records in
1+0 records out
1000000 bytes (1.0 MB) copied, 0.287903 s, 3.5 MB/s
tmp$ ./digest.rb random.bin
a511d8153426ebea4e4694cde78db4e3a9e413d1
tmp$ openssl sha1 random.bin
SHA1(random.bin)= a511d8153426ebea4e4694cde78db4e3a9e413d1
So there's nothing wrong with your hashing method. Something is going wrong between its return value and it being printed.
 
         加载中,请稍侯......
 加载中,请稍侯......
      
精彩评论