How do I derive from hashlib.sha256 in Python?
A naive attempt fails miserably:
import hashlib
class fred(hashlib.sha256):
    pass
-> TypeError: Error when calling the metaclass bases
    cannot create 'builtin_function_or_method' instances
Well, it turns out that hashlib.sha256 is a callable, not a class. Trying something a bit more creative doesn't work either:
import hashlib
class fred(type(hashlib.sha256())):
    pass
f = fred()
-> TypeError: cannot create 'fred' instances
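Both failures come down to the same fact: hashlib.sha256 is a constructor function, and the hash object it returns is an instance of a C-level type that Python code can't both subclass and instantiate. The small probe below (a sketch that runs on Python 2.7 or 3; probe_subclass is a hypothetical helper name) makes that visible. Which step fails, class creation or instantiation, and the exact error text vary by Python version and build.

```python
import hashlib

def probe_subclass(factory):
    """Hypothetical helper: try to subclass the type of factory() and create
    an instance. Returns None on success, else the TypeError message."""
    hash_type = type(factory())
    try:
        sub = type("Sub", (hash_type,), {})  # some versions fail here...
        sub()                                # ...others fail here
    except TypeError as exc:
        return str(exc)
    return None

# The constructor is a plain function, so it can't be used as a base class.
print(isinstance(hashlib.sha256, type))  # False
# The concrete hash-object type (for example _hashlib.HASH on OpenSSL builds).
print(type(hashlib.sha256()))
print(probe_subclass(hashlib.sha256))
```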
Hmmm...
So, how do I do it?
Here is what I actually want to achieve:
class shad_256(sha256):
    """Double SHA - sha256(sha256(data).digest())
    Less susceptible to length extension attacks than sha256 alone."""
    def digest(self):
        return sha256(sha256.digest(self)).digest()
    def hexdigest(self):
        return sha256(sha256.digest(self)).hexdigest()
Basically, I want everything to pass through unchanged, except that when someone asks for a result I want to insert an extra step of my own. Is there a clever way to accomplish this with __new__ or metaclass magic of some sort?
I have a solution I'm largely happy with that I posted as an answer, but I'm really interested to see if anybody can think of anything better: either much less verbose at very little cost in readability, or much faster (particularly when calling update) while still being reasonably readable.
Update: I ran some tests:
# test_sha._timehash takes three parameters, the hash object generator to use,
# the number of updates and the size of the updates.
# Built in hashlib.sha256
$ python2.7 -m timeit -n 100 -s 'import test_sha, hashlib' 'test_sha._timehash(hashlib.sha256, 20000, 512)'
100 loops, best of 3: 104 msec per loop
# My wrapper based approach (see my answer)
$ python2.7 -m timeit -n 100 -s 'import test_sha, hashlib' 'test_sha._timehash(test_sha.wrapper_shad_256, 20000, 512)'
100 loops, best of 3: 108 msec per loop
# Glen Maynard's getattr based approach
$ python2.7 -m timeit -n 100 -s 'import test_sha, hashlib' 'test_sha._timehash(test_sha.getattr_shad_256, 20000, 512)'
100 loops, best of 3: 103 msec per loop
Make a new class that derives from object, create a hashlib.sha256 instance as a member variable in __init__, then define the methods expected of a hash object and have each one proxy to the same method of the member variable.
Something like:
import hashlib
class MyThing(object):
    def __init__(self):
        self._hasher = hashlib.sha256()
    def digest(self):
        return self._hasher.digest()
And so on for the other methods.
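Filled out, the explicit-delegation approach might look like the sketch below. The member list mirrors the standard hash-object interface (update, digest, hexdigest, copy, digest_size, block_size); the class name ProxySha256 is hypothetical, and digest/hexdigest apply the second SHA-256 pass the question asks for.

```python
import hashlib

class ProxySha256(object):
    """Wraps a hashlib.sha256 object and forwards each member explicitly."""

    def __init__(self, data=None):
        self._hasher = hashlib.sha256()
        if data is not None:
            self._hasher.update(data)

    def update(self, data):
        # One extra Python-level call per update compared with raw hashlib.
        self._hasher.update(data)

    def digest(self):
        # Second pass over the inner digest: sha256(sha256(data).digest())
        return hashlib.sha256(self._hasher.digest()).digest()

    def hexdigest(self):
        return hashlib.sha256(self._hasher.digest()).hexdigest()

    def copy(self):
        new = ProxySha256()
        new._hasher = self._hasher.copy()
        return new

    @property
    def digest_size(self):
        return self._hasher.digest_size

    @property
    def block_size(self):
        return self._hasher.block_size
```

The cost of this style is the extra function call on every forwarded method, which matters most for update since it is called many times per hash.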
Just use __getattr__ to make every attribute you don't define yourself fall back to the underlying object:
import hashlib
class shad_256(object):
    """
    Double SHA - sha256(sha256(data).digest())
    Less susceptible to length extension attacks than sha256 alone.
    >>> s = shad_256('hello world')
    >>> s.digest_size
    32
    >>> s.block_size
    64
    >>> s.sha256.hexdigest()
    'b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9'
    >>> s.hexdigest()
    'bc62d4b80d9e36da29c16c5d4d9f11731f36052c72401a76c23c0fb5a9b74423'
    >>> s.nonexistant()
    Traceback (most recent call last):
    ...
    AttributeError: '_hashlib.HASH' object has no attribute 'nonexistant'
    >>> s2 = s.copy()
    >>> s2.digest() == s.digest()
    True
    >>> s2.update("text")
    >>> s2.digest() == s.digest()
    False
    """
    def __init__(self, data=None):
        self.sha256 = hashlib.sha256()
        if data is not None:
            self.update(data)
    def __getattr__(self, key):
        return getattr(self.sha256, key)
    def _get_final_sha256(self):
        return hashlib.sha256(self.sha256.digest())
    def digest(self):
        return self._get_final_sha256().digest()
    def hexdigest(self):
        return self._get_final_sha256().hexdigest()
    def copy(self):
        result = shad_256()
        result.sha256 = self.sha256.copy()
        return result
if __name__ == "__main__":
    import doctest
    doctest.testmod()
This eliminates most of the overhead for update calls, but not all of it. To eliminate it completely, add this to __init__ (and correspondingly in copy):
self.update = self.sha256.update
That removes the extra __getattr__ call when update is looked up.
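Applied to the __getattr__ class above, the shortcut means assigning update as an instance attribute wherever the wrapped object is set, i.e. in __init__ and in copy. The sketch below routes both through a helper named _set_inner, which is hypothetical and not part of the original answer, and uses bytes literals so it runs on Python 2.7 and 3.

```python
import hashlib

class shad_256(object):
    """Double SHA-256 with a __getattr__ fallback and a pre-bound update."""

    def __init__(self, data=None):
        self._set_inner(hashlib.sha256())
        if data is not None:
            self.update(data)

    def _set_inner(self, inner):
        self.sha256 = inner
        # Instance attribute: found by normal lookup, so __getattr__ is skipped.
        self.update = inner.update

    def __getattr__(self, key):
        return getattr(self.sha256, key)

    def digest(self):
        return hashlib.sha256(self.sha256.digest()).digest()

    def hexdigest(self):
        return hashlib.sha256(self.sha256.digest()).hexdigest()

    def copy(self):
        result = shad_256()
        # Rebind update as well; otherwise result.update would stay bound to
        # the empty hash object that shad_256() created just above.
        result._set_inner(self.sha256.copy())
        return result
```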
This all takes advantage of one of the more useful and often overlooked properties of Python methods: binding. Recall that you can do this:
a = "hello"
b = a.upper
b()
This works because taking a reference to a method doesn't return the original function, but a binding of that function to its object. That's why, when __getattr__ above returns self.sha256.update, the returned function correctly operates on self.sha256, not self.
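The same binding applies to hash objects, which is what makes storing self.sha256.update safe: the stored callable remembers which hash object it belongs to. A small demonstration (feed is just an illustrative variable name):

```python
import hashlib

h = hashlib.sha256()
feed = h.update            # bound method: carries a reference to h
feed(b"hello ")
feed(b"world")

# Feeding through the bound method updated h itself.
print(h.hexdigest() == hashlib.sha256(b"hello world").hexdigest())  # True
print(feed.__self__ is h)  # True
```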
So, here is the answer I came up with, based on Glen's answer (the one I awarded the bounty to):
import hashlib
class _double_wrapper(object):
    """This wrapper exists because the various hashes from hashlib are
    factory functions and there is no type that can be derived from.
    So this class simulates deriving from one of these factory
    functions as if it were a class and then implements the 'd'
    version of the hash function which avoids length extension attacks
    by applying H(H(text)) instead of just H(text)."""
    __slots__ = ('_wrappedinstance', '_wrappedfactory', 'update')
    def __init__(self, wrappedfactory, *args):
        self._wrappedfactory = wrappedfactory
        self._assign_instance(wrappedfactory(*args))
    def _assign_instance(self, instance):
        "Assign new wrapped instance and set update method."
        self._wrappedinstance = instance
        self.update = instance.update
    def digest(self):
        "return the current digest value"
        return self._wrappedfactory(self._wrappedinstance.digest()).digest()
    def hexdigest(self):
        "return the current digest as a string of hexadecimal digits"
        return self._wrappedfactory(self._wrappedinstance.digest()).hexdigest()
    def copy(self):
        "return a copy of the current hash object"
        new = self.__class__()
        new._assign_instance(self._wrappedinstance.copy())
        return new
    digest_size = property(lambda self: self._wrappedinstance.digest_size,
                           doc="number of bytes in this hash's output")
    digestsize = digest_size
    block_size = property(lambda self: self._wrappedinstance.block_size,
                          doc="internal block size of hash function")
class shad_256(_double_wrapper):
    """
    Double SHA - sha256(sha256(data))
    Less susceptible to length extension attacks than SHA2_256 alone.
    >>> import binascii
    >>> s = shad_256('hello world')
    >>> s.name
    'shad256'
    >>> int(s.digest_size)
    32
    >>> int(s.block_size)
    64
    >>> s.hexdigest()
    'bc62d4b80d9e36da29c16c5d4d9f11731f36052c72401a76c23c0fb5a9b74423'
    >>> binascii.hexlify(s.digest()) == s.hexdigest()
    True
    >>> s2 = s.copy()
    >>> s2.digest() == s.digest()
    True
    >>> s2.update("text")
    >>> s2.digest() == s.digest()
    False
    """
    __slots__ = ()
    def __init__(self, *args):
        super(shad_256, self).__init__(hashlib.sha256, *args)
    name = property(lambda self: 'shad256', doc='algorithm name')
This is a little verbose, but it results in a class that works very nicely from a documentation perspective and has a relatively clear implementation. With Glen's optimization, update is as fast as it can possibly be.
There is one annoyance: update shows up as a data member and doesn't have a docstring. I think that's an acceptable trade-off between readability and efficiency.
from hashlib import sha256
class shad_256(object):
    def __init__(self, data=''):
        self._hash = sha256(data)
    def __getattr__(self, attr):
        setattr(self, attr, getattr(self._hash, attr))
        return getattr(self, attr)
    def copy(self):
        ret = shad_256()
        ret._hash = self._hash.copy()
        return ret
    def digest(self):
        return sha256(self._hash.digest()).digest()
    def hexdigest(self):
        return sha256(self._hash.digest()).hexdigest()
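Any attribute not found on the instance is fetched from the wrapped hash object by __getattr__ and then stored on the instance with setattr, so __getattr__ runs only once per attribute name. copy() needs to be treated specially, of course.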
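The caching is easy to observe: after the first lookup, the bound method sits in the instance __dict__, so later lookups never reach __getattr__. The sketch below is the same class with a bytes default and one addition that is not part of the original answer: a guard that raises AttributeError for _hash itself, so an instance missing _hash (for example one created without running __init__) fails cleanly instead of recursing.

```python
from hashlib import sha256

class shad_256(object):
    def __init__(self, data=b''):
        self._hash = sha256(data)

    def __getattr__(self, attr):
        # Reached only when normal lookup fails. Without this guard, an
        # instance lacking _hash would recurse until RecursionError.
        if attr == '_hash':
            raise AttributeError(attr)
        value = getattr(self._hash, attr)
        # Cache on the instance so later lookups never reach __getattr__.
        setattr(self, attr, value)
        return value

    def copy(self):
        ret = shad_256()
        ret._hash = self._hash.copy()
        return ret

    def digest(self):
        return sha256(self._hash.digest()).digest()

    def hexdigest(self):
        return sha256(self._hash.digest()).hexdigest()

s = shad_256(b'hello ')
print('update' in vars(s))  # False: nothing cached yet
s.update(b'world')          # first lookup goes through __getattr__
print('update' in vars(s))  # True: bound method now stored on s
```

One caveat of caching: a cached bound method would keep pointing at the old object if _hash were reassigned on the same instance. copy() avoids this by building a fresh instance, which has nothing cached yet.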