开发者

How do I derive from hashlib.sha256 in Python?

A naive attempt fails miserably:

import hashlib

class fred(hashlib.sha256):
    pass

-> TypeError: Error when calling the metaclass bases
       cannot create 'builtin_function_or_method' instances

Well, it turns out that hashlib.sha256 is a callable, not a class. Trying something a bit more creative doesn't work either:

 import hashlib

 class fred(type(hashlib.sha256())):
     pass

 f = fred

 -> TypeError: cannot create 'fred' instances

Hmmm...

So, how do I do it?

Here is what I want to actually achieve:

class shad_256(sha256):
    """Double SHA - sha256(sha256(data).digest())
Less susceptible to length extension attacks than sha256 alone."""
    def digest(self):
        return sha256(sha256.digest(self)).digest()
    def hexdigest(self):
        return sha256(sha256.digest(self)).hexdigest()

Basically I want everything to pass through except when someone calls for a result I want to insert an extra step of my own. Is there a clever way I can accomplish this with __new__ or metaclass magic of some sort?

I have a solution I'm largely happy with that I posted as an answer, but I'm really interested to see if anybody can think of anything better. Either much less verbose with very little cost in readability or much faster (particularly when calling update) while still being somewhat readable.

Update: I ran some tests:

# test_sha._timehash takes three parameters, the hash object generator to use,
# the number of updates and the size of the updates.

# Built in hashlib.sha256
$ python2.7 -m timeit -n 100 -s 'import test_sha, hashlib' 'test_sha._timehash(hashlib.sha256, 20000, 512)'
100 loops, best of 3: 104 msec per loop

# My wrapper based approach (see my answer)
$ python2.7 -m timeit 开发者_Python百科-n 100 -s 'import test_sha, hashlib' 'test_sha._timehash(test_sha.wrapper_shad_256, 20000, 512)'
100 loops, best of 3: 108 msec per loop

# Glen Maynard's getattr based approach
$ python2.7 -m timeit -n 100 -s 'import test_sha, hashlib' 'test_sha._timehash(test_sha.getattr_shad_256, 20000, 512)'
100 loops, best of 3: 103 msec per loop


Make a new class, derive from object, create a hashlib.sha256 member var in init, then define methods expected of a hash class and proxy to the same methods of the member variable.

Something like:

import hashlib

class MyThing(object):
    def __init__(self):
        self._hasher = hashlib.sha256()

    def digest(self):
        return self._hasher.digest()

And so on for the other methods.


Just use __getattr__ to cause all attributes that you don't define yourself to fall back on the underlying object:

import hashlib

class shad_256(object):
    """
    Double SHA - sha256(sha256(data).digest())
    Less susceptible to length extension attacks than sha256 alone.

    >>> s = shad_256('hello world')
    >>> s.digest_size
    32
    >>> s.block_size
    64
    >>> s.sha256.hexdigest()
    'b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9'
    >>> s.hexdigest()
    'bc62d4b80d9e36da29c16c5d4d9f11731f36052c72401a76c23c0fb5a9b74423'
    >>> s.nonexistant()
    Traceback (most recent call last):
    ...
    AttributeError: '_hashlib.HASH' object has no attribute 'nonexistant'
    >>> s2 = s.copy()
    >>> s2.digest() == s.digest()
    True
    >>> s2.update("text")
    >>> s2.digest() == s.digest()
    False
    """
    def __init__(self, data=None):
        self.sha256 = hashlib.sha256()
        if data is not None:
            self.update(data)

    def __getattr__(self, key):
        return getattr(self.sha256, key)

    def _get_final_sha256(self):
        return hashlib.sha256(self.sha256.digest())

    def digest(self):
        return self._get_final_sha256().digest()

    def hexdigest(self):
        return self._get_final_sha256().hexdigest()

    def copy(self):
        result = shad_256()
        result.sha256 = self.sha256.copy()
        return result

if __name__ == "__main__":
    import doctest
    doctest.testmod()

This mostly eliminates the overhead for update calls, but not completely. If you want to completely eliminate it, add this to __init__ (and correspondingly in copy):

self.update = self.sha256.update

That will eliminate the extra __getattr__ call when looking up update.

This all takes advantage of one of the more useful and often overlooked properties of Python member functions: function binding. Recall that you can do this:

a = "hello"
b = a.upper
b()

because taking a reference to a member function doesn't return the original function, but a binding of that function to its object. That's why, when __getattr__ above returns self.sha256.update, the returned function correctly operates on self.sha256, not self.


So, here is the answer I came up with that's based on Glen's answer, which is the one I awarded him the bounty for:

import hashlib

class _double_wrapper(object):
    """This wrapper exists because the various hashes from hashlib are
    factory functions and there is no type that can be derived from.
    So this class simulates deriving from one of these factory
    functions as if it were a class and then implements the 'd'
    version of the hash function which avoids length extension attacks
    by applying H(H(text)) instead of just H(text)."""

    __slots__ = ('_wrappedinstance', '_wrappedfactory', 'update')
    def __init__(self, wrappedfactory, *args):
        self._wrappedfactory = wrappedfactory
        self._assign_instance(wrappedfactory(*args))

    def _assign_instance(self, instance):
        "Assign new wrapped instance and set update method."
        self._wrappedinstance = instance
        self.update = instance.update

    def digest(self):
        "return the current digest value"
        return self._wrappedfactory(self._wrappedinstance.digest()).digest()

    def hexdigest(self):
        "return the current digest as a string of hexadecimal digits"
        return self._wrappedfactory(self._wrappedinstance.digest()).hexdigest()

    def copy(self):
        "return a copy of the current hash object"
        new = self.__class__()
        new._assign_instance(self._wrappedinstance.copy())
        return new

    digest_size = property(lambda self: self._wrappedinstance.digest_size,
                           doc="number of bytes in this hashes output")
    digestsize = digest_size
    block_size = property(lambda self: self._wrappedinstance.block_size,
                          doc="internal block size of hash function")

class shad_256(_double_wrapper):
    """
    Double SHA - sha256(sha256(data))
    Less susceptible to length extension attacks than SHA2_256 alone.

    >>> import binascii
    >>> s = shad_256('hello world')
    >>> s.name
    'shad256'
    >>> int(s.digest_size)
    32
    >>> int(s.block_size)
    64
    >>> s.hexdigest()
    'bc62d4b80d9e36da29c16c5d4d9f11731f36052c72401a76c23c0fb5a9b74423'
    >>> binascii.hexlify(s.digest()) == s.hexdigest()
    True
    >>> s2 = s.copy()
    >>> s2.digest() == s.digest()
    True
    >>> s2.update("text")
    >>> s2.digest() == s.digest()
    False
    """
    __slots__ = ()
    def __init__(self, *args):
        super(shad_256, self).__init__(hashlib.sha256, *args)
    name = property(lambda self: 'shad256', doc='algorithm name')

This is a little verbose, but results in a class that works very nicely from a documentation perspective and has a relatively clear implementation. With Glen's optimization, update is as fast as it possibly can be.

There is one annoyance, which is that the update function shows up as a data member and doesn't have a docstring. I think that's a readability/efficiency tradeoff that's acceptable.


from hashlib import sha256

class shad_256(object):
    def __init__(self, data=''):
        self._hash = sha256(data)

    def __getattr__(self, attr):
        setattr(self, attr, getattr(self._hash, attr))
        return getattr(self, attr)

    def copy(self):
        ret = shad_256()
        ret._hash = self._hash.copy()
        return ret

    def digest(self):
        return sha256(self._hash.digest()).digest()

    def hexdigest(self):
        return sha256(self._hash.digest()).hexdigest()

Any attributes that are not found on an instance are bound lazily by __getattr__. copy() needs to be treated specially of course.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜