开发者

Using non-hashable Python objects as keys in dictionaries

Python doesn't allow non-hashable objects to be used as keys in other dictionaries. As pointed out by Andrey Vlasovskikh, there is a nice workaround for the special case of using non-nested dictionaries as keys:

frozenset(a.items())#Can be put in the dictionary instead

Is there a method of using arbitrary objects as keys in dictionaries?

Example:

How would this be used as a key?

{"a":1, "b":{"c":10}}

It is extremely rare that you will actually have to use something like this in your code. If you think this is the case, consider changing your data model first.

Exact use case

The use case is caching calls to an arbitrary keyword only function. Each key in the dictionary is a string (the name of the argument) and the objects can be quite complicated, consisting of layered dictionaries, lists, tuples, ect.

Related problems

This sub-problem has been split off from the problem here. Solutions here deal with the case where the 开发者_开发知识库dictionaries is not layered.


Based off solution by Chris Lutz again.

import collections

def hashable(obj):
    if isinstance(obj, collections.Hashable):
        items = obj
    elif isinstance(obj, collections.Mapping):
        items = frozenset((k, hashable(v)) for k, v in obj.iteritems())
    elif isinstance(obj, collections.Iterable):
        items = tuple(hashable(item) for item in obj)
    else:
        raise TypeError(type(obj))

    return items


Don't. I agree with Andreys comment on the previous question that is doesn't make sense to have dictionaries as keys, and especially not nested ones. Your data-model is obviously quite complex, and dictionaries are probably not the right answer. You should try some OO instead.


Based off solution by Chris Lutz. Note that this doesn't handle objects that are changed by iteration, such as streams, nor does it handle cycles.

import collections

def make_hashable(obj):
    """WARNING: This function only works on a limited subset of objects
    Make a range of objects hashable. 
    Accepts embedded dictionaries, lists or tuples (including namedtuples)"""
    if isinstance(obj, collections.Hashable):
        #Fine to be hashed without any changes
        return obj
    elif isinstance(obj, collections.Mapping):
        #Convert into a frozenset instead
        items=list(obj.items())
        for i, item in enumerate(items):
                items[i]=make_hashable(item)
        return frozenset(items)
    elif isinstance(obj, collections.Iterable):
        #Convert into a tuple instead
        ret=[type(obj)]
        for i, item in enumerate(obj):
                ret.append(make_hashable(item))
        return tuple(ret)
    #Use the id of the object
    return id(obj)


I agree with Lennart Regebro that you don't. However I often find it useful to cache some function calls, callable object and/or Flyweight objects, since they may use keyword arguments.

But if you really want it, try pickle.dumps (or cPickle if python 2.6) as a quick and dirty hack. It is much faster than any of the answers that uses recursive calls to make items immutable, and strings are hashable.

import pickle
hashable_str = pickle.dumps(unhashable_object)


If you really must, make your objects hashable. Subclass whatever you want to put in as a key, and provide a __hash__ function which returns an unique key to this object.

To illustrate:

>>> ("a",).__hash__()
986073539
>>> {'a': 'b'}.__hash__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not callable

If your hash is not unique enough you will get collisions. May be slow as well.


I totally disagree with comments & answers saying that this shouldn't be done for data model purity reason.

A dictionary associates an object with another object using the former one as a key. Dictionaries can't be used as keys because they're not hashable. This doesn't make any less meaningful/practical/necessary to map dictionaries to other objects.

As I understand the Python binding system, you can bind any dictionary to a number of variables (or the reverse, depends on your terminology) which means that these variables all know the same unique 'pointer' to that dictionary. Wouldn't it be possible to use that identifier as the hashing key ? If your data model ensures/enforces that you can't have two dictionaries with the same content used as keys then that seems to be a safe technique to me.

I should add that I have no idea whatsoever of how that can/should be done though.

I'm not entirely whether this should be an answer or a comment. Please correct me if needed.


With recursion!

def make_hashable(h):
    items = h.items()
    for item in items:
        if type(items) == dict:
            item = make_hashable(item)
    return frozenset(items)

You can add other type tests for any other mutable types you want to make hashable. It shouldn't be hard.


I encountered this issue when using a decorator that caches the results of previous calls based on call signature. I do not agree with the comments/answers here to the effect of "you should not do this", but I think it is important to recognize the potential for surprising and unexpected behavior when going down this path. My thought is that since instances are both mutable and hashable, and it does not seem practical to change that, there is nothing inherently wrong with creating hashable equivalents of non-hashable types or objects. But of course that is only my opinion.

For anyone who requires Python 2.5 compatibility, the below may be useful. I based it on the earlier answer.

from itertools import imap
tuplemap = lambda f, data: tuple(imap(f, data))
def make_hashable(obj):
  u"Returns a deep, non-destructive conversion of given object to an equivalent hashable object"
  if isinstance(obj, list):
    return tuplemap(make_hashable, iter(obj))
  elif isinstance(obj, dict):
    return frozenset(tuplemap(make_hashable, obj.iteritems()))
  elif hasattr(obj, '__hash__') and callable(obj.__hash__):
    try:
      obj.__hash__()
    except:
      if hasattr(obj, '__iter__') and callable(obj.__iter__):
        return tuplemap(make_hashable, iter(obj))
      else:
        raise NotImplementedError, 'object of type %s cannot be made hashable' % (type(obj),)
    else:
      return obj
  elif hasattr(obj, '__iter__') and callable(obj.__iter__):
    return tuplemap(make_hashable, iter(obj))
  else:
    raise NotImplementedError, 'object of type %s cannot be made hashable' % (type, obj)
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜