How to implement an efficient bidirectional hash table?
Python dict is a very useful data structure:
d = {'a': 1, 'b': 2}
d['a'] # get 1
Sometimes you'd also like to index by values.
d[1] # get 'a'
What is the most efficient way to implement this data structure? Is there an officially recommended way to do it?
Here is a class for a bidirectional dict, inspired by "Finding key from value in Python dictionary" and modified to allow points 2) and 3) below.
Note that:
1) The inverse dictionary bd.inverse auto-updates itself when the standard dict bd is modified.
2) The inverse dictionary entry bd.inverse[value] is always a list of keys such that bd[key] == value.
3) Unlike the bidict module from https://pypi.python.org/pypi/bidict, here we can have 2 keys with the same value. This is very important.
Code:
class bidict(dict):
    def __init__(self, *args, **kwargs):
        super(bidict, self).__init__(*args, **kwargs)
        self.inverse = {}
        for key, value in self.items():
            self.inverse.setdefault(value, []).append(key)

    def __setitem__(self, key, value):
        if key in self:
            self.inverse[self[key]].remove(key)
        super(bidict, self).__setitem__(key, value)
        self.inverse.setdefault(value, []).append(key)

    def __delitem__(self, key):
        self.inverse.setdefault(self[key], []).remove(key)
        if self[key] in self.inverse and not self.inverse[self[key]]:
            del self.inverse[self[key]]
        super(bidict, self).__delitem__(key)
Usage example:
bd = bidict({'a': 1, 'b': 2})
print(bd) # {'a': 1, 'b': 2}
print(bd.inverse) # {1: ['a'], 2: ['b']}
bd['c'] = 1 # Now two keys have the same value (= 1)
print(bd) # {'a': 1, 'b': 2, 'c': 1} (insertion order on Python 3.7+)
print(bd.inverse) # {1: ['a', 'c'], 2: ['b']}
del bd['c']
print(bd) # {'a': 1, 'b': 2}
print(bd.inverse) # {1: ['a'], 2: ['b']}
del bd['a']
print(bd) # {'b': 2}
print(bd.inverse) # {2: ['b']}
bd['b'] = 3
print(bd) # {'b': 3}
print(bd.inverse) # {2: [], 3: ['b']}
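One caveat with subclassing dict like this: on CPython, built-in methods such as update() and setdefault() do not go through an overridden __setitem__, so bd.inverse can silently go stale if you use them. Here is a minimal demonstration. The Tracked class is a hypothetical stand-in, not part of the answer above; it only records which keys pass through __setitem__.

```python
class Tracked(dict):
    """Hypothetical dict subclass that records every __setitem__ call."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.calls = []

    def __setitem__(self, key, value):
        self.calls.append(key)
        super().__setitem__(key, value)


t = Tracked()
t['a'] = 1            # goes through the overridden __setitem__
t.update(b=2)         # stored by dict.update directly (CPython behaviour)
t.setdefault('c', 3)  # stored by dict.setdefault directly
print(t.calls)        # ['a']
print(sorted(t))      # ['a', 'b', 'c']
```

All three keys end up in the dict, but only 'a' was seen by __setitem__. With the bidict class above, the other two would be missing from inverse.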
You can use the same dict itself by also adding each key-value pair in reverse order.
d = {'a': 1, 'b': 2}
revd = dict([reversed(i) for i in d.items()])
d.update(revd)
A poor man's bidirectional hash table would be to use just two dictionaries (these are highly tuned datastructures already).
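A rough sketch of the two-dictionary idea, assuming you want a strict one-to-one mapping where a new pair evicts any old pair that shares its key or its value. The helper names put and remove are made up for illustration.

```python
def put(forward, backward, key, value):
    """Store key <-> value, evicting any pair that conflicts with it."""
    if key in forward:
        # The key is being re-pointed: drop its old reverse entry.
        del backward[forward[key]]
    if value in backward:
        # The value belonged to another key: drop that forward entry.
        del forward[backward[value]]
    forward[key] = value
    backward[value] = key


def remove(forward, backward, key):
    """Delete a pair by its key, keeping both dicts in sync."""
    value = forward.pop(key)
    del backward[value]


forward, backward = {}, {}
put(forward, backward, 'a', 1)
put(forward, backward, 'b', 2)
put(forward, backward, 'a', 2)   # evicts both ('a', 1) and ('b', 2)
print(forward)    # {'a': 2}
print(backward)   # {2: 'a'}
```

Evicting on conflict is only one possible policy. Raising an exception instead is just as reasonable if duplicates indicate a bug in your data.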
There is also a bidict package on the index:
- https://pypi.python.org/pypi/bidict
The source for bidict can be found on github:
- https://github.com/jab/bidict
The below snippet of code implements an invertible (bijective) map:
class BijectionError(Exception):
    """Must set a unique value in a BijectiveMap."""

    def __init__(self, value):
        self.value = value
        msg = 'The value "{}" is already in the mapping.'
        super().__init__(msg.format(value))


class BijectiveMap(dict):
    """Invertible map."""

    def __init__(self, inverse=None):
        if inverse is None:
            inverse = self.__class__(inverse=self)
        self.inverse = inverse

    def __setitem__(self, key, value):
        if value in self.inverse:
            raise BijectionError(value)
        if key in self:
            # Drop the stale reverse entry before re-pointing an existing key.
            self.inverse._del_item(self[key])
        self.inverse._set_item(value, key)
        self._set_item(key, value)

    def __delitem__(self, key):
        self.inverse._del_item(self[key])
        self._del_item(key)

    def _del_item(self, key):
        super().__delitem__(key)

    def _set_item(self, key, value):
        super().__setitem__(key, value)
The advantage of this implementation is that the inverse attribute of a BijectiveMap is again a BijectiveMap. Therefore you can do things like:
>>> foo = BijectiveMap()
>>> foo['steve'] = 42
>>> foo.inverse
{42: 'steve'}
>>> foo.inverse.inverse
{'steve': 42}
>>> foo.inverse.inverse is foo
True
Something like this, maybe:
import itertools

class BidirDict(dict):
    def __init__(self, iterable=(), **kwargs):
        self.update(iterable, **kwargs)

    def update(self, iterable=(), **kwargs):
        # Accept either a mapping or an iterable of (key, value) pairs.
        if hasattr(iterable, 'items'):
            iterable = iterable.items()
        for (key, value) in itertools.chain(iterable, kwargs.items()):
            self[key] = value

    def __setitem__(self, key, value):
        # Remove any existing pair that involves this key or this value.
        if key in self:
            del self[key]
        if value in self:
            del self[value]
        dict.__setitem__(self, key, value)
        dict.__setitem__(self, value, key)

    def __delitem__(self, key):
        value = self[key]
        dict.__delitem__(self, key)
        dict.__delitem__(self, value)

    def __repr__(self):
        return '%s(%s)' % (type(self).__name__, dict.__repr__(self))
You have to decide what you want to happen if more than one key has a given value; the bidirectionality of a given pair could easily be clobbered by some later pair you inserted. I implemented one possible choice.
Example:
bd = BidirDict({'a': 'myvalue1', 'b': 'myvalue2', 'c': 'myvalue2'})
print(bd['myvalue1']) # a
print(bd['myvalue2']) # whichever of 'b' or 'c' was inserted last; 'c' with insertion-ordered dicts (Python 3.7+)
First, you have to make sure the key to value mapping is one to one, otherwise, it is not possible to build a bidirectional map.
Second, how large is the dataset? If there is not much data, just use 2 separate maps, and update both of them when updating. Or better, use an existing solution like Bidict, which is just a wrapper of 2 dicts, with updating/deletion built in.
But if the dataset is large, and maintaining 2 dicts is not desirable:
- If both key and value are numeric, consider using interpolation to approximate the mapping. If the vast majority of the key-value pairs can be covered by the mapping function (and its reverse function), then you only need to record the outliers in maps.
- If most accesses are unidirectional (key -> value), then it is totally OK to build the reverse map incrementally, trading time for space.
Code:
d = {1: "one", 2: "two"}
reverse = {}

def get_key_by_value(v):
    if v not in reverse:
        for _k, _v in d.items():
            if _v == v:
                reverse[_v] = _k
                break
    return reverse[v]
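For the interpolation idea, here is one way it could look. This is only a sketch under strong assumptions: keys and values are integers, most pairs lie exactly on a line value = slope * key + intercept, and no outlier value collides with a value on that line. The class name LinearBidirMap and its methods are invented for illustration.

```python
class LinearBidirMap:
    """Hypothetical sketch: value = slope * key + intercept, plus outlier dicts."""

    def __init__(self, slope, intercept):
        if slope == 0:
            raise ValueError("slope must be non-zero for the map to be invertible")
        self.slope = slope
        self.intercept = intercept
        self.forward_outliers = {}  # key -> value, for pairs off the line
        self.reverse_outliers = {}  # value -> key, mirror of the above

    def add_outlier(self, key, value):
        self.forward_outliers[key] = value
        self.reverse_outliers[value] = key

    def value_for(self, key):
        if key in self.forward_outliers:
            return self.forward_outliers[key]
        return self.slope * key + self.intercept

    def key_for(self, value):
        if value in self.reverse_outliers:
            return self.reverse_outliers[value]
        key, remainder = divmod(value - self.intercept, self.slope)
        # Reject values off the line, and keys whose value was overridden.
        if remainder or key in self.forward_outliers:
            raise KeyError(value)
        return key


m = LinearBidirMap(slope=2, intercept=1)
print(m.value_for(3))   # 7
print(m.key_for(7))     # 3
m.add_outlier(3, 100)   # key 3 no longer follows the line
print(m.value_for(3))   # 100
print(m.key_for(100))   # 3
```

Only the outliers take up memory here. Whether that pays off depends entirely on how well a simple function fits your data.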
A better way is to convert the dictionary to a list of tuples and then sort on a specific tuple field:
def convert_to_list(dictionary):
    list_of_tuples = []
    for key, value in dictionary.items():
        list_of_tuples.append((key, value))
    return list_of_tuples

def sort_list(list_of_tuples, field):
    return sorted(list_of_tuples, key=lambda x: x[field])

dictionary = {'a': 9, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
list_of_tuples = convert_to_list(dictionary)
print(sort_list(list_of_tuples, 1))
Output:
[('b', 2), ('c', 3), ('d', 4), ('e', 5), ('a', 9)]
Unfortunately, the highest rated answer, bidict, does not work.
There are three options:
- Subclass dict: You can create a subclass of dict, but beware. You need to write custom implementations of update, pop, the initializer, and setdefault. The dict implementations do not call __setitem__. This is why the highest rated answer has issues.
- Inherit from UserDict: This is just like a dict, except all the routines are made to call correctly. It uses a dict under the hood, in an attribute called data. You can read the Python documentation, or use a simple implementation of a bidirectional dict that works in Python 3. Sorry for not including it verbatim: I'm unsure of its copyright.
- Inherit from abstract base classes: Inheriting from collections.abc will help you get all the correct protocols and implementations for a new class. This is overkill for a bidirectional dictionary, unless it can also encrypt and cache to a database.
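To give an idea of the UserDict route, here is a small sketch written from scratch. It is not the implementation referred to above. It assumes a strict one-to-one mapping with hashable values; the class name BiDict and the choice to raise ValueError on a duplicate value are my own.

```python
from collections import UserDict


class BiDict(UserDict):
    """Hypothetical one-to-one mapping; self.inverse maps value -> key."""

    def __init__(self, *args, **kwargs):
        # inverse must exist before UserDict.__init__ starts inserting items.
        self.inverse = {}
        super().__init__(*args, **kwargs)

    def __setitem__(self, key, value):
        if value in self.inverse and self.inverse[value] != key:
            raise ValueError("value %r is already mapped" % (value,))
        if key in self.data:
            # Re-pointing an existing key: drop its old reverse entry.
            del self.inverse[self.data[key]]
        self.data[key] = value
        self.inverse[value] = key

    def __delitem__(self, key):
        value = self.data.pop(key)
        del self.inverse[value]


bd = BiDict({'a': 1})
bd.update(b=2)          # routed through __setitem__
bd.setdefault('c', 3)   # routed through __setitem__
bd.pop('a')             # routed through __delitem__
bd['b'] = 5             # the old reverse entry for 2 is removed
print(bd.data)      # {'b': 5, 'c': 3}
print(bd.inverse)   # {5: 'b', 3: 'c'}
```

Because UserDict builds the initializer, update, pop, and setdefault on top of __setitem__ and __delitem__, overriding those two methods is enough to keep inverse in sync.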
TL;DR -- Use this for your code. Read Trey Hunner's article for details.