What is the difference between dict and collections.defaultdict?
I was checking out Peter Norvig's code on how to write simple spell checkers. At the beginning, he uses this code to insert words into a dictionary.
import collections

def train(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model
What is the difference between a Python dict and the one that was used here? In addition, what is the lambda for? I checked the API documentation and it says that defaultdict is actually derived from dict, but how does one decide which one to use?
The difference is that a defaultdict will "default" a value if that key has not been set yet. If you didn't use a defaultdict, you'd have to check whether that key exists and, if it doesn't, set it to what you want.
The lambda defines a factory for the default value. That function gets called whenever the defaultdict needs a default value. You could hypothetically have a more complicated default function.
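As a minimal sketch of that difference, the two loops below build the same counts, once with an explicit key check on a plain dict and once with a defaultdict. The word list is a made-up sample, not Norvig's data.

```python
from collections import defaultdict

words = ["the", "cat", "the", "hat"]  # hypothetical sample input

# Plain dict: check for the key before updating it.
plain = {}
for w in words:
    if w not in plain:
        plain[w] = 1  # same starting value as the lambda below
    plain[w] += 1

# defaultdict: the factory supplies the starting value on first access.
auto = defaultdict(lambda: 1)
for w in words:
    auto[w] += 1

print(plain)          # {'the': 3, 'cat': 2, 'hat': 2}
print(auto == plain)  # True
```

Because the factory returns 1 and the loop then adds 1, a word seen once ends up with a count of 2.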
Help on class defaultdict in module collections:
class defaultdict(__builtin__.dict)
| defaultdict(default_factory) --> dict with default factory
|
| The default factory is called without arguments to produce
| a new value when a key is not present, in __getitem__ only.
| A defaultdict compares equal to a dict with the same items.
|
(from help(type(collections.defaultdict())))
{}.setdefault is similar in nature, but takes a value instead of a factory function. It sets the value only if the key doesn't already exist, which is a bit different, though.
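One practical consequence of "value versus factory" can be sketched as follows. The helper make_list and the built list are hypothetical names used only to count how often a default object is constructed.

```python
from collections import defaultdict

built = []  # records every time a default list is constructed

def make_list():
    built.append("x")
    return []

# setdefault takes a value, so make_list() runs before the call,
# even when the key is already present.
plain = {"a": [1]}
plain.setdefault("a", make_list()).append(2)
plain.setdefault("b", make_list()).append(3)
eager_builds = len(built)

# defaultdict takes the factory itself and calls it only for missing keys.
built.clear()
lazy = defaultdict(make_list)
lazy["a"].append(1)  # "a" is missing: factory called
lazy["a"].append(2)  # "a" exists: factory not called
lazy_builds = len(built)

print(plain, eager_builds)      # {'a': [1, 2], 'b': [3]} 2
print(dict(lazy), lazy_builds)  # {'a': [1, 2]} 1
```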
Courtesy :- https://shirishweb.wordpress.com/2017/05/06/python-defaultdict-versus-dict-get/
Using Normal dict
d={}
d['Apple']=50
d['Orange']=20
print(d['Apple'])
print(d['Grapes'])  # raises KeyError
We can avoid this KeyError by supplying a default with a normal dict as well. Let's see how:
d={}
d['Apple']=50
d['Orange']=20
print(d['Apple'])
print(d.get('Apple'))
print(d.get('Grapes',0)) # DEFAULTING
Using defaultdict
from collections import defaultdict
d = defaultdict(int)  # the argument is a factory; int() returns 0, so missing keys default to 0
d['Apple']=50
d['Orange']=20
print(d['Apple'])
print(d['Grapes'])  # no KeyError; prints 0
Using a user-defined function to supply the default value
from collections import defaultdict
def mydefault():
    return 0
d = defaultdict(mydefault)
d['Apple']=50
d['Orange']=20
print(d['Apple'])
print(d['Grapes'])
Summary
With a normal dict, the default is supplied case by case at each lookup; with a defaultdict, it is supplied once and applies to every lookup.
According to the measurements in the linked post, defaulting with defaultdict is about twice as fast as defaulting with a normal dict. For details of this performance test, see https://shirishweb.wordpress.com/2017/05/06/python-defaultdict-versus-dict-get/
Use a defaultdict if you have some meaningful default value for missing keys and don't want to deal with them explicitly.
The defaultdict constructor takes a function as a parameter and constructs a value using that function.
lambda: 1 is the same as the parameterless function f that does this:
def f():
    return 1
I forgot the reason the API was designed this way instead of taking a value as a parameter. If I had designed the defaultdict interface, it would be slightly more complicated: the missing-value creation function would take the missing key as a parameter.
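A key-aware variant like the one described here can be sketched with the `__missing__` hook, which dict's `__getitem__` calls on subclasses when a key is absent. The class name KeyAwareDict and its factory argument are hypothetical; this is not part of the collections module.

```python
class KeyAwareDict(dict):
    """Hypothetical dict subclass whose default depends on the missing key."""

    def __init__(self, factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.factory = factory  # called with the missing key

    def __missing__(self, key):
        value = self.factory(key)
        self[key] = value  # store it, as defaultdict does
        return value

lengths = KeyAwareDict(len)
print(lengths["apple"])  # 5
print(lengths)           # {'apple': 5}
```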
Let's take a deep dive into the Python dictionary and the Python defaultdict() class.
Python Dictionaries
Dict is one of the data structures available in Python which allows data to be stored in the form of key-value pairs.
Example:
d = {'a': 2, 'b': 5, 'c': 6}
Problem with Dictionary
Dictionaries work well unless you encounter missing keys. Suppose you look up a key that is not in the dictionary: you will then get a KeyError. Something like this:
d = {'a': 2, 'b': 5, 'c': 6}
d['z']  # 'z' is not in the dict, so this raises a KeyError
You will see something like this:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
d['z']
KeyError: 'z'
Solution to the above problem
There are several ways to overcome this problem:
Using built-in dict methods:
setdefault(key[, default])
get(key[, default])
Using defaultdict() from the collections module
setdefault
If the key is in the dictionary, return its value. If not, insert the key with a value of default and return default. default defaults to None:
>>> d = {'a': 2, 'b': 5, 'c': 6}
>>> d.setdefault('z', 0)
0 # returns 0
>>> print(d) # z has been added to the dictionary
{'a': 2, 'b': 5, 'c': 6, 'z': 0}
get
Return the value for key if the key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError:
>>> d = {'a': 2, 'b': 5, 'c': 6}
>>> d.get('z', 0)
0 # returns 0
>>> print(d) # Doesn't add z to the dictionary unlike setdefault
{'a': 2, 'b': 5, 'c': 6}
Both of the methods above solve our problem: neither raises a KeyError. Apart from these two methods, Python also has the collections module, which can handle this problem. Let's dig into defaultdict in the collections module:
defaultdict
defaultdict can be found in Python's collections module. You can use it like this:
from collections import defaultdict
d = defaultdict(int)
The defaultdict constructor takes default_factory as an argument, which must be a callable. This can be, for example:
int: the default will be the integer value 0
str: the default will be the empty string ""
list: the default will be an empty list []
Code:
from collections import defaultdict
d = defaultdict(list)
d['a']  # accessing a missing key inserts it and returns an empty list
d['b'] = 1  # add a key-value pair to the dict
print(d)
On Python 3.7+, where dicts keep insertion order, the output will be defaultdict(<class 'list'>, {'a': [], 'b': 1})
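A common use of a list factory is grouping. The sketch below uses made-up pairs and also shows that a membership test does not create a key, while a plain lookup does.

```python
from collections import defaultdict

pairs = [("fruit", "apple"), ("veg", "kale"), ("fruit", "pear")]  # hypothetical data

groups = defaultdict(list)
for kind, name in pairs:
    groups[kind].append(name)  # a missing kind starts as an empty list

print(dict(groups))     # {'fruit': ['apple', 'pear'], 'veg': ['kale']}
print("nut" in groups)  # False: a membership test does not create the key
groups["nut"]           # a lookup does create it
print("nut" in groups)  # True
```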
defaultdict solves the same problem as the get() and setdefault() methods, so when should you use each?
When to use get()
If you need to look up a key without risking a KeyError and without modifying the dictionary, then dict.get is the right choice for you. It returns the default value you specify but does not modify the dictionary.
When to use setdefault()
If you need to modify the original dictionary with a default key-value pair, then setdefault is the right choice.
When to use defaultdict
What setdefault does can also be achieved with defaultdict, but instead of providing the default value on every setdefault call, we provide it once, when the defaultdict is created. On the other hand, setdefault lets you choose a different default value for each key. Both have their own advantages depending on the use case.
When it comes to efficiency: defaultdict > setdefault() or get()
In the measurements referred to, defaultdict is about 2 times faster than get()!
You can check the results here.
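If you would rather measure this yourself than rely on a quoted figure, a rough timeit sketch follows. The workload and the function names are assumptions chosen for illustration, and the ratio will vary with the Python version and the data.

```python
import timeit
from collections import defaultdict

words = ["a", "b", "c", "a", "b", "a"] * 1000  # hypothetical workload

def count_with_get():
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts

def count_with_defaultdict():
    counts = defaultdict(int)
    for w in words:
        counts[w] += 1
    return counts

# Time each approach over the same number of runs.
t_get = timeit.timeit(count_with_get, number=20)
t_dd = timeit.timeit(count_with_defaultdict, number=20)
print(f"dict.get:    {t_get:.4f} s")
print(f"defaultdict: {t_dd:.4f} s")
```

Both functions produce the same counts, so only the timings differ.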