Add new keys to a dictionary while incrementing existing values

2023-01-22 20:19 问答作者：

I am processing a CSV file and count开发者_如何学JAVAing the unique values of column 4. So far I have coded this three ways. One uses "if key in dictionary", the second traps the KeyError and the third uses "DefaultDictionary". For example (where x[3] is the value from the file and "a" is a dictionary):

First way:

if x[3] in a:
    a[x[3]] += 1
else:
    a[x[3]] = 1

Second way:

try:
    b[x[3]] += 1
except KeyError:
    b[x[3]] = 1

Third way:

from collections import defaultdict
c = defaultdict(int)
c[x[3]] += 1

My question is: which way is more efficient... cleaner... better... etc. Or is there a better way. Both ways work and give the same answer, but I thought I would tap the hive mind as a learning case.

Thanks -

Use collections.Counter. Counter is syntactic sugar for defaultdict(int), but what's cool about it is that it accepts an iterable in the constructor, thus saving an extra step (I assume all of your examples above are wrapped in a for-loop.)

from collections import Counter
count = Counter(x[3] for x in my_csv_reader)

Prior to the introduction of collections.Counter, collections.defaultdict was the most idiomatic for this task, so for users < 2.7, use defaultdict.

from collections import defaultdict
count = defaultdict(int)
for x in my_csv_reader:
    count[x[3]] += 1

You asked which was more efficient. Assuming that you are talking about execution speed: If your data is small, it doesn't matter. If it is large and typical, the "already exists" case will happen much more often than the "not in dict" case. This observation explains some of the results.

Below is some code which can be used with the timeit module to explore speed without file-reading overhead. I have taken the liberty of adding a 5th method, which is not uncompetetive and will run on any Python from at least 1.5.2 [tested] onwards.

from collections import defaultdict, Counter

def tally0(iterable):
    # DOESN'T WORK -- common base case for timing
    d = {}
    for item in iterable:
        d[item] = 1
    return d

def tally1(iterable):
    d = {}
    for item in iterable:
        if item in d:
            d[item] += 1
        else:
            d[item] = 1
    return d

def tally2(iterable):
    d = {}
    for item in iterable:
        try:
            d[item] += 1
        except KeyError:
            d[item] = 1
    return d

def tally3(iterable):
    d = defaultdict(int)
    for item in iterable:
        d[item] += 1

def tally4(iterable):
    d = Counter()
    for item in iterable:
        d[item] += 1

def tally5(iterable):
    d = {}
    dg = d.get
    for item in iterable:
        d[item] = dg(item, 0) + 1
    return d

Typical run (in Windows XP "Command Prompt" window):

prompt>\python27\python -mtimeit -s"t=1000*'now is the winter of our discontent made glorious summer by this son of york';import tally_bench as tb" "tb.tally1(t)"
10 loops, best of 3: 29.5 msec per loop

Here are the results (msec per loop):

0 base case   13.6
1 if k in d   29.5
2 try/except  26.1
3 defaultdict 23.4
4 Counter     79.4
5 d.get(k, 0) 29.2

Another timing trial:

prompt>\python27\python -mtimeit -s"from collections import defaultdict;d=defaultdict(int)" "d[1]+=1"
1000000 loops, best of 3: 0.309 usec per loop

prompt>\python27\python -mtimeit -s"from collections import Counter;d=Counter()" "d[1]+=1"
1000000 loops, best of 3: 1.02 usec per loop

The speed of Counter is possibly due to it being implemented partly in Python code whereas defaultdict is entirely in C (in 2.7, at least).

Note that Counter() is NOT just "syntactic sugar" for defaultdict(int) -- it implements a full bag aka multiset object -- see the docs for details; they may save you from reinventing the wheel if you need some fancy post-processing. If all you want to do is count things, use defaultdict.

Update in response to a question from @Steven Rumbalski: """ I'm curious, what happens if you move the iterable into the Counter constructor: d = Counter(iterable)? (I have python 2.6 and cannot test it.) """

tally6: just does d = Count(iterable); return d, takes 60.0 msecs

You could look at the source (collections.py in the SVN repository) ... here's what my Python27\Lib\collections.py does when iterable is not a Mapping instance:

            self_get = self.get
            for elem in iterable:
                self[elem] = self_get(elem, 0) + 1

Seen that code anywhere before? There's a whole lot of carry-on just to call code that's runnable in Python 1.5.2 :-O

from collections import Counter
Counter(a)

Since you don't have access to Counter, your best bet is your third approach. It's much cleaner and easier to read. In addition, it doesn't have the perpetual testing (and branching) that the first two approaches have, which makes it more efficient.

Use setdefault.

a[x[3]] = a.setdefault(x[3], 0) + 1

setdefault gets the value of the specified key (x[3] in this case), or if it does not exist, the specified value (0 in this case).

继续阅读：python unique-key

Add new keys to a dictionary while incrementing existing values

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？