
Naming dict keys for fast lookup in Python

I'm going to have one small dictionary (between 5 and 20 keys) that will be referenced up to a hundred times or so per page load, in Python 2.5.

I'm starting to name the keys it will be looking up, and I was wondering if there is a key naming convention I could follow to help dict lookup times.


I had to test ;-)

Using

  • f1: integer key 1
  • f2: short string key "one"
  • f3: long string key "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

as one of the keys into a dictionary of length 4, iterating 10,000,000 times and measuring the times, I get this result:

<function f1 at 0xb779187c>
f1 3.64
<function f2 at 0xb7791bfc>
f2 3.48
<function f3 at 0xb7791bc4>
f3 3.65

I.e., no difference...

My code
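(A sketch along these lines; the exact dict contents and the time.time() harness are assumptions, but the key types match the list above:)

import time

d_int   = {1: 0, 2: 0, 3: 0, 4: 0}
d_short = {'one': 0, 'two': 0, 'three': 0, 'four': 0}
LONG    = 'a' * 42
d_long  = {LONG: 0, 'b' * 42: 0, 'c' * 42: 0, 'd' * 42: 0}

def f1():
    return d_int[1]            # integer key

def f2():
    return d_short['one']      # short string key

def f3():
    return d_long[LONG]        # long string key

for f in (f1, f2, f3):
    print f                    # e.g. <function f1 at 0xb779187c>
    start = time.time()
    for _ in xrange(10000000):
        f()
    print f.__name__, round(time.time() - start, 2)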


There may be sensible names that just so happen to produce hashes that don't clash. However, CPython dicts are already one of the most optimized data structures in the known universe: they produce few collisions for most inputs, work well with the hash schemes of the other builtin types, resolve clashes very fast, and so on. It's extremely unlikely you'll see any benefit at all even if you found something, especially since a hundred lookups aren't really that many.

Take, for example, this timeit benchmark run on my four-year-old desktop machine (sporting a laughably low-budget 3.1 GHz dual-core CPU):

...>python -mtimeit --setup="d = {chr(i)*100: i for i in range(15)};\
k = chr(7)*100" "d[k]"

1000000 loops, best of 3: 0.222 usec per loop

And those strings are a dozen times longer than anything that's remotely sensible to type out manually as a variable name. Cutting the length from 100 down to 10 leads to 0.0778 microseconds per lookup. Now measure your page's load speed and compare those (alternatively, just ponder how long the work you're actually doing when building the page will take), and take caching, framework overhead, and all those things into account.
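(The 10-character run is presumably the same command with the lengths changed; the exact invocation is my reconstruction:)

...>python -mtimeit --setup="d = {chr(i)*10: i for i in range(15)};\
k = chr(7)*10" "d[k]"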

Nothing you do in this regard can make a difference performance-wise, period, full stop.


Because the Python string hash function iterates over the chars (at least if this is still applicable), I'd opt for short strings.
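For illustration, that per-character loop, transcribed into Python from the C source of older CPythons (Python 2 and up to 3.3, without hash randomization; the word size here is an assumption), looks roughly like this:

def string_hash(s, mask=(1 << 64) - 1):
    # Rough Python transcription of CPython's historical string hash;
    # `mask` emulates C long wraparound (64-bit assumed, 32 on
    # narrower builds).
    if not s:
        return 0
    x = ord(s[0]) << 7
    for ch in s:                          # one multiply/xor per character
        x = ((1000003 * x) ^ ord(ch)) & mask
    x ^= len(s)
    if x == mask:                         # the C code maps -1 to -2,
        x = mask - 1                      # since -1 signals an error
    return x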


To add another aspect:

for very small dictionaries under heavy timing constraints, the time to compute hashes might be a substantial fraction of the overall time. Therefore, for (say) 5 elements, it might be faster to use an array and a sequential search (of course, wrapped up in some MiniDictionary object), maybe even augmented by a binary search. This might find the element with 2-3 comparisons, which may or may not be faster than hash computation plus one compare.

The break-even point depends on the hash speed, the average number of elements, and the number of hash collisions to expect, so some measurement is required, and there is no "one-size-fits-all" answer.
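A minimal sketch of that idea (the class name and interface are made up for illustration):

class MiniDictionary(object):
    # Sequential search over a small list of pairs: no hashing at all,
    # just a handful of equality comparisons for ~5 entries.
    def __init__(self, pairs):
        self._pairs = list(pairs)

    def __getitem__(self, key):
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)

md = MiniDictionary([('alpha', 1), ('beta', 2), ('gamma', 3)])
print md['beta']   # -> 2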


Python dictionaries have a fast path for string keys, so use those (rather than, say, tuples). The hash value of a string is cached on the string object itself, so what matters is that the strings remain the same objects, more than what their actual values are; string constants (i.e., strings that appear verbatim in the program and are not the result of a calculation) always remain exactly the same, so as long as you use those, there's no need to worry.
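You can see the "same string object" point directly in CPython (interning is an implementation detail, so this is illustrative only):

# Identifier-like string literals are interned in CPython, so every
# occurrence of the constant is the same object with the same cached hash.
a = 'page_title'
b = 'page_title'
print a is b                      # True in CPython

# A string built at runtime is a fresh object; its hash gets computed
# (and then cached) the first time it's used as a dict key.
c = ''.join(['page_', 'title'])
print c == a, c is a              # True False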
