
Naming dict keys for fast lookup in Python

I'm going to have one small dictionary (between 5 and 20 keys) that will be referenced up to a hundred times or so per page load, in Python 2.5.

I'm starting to name the keys it will be looking up, and I was wondering if there is a key naming convention I could follow to help dict lookup times.


I had to test ;-)

Using

  • f1: integer key 1
  • f2: short string key "one"
  • f3: long string key "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

as one of the keys into a dictionary of length 4, iterating 10,000,000 times and measuring the times, I get this result:

<function f1 at 0xb779187c>
f1 3.64
<function f2 at 0xb7791bfc>
f2 3.48
<function f3 at 0xb7791bc4>
f3 3.65

I.e., no difference...

My code
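(A sketch along these lines; the exact dict contents and the time.time() harness are assumptions, but the key types match the list above:)

import time

d_int   = {1: 0, 2: 0, 3: 0, 4: 0}
d_short = {'one': 0, 'two': 0, 'three': 0, 'four': 0}
LONG    = 'a' * 42
d_long  = {LONG: 0, 'b' * 42: 0, 'c' * 42: 0, 'd' * 42: 0}

def f1():
    return d_int[1]            # integer key

def f2():
    return d_short['one']      # short string key

def f3():
    return d_long[LONG]        # long string key

for f in (f1, f2, f3):
    print f                    # e.g. <function f1 at 0xb779187c>
    start = time.time()
    for _ in xrange(10000000):
        f()
    print f.__name__, round(time.time() - start, 2)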


There may be sensible names that just so happen to produce hashes that don't clash. However, CPython dicts are already one of the most optimized data structures in the known universe: they produce few collisions for most inputs, work well with the hash schemes of the other builtin types, resolve clashes very fast, and so on. It's extremely unlikely you'll see any benefit at all even if you found something, especially since a hundred lookups aren't really that many.

Take, for example, this timeit benchmark run on my four-year-old desktop machine (sporting a laughably low-budget 3.1 GHz dual-core CPU):

...>python -mtimeit --setup="d = {chr(i)*100: i for i in range(15)};\
k = chr(7)*100" "d[k]"

1000000 loops, best of 3: 0.222 usec per loop

And those strings are a dozen times longer than anything that's remotely sensible to type out manually as a variable name. Cutting the length from 100 down to 10 leads to 0.0778 microseconds per lookup. Now measure your page's load speed and compare those (alternatively, just ponder how long the work you're actually doing when building the page will take), and take caching, framework overhead, and all those things into account.
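(The 10-character run is presumably the same command with the lengths changed; the exact invocation is my reconstruction:)

...>python -mtimeit --setup="d = {chr(i)*10: i for i in range(15)};\
k = chr(7)*10" "d[k]"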

Nothing you do in this regard can make a difference performance-wise, period, full stop.


Because the Python string hash function iterates over the chars (at least if this is still applicable), I'd opt for short strings.
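For illustration, that per-character loop, transcribed into Python from the C source of older CPythons (Python 2 and up to 3.3, without hash randomization; the word size here is an assumption), looks roughly like this:

def string_hash(s, mask=(1 << 64) - 1):
    # Rough Python transcription of CPython's historical string hash;
    # `mask` emulates C long wraparound (64-bit assumed, 32 on
    # narrower builds).
    if not s:
        return 0
    x = ord(s[0]) << 7
    for ch in s:                          # one multiply/xor per character
        x = ((1000003 * x) ^ ord(ch)) & mask
    x ^= len(s)
    if x == mask:                         # the C code maps -1 to -2,
        x = mask - 1                      # since -1 signals an error
    return x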


To add another aspect:

for very small dictionaries under heavy timing constraints, the time to compute hashes might be a substantial fraction of the overall time. Therefore, for (say) 5 elements, it might be faster to use an array and a sequential search (of course, wrapped up in some MiniDictionary object), maybe even augmented by a binary search. This might find the element with 2-3 comparisons, which may or may not be faster than hash computation plus one compare.

The break-even point depends on the hash speed, the average number of elements, and the number of hash collisions to expect, so some measurement is required, and there is no "one-size-fits-all" answer.
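A minimal sketch of that idea (the class name and interface are made up for illustration):

class MiniDictionary(object):
    # Sequential search over a small list of pairs: no hashing at all,
    # just a handful of equality comparisons for ~5 entries.
    def __init__(self, pairs):
        self._pairs = list(pairs)

    def __getitem__(self, key):
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)

md = MiniDictionary([('alpha', 1), ('beta', 2), ('gamma', 3)])
print md['beta']   # -> 2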


Python dictionaries have a fast path for string keys, so use those (rather than, say, tuples). The hash value of a string is cached on the string object itself, so what matters is that the strings remain the same objects, more than what their actual values are; string constants (i.e., strings that appear verbatim in the program and are not the result of a calculation) always remain exactly the same, so as long as you use those, there's no need to worry.
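You can see the "same string object" point directly in CPython (interning is an implementation detail, so this is illustrative only):

# Identifier-like string literals are interned in CPython, so every
# occurrence of the constant is the same object with the same cached hash.
a = 'page_title'
b = 'page_title'
print a is b                      # True in CPython

# A string built at runtime is a fresh object; its hash gets computed
# (and then cached) the first time it's used as a dict key.
c = ''.join(['page_', 'title'])
print c == a, c is a              # True False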
