
Efficient multiple, arbitrary index access in Python tuple?

I have a long Python tuple t. I would like to grab the elements at indices i1, i2, ..., iN from t as efficiently as possible. What's the best way?

One approach is:

(1)    result = [t[j] for j in (i1, i2, ..., iN)]

but this would seem to cause N separate lookups into the tuple. Is there a faster way? When Python does slices like this:

(2)    result = t[1:M:3]

I assume that it does not perform M/3 separate lookups. (Maybe it uses a bitmask and does a single copy operation?) Is there some way for me to capitalize on whatever Python does in (2) to make my arbitrary-index slice happen in a single copy?

Thanks.


If you are doing a bunch of identical lookups, it may be worth using an itemgetter:

from operator import itemgetter

# Build the getter once; it fetches all the indices in a single call.
mygetter = itemgetter(i1, i2, ..., iN)
for tup in lots_of_tuples:
    result = mygetter(tup)  # a tuple of the selected elements

For a one-off lookup, the overhead of creating the itemgetter is not worthwhile.

A quick test in IPython shows:

In [1]: import random

In [2]: from operator import itemgetter

In [3]: t=tuple(range(1000))

In [4]: idxs = tuple(random.randrange(1000) for i in range(20))

In [5]: timeit [t[i] for i in idxs]
100000 loops, best of 3: 2.09 us per loop

In [6]: mygetter = itemgetter(*idxs)

In [7]: timeit mygetter(t)
1000000 loops, best of 3: 596 ns per loop

Obviously the difference will depend on the length of the tuple, the number of indices, etc.


The one you've listed is already the optimal way to get elements from a tuple in pure Python. Performance rarely matters for such expressions, so worrying about it is premature optimisation; and even if you did optimise the access, the loop itself would still be slow due to interpreter overhead such as reference counting of temporary objects.

If you already have a performance issue, or this is part of CPU-heavy code, you can try several alternatives:

1) numpy arrays (see the fancy-indexing sketch after this list):

>>> import numpy as np
>>> arr = np.array(xrange(2000))
>>> mask = np.array([False]*2000)
>>> mask[3] = True
>>> mask[300] = True
>>> arr[mask]
array([  3, 300])

2) You can use the C API to copy the elements using PyTuple_GET_ITEM, which accesses the internal array directly, but be warned that using the C API is not trivial and can introduce a lot of bugs.

3) You can use C arrays with the C API, exposing the data to Python through e.g. the buffer interface of array.array.

4) You can use Cython with C arrays and a custom Cython type for data access from Python.

5) You can use Cython and numpy together.
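
Related to option 1, numpy also supports "fancy indexing" with an integer index array, which maps directly onto the arbitrary-index problem. A minimal sketch (the array contents are illustrative):

import numpy as np

arr = np.arange(2000)             # the data as a NumPy array
idxs = np.array([3, 300, 7, 42])  # arbitrary indices, in any order

# Fancy indexing gathers every requested element in one C-level
# operation instead of N separate Python-level lookups.
result = arr[idxs]                # array([  3, 300,   7,  42])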


Inside the list comprehension there is an implicit for loop, and I am pretty sure it is iterating through the tuple values with reasonable efficiency. I don't think you can improve on the list comprehension for efficiency.

If you just need the values you might be able to use a generator expression and avoid building the list, for a slight savings in time or memory.
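
A minimal sketch, assuming idxs holds your indices and process() stands in for whatever consumes the values (both names are hypothetical):

# A generator expression yields values lazily, so no intermediate
# list is built in memory.
for v in (t[j] for j in idxs):
    process(v)  # hypothetical consumer of each value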


Slicing can be more efficient because it has more constraints: the indices must proceed linearly by a fixed step. The indices in a list comprehension can be completely arbitrary, so no such optimization is possible.

Still, it's dangerous to make assumptions about efficiency. Try timing both ways and see if there's a significant difference.
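
For example, a quick timeit comparison of a strided slice against an equivalent list comprehension (the sizes and step here are arbitrary):

import timeit

setup = "t = tuple(range(1000)); idxs = list(range(0, 1000, 3))"

# Time the built-in strided slice...
print(timeit.timeit("t[0:1000:3]", setup=setup))
# ...against a list comprehension over the same indices.
print(timeit.timeit("[t[j] for j in idxs]", setup=setup))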


1) Are you sure you need the operation to go faster?

2) Another option is operator.itemgetter, which returns a tuple of the elements picked out by the given indices:

>>> import operator, string
>>> t = tuple(string.ascii_uppercase)
>>> operator.itemgetter(13,19,4,21,1)(t)
('N', 'T', 'E', 'V', 'B')

The operator module is implemented in C, so it will likely outperform a Python-level loop.
