List Manipulation in Python with pop()
In short, I need to remove multiple items from a list according to their indexes. However, I can't use pop because it shifts the indexes (without some clumsy compensating system). Is there a way to remove multiple items simultaneously?
I have an algorithm that goes through a list, and if the conditions are right removes that item via the pop method. A problem arises seeing as this is all done in a loop. Once pop is done the list is shortened by one, displacing all the values by one. So the loop will go out of range. Is it possible to remove multiple items simultaneously, or another solution?
An example of my problem:
L = ['a', 'b', 'c', 'd']
for i in 开发者_如何学Pythonrange(len(L)):
print L
if L[i] == 'a' or L[i] == 'c':
L.pop(i)
Are your lists large? If so, use ifilter
from itertools
to filter out elements that you don't want lazily (with no up front cost).
Lists not so large? Just use a list comprehension:
newlist = [x for x in oldlist if x not in ['a', 'c'] ]
This will create a new copy of the list. This is not generally an issue for efficiency unless you really care about memory consumption.
As a happy medium of syntax convenience and laziness ( = efficiency for large lists), you can construct a generator rather than a list by using (
)
instead of [
]
:
interestingelts = (x for x in oldlist if x not in ['a', 'c'])
After this, you can iterate over interestingelts
, but you can't index into it:
for y in interestingelts: # ok
print y
print interestingelts[0] # not ok: generator allows sequential access only
You want a list comprehension:
L = [c for c in L if c not in ['a', 'c']]
Or, if you really don't want to create a copy, go backwards:
for i in reversed(range(len(L))):
if L[i] in ['a', 'c']:
L.pop(i) # del L[i] is more efficient
Thanks to ncoghlan for reversed()
& phooji for del L[i]
suggestions. (I decided to leave it as L.pop(i)
, since that's how the question was initially formulated.)
Also, as J.S. Sebastian correctly points out, going backwards is space efficient but time inefficient; most of the time a list comprehension or generator (L = (...)
instead of L = [...]
) is best.
Edit:
Ok, so since people seem to want something less ridiculously slow than the reversed method above (I can't imagine why... :) here's an order-preserving, in-place filter that should differ in speed from a list comprehension only by a constant. (This is akin to what I'd do if I wanted to filter a string in c.)
write_i = 0
for read_i in range(len(L)):
L[write_i] = L[read_i]
if L[read_i] not in ['a', 'c']:
write_i += 1
del L[write_i:]
print L
# output: ['b', 'd']
Summary
- use list comprehension (or genexpr) to remove multiple items from a list
- if your input is a large byte-string then use
str.translate()
to delete characters - deleting one item at a time
del L[i]
is slow for large lists
If items are bytes as in your example you could use str.translate()
:
def remove_bytes(bytestr, delbytes):
"""
>>> remove_bytes(b'abcd', b'ac') == b'bd'
True
"""
return bytestr.translate(None, delbytes)
In general multiple items could be removed using slicing:
def remove_inplace_without_order(L, delitems):
"""Remove all items from `L` that are in `delitems` (not preserving order).
>>> L = list(range(4)); remove_inplace_without_order(L, [0,2]); L
[3, 1]
"""
idel = len(L) # items idel.. to be removed
for i in reversed(range(len(L))):
if L[i] in delitems:
idel -= 1
L[i] = L[idel] # save `idel`-th item
del L[idel:] # remove items all at once
#NOTE: the function returns `None` (it means it modifies `L` inplace)
As @phooji and @senderle already mentioned list comprehension (or generator expression) is preferable in your case:
def remove_listcomp(L, delitems):
return [x for x in L if x not in delitems]
Here's a performance comparison for L=list("abcd"*10**5); delitems="ac"
:
| function | time, msec | ratio |
|------------------------------+------------+--------|
| list | 4.42 | 0.9 |
| remove_bytes | 4.88 | 1.0 |
| remove | 27.3 | 5.6 |
| remove_listcomp | 36.8 | 7.5 |
| remove_inplace_without_order | 71.2 | 14.6 |
| remove_inplace_senderle2 | 83.8 | 17.2 |
| remove_inplace_senderle | 15000 | 3073.8 |
#+TBLFM: $3=$2/@3$2;%.1f
Where
try:
from itertools import ifilterfalse as filterfalse
except ImportError:
from itertools import filterfalse # py3k
def remove(L, delitems):
return filterfalse(delitems.__contains__, L)
def remove_inplace_senderle(L, delitems):
for i in reversed(range(len(L))):
if L[i] in delitems:
del L[i]
def remove_inplace_senderle2(L, delitems):
write_i = 0
for read_i in range(len(L)):
L[write_i] = L[read_i]
if L[read_i] not in delitems:
write_i += 1
del L[write_i:]
remove_inplace_senderle()
is slow due to it uses O(N**2)
algorithm. Each del L[i]
might cause all items to the right to be moved left to close the gap.
The time column in the above table includes time it takes to create a new input list (the first row) due to some algorithms modify an input inplace.
Here's timings for the same input but without creating a new list on each iteration:
| function | time, msec | ratio |
|-----------------+------------+-------|
| remove_bytes | 0.391 | 1 |
| remove | 24.3 | 62 |
| remove_listcomp | 33.4 | 85 |
#+TBLFM: $3=$2/@2$2;%d
The table shows that itertools.ifilterfalse()
doesn't provide a significant improvement over listcomp.
In general it is not worth it or even harmful to think about performance for such tasks unless a profiler proven that this code is a bottleneck and it is important for your program. But it might be useful to be aware of alternative approaches that could provide more than an order of magnitude improvement in speed.
精彩评论