开发者

Remove items from a list while iterating without using extra memory in Python

My problem is simple: I have a long list of elements that I want to iterate through and check every element against a condition. Depending on the outcome of the condition I would like to delete the current element of the list, and continue iterating over it as usual.

I have read a few other threads on this matter. Two solutions seam to be proposed. Either make a dictionary out of the list (which implies making a copy of all the data that is already filling all the RAM in my case). Either walk the list in reverse (which breaks the concept of the alogrithm I want to implement).

Is there any better or more elegant way than this to do it ?

def walk_list(list_of_g):
    g_index = 0
    while g_in开发者_运维知识库dex < len(list_of_g):
        g_current = list_of_g[g_index]
        if subtle_condition(g_current):
            list_of_g.pop(g_index)
        else:
            g_index = g_index + 1


li = [ x for x in li if condition(x)]

and also

li = filter(condition,li) 

Thanks to Dave Kirby


Here is an alternative answer for if you absolutely have to remove the items from the original list, and you do not have enough memory to make a copy - move the items down the list yourself:

def walk_list(list_of_g):
    to_idx = 0
    for g_current in list_of_g:
        if not subtle_condition(g_current):
            list_of_g[to_idx] = g_current
            to_idx += 1
    del list_of_g[to_idx:]

This will move each item (actually a pointer to each item) exactly once, so will be O(N). The del statement at the end of the function will remove any unwanted items at the end of the list, and I think Python is intelligent enough to resize the list without allocating memory for a new copy of the list.


removing items from a list is expensive, since python has to copy all the items above g_index down one place. If the number of items you want to remove is proportional to the length of the list N, then your algorithm is going to be O(N**2). If the list is long enough to fill your RAM then you will be waiting a very long time for it to complete.

It is more efficient to create a filtered copy of the list, either using a list comprehension as Marcelo showed, or use the filter or itertools.ifilter functions:

g_list = filter(not_subtle_condition, g_list)

If you do not need to use the new list and only want to iterate over it once, then it is better to use ifilter since that will not create a second list:

for g_current in itertools.ifilter(not_subtle_condtion, g_list):
    # do stuff with g_current


The built-in filter function is made just to do this:

list_of_g = filter(lambda x: not subtle_condition(x), list_of_g)


How about this?

[x for x in list_of_g if not subtle_condition(x)]

its return the new list with exception from subtle_condition


For simplicity, use a list comprehension:

def walk_list(list_of_g):
    return [g for g in list_of_g if not subtle_condition(g)]

Of course, this doesn't alter the original list, so the calling code would have to be different.

If you really want to mutate the list (rarely the best choice), walking backwards is simpler:

def walk_list(list_of_g):
    for i in xrange(len(list_of_g), -1, -1):
        if subtle_condition(list_of_g[i]):
            del list_of_g[i]


Sounds like a really good use case for the filter function.

def should_be_removed(element):
  return element > 5

a = range(10)
a = filter(should_be_removed, a)

This, however, will not delete the list while iterating (nor I recommend it). If for memory-space (or other performance reasons) you really need it, you can do the following:

i = 0
while i < len(a):
    if should_be_removed(a[i]):
        a.remove(a[i])
    else:
        i+=1
    print a


If you perform a reverse iteration, you can remove elements on the fly without affecting the next indices you'll visit:

numbers = range(20)

# remove all numbers that are multiples of 3
l = len(numbers)
for i, n in enumerate(reversed(numbers)):
    if n % 3 == 0:
        del numbers[l - i - 1]

print numbers

The enumerate(reversed(numbers)) is just a stylistic choice. You may use a range if that's more legible to you:

l = len(numbers)
for i in range(l-1, -1, -1):
    n = numbers[i]
    if n % 3 == 0:
        del numbers[i]

If you need to travel the list in order, you can reverse it in place with .reverse() before and after the reversed iteration. This won't duplicate your list either.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜