Python List: Is this the best way to remove duplicates while preserving order? [duplicate]

This question already has answers here: Closed 11 years ago.

Possible Duplicates:

How do you remove duplicates from a list in Python whilst preserving order?

Algorithm - How to delete duplicate elements in a list efficiently?

I've read about many methods for removing duplicates from a Python list while preserving order. All of them appear to require defining a function or subroutine, which I suspect is not very computationally efficient. I came up with the following and would like to know whether it is the most computationally efficient way to do this. (It has to be as efficient as possible because I need a fast response time.) Thanks.

b = [x for i, x in enumerate(a) if i == a.index(x)]


a.index(x) is itself O(n), because the list has to be searched for the value x. Since it is called once for every element, the overall runtime is O(n^2).
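To make the quadratic growth concrete, here is a rough sketch that counts comparisons instead of measuring time. The helper name index_scan_steps is made up for this illustration, and the inner loop only imitates what a.index(x) does (a front-to-back scan that stops at the first match); it is not how CPython implements it internally.

```python
def index_scan_steps(a):
    """Count the element comparisons made by the `i == a.index(x)` approach."""
    steps = 0
    for x in a:
        # Imitate a.index(x): walk the list from the front until x is found
        for y in a:
            steps += 1
            if y == x:
                break
    return steps


# With no duplicates, doubling the list length roughly quadruples the work
for n in (100, 200, 400):
    print(n, index_scan_steps(list(range(n))))
```

For a list of n distinct values this comes to n(n+1)/2 comparisons, which is where the O(n^2) comes from.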

"Saving" function calls does not make a bad algorithm faster than a good one.

More efficient (O(n) on average, since a set membership test is O(1) on average) would probably be:

result = []
seen = set()
for i in a:
    if i not in seen:
        result.append(i)
        seen.add(i)
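If it helps, the loop above can be wrapped in a function so it is easy to reuse and time. This is only a sketch: the names dedupe and dedupe_fromkeys are mine, and the second function is a different technique (relying on dict insertion order, which is guaranteed from Python 3.7 on), not the loop from this answer. Both assume the list elements are hashable.

```python
def dedupe(items):
    """Return a new list without duplicates, keeping each first occurrence."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def dedupe_fromkeys(items):
    # Alternative technique: dict keys are unique and keep insertion order
    # (a language guarantee from Python 3.7 on)
    return list(dict.fromkeys(items))


a = [1, 3, 45, 8, 8, 8, 9, 10, 1, 2, 3]
print(dedupe(a))           # [1, 3, 45, 8, 9, 10, 2]
print(dedupe_fromkeys(a))  # [1, 3, 45, 8, 9, 10, 2]
```

The function call adds a small constant overhead per list, which is negligible next to the difference between O(n) and O(n^2).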

Have a look at this question: How do you remove duplicates from a list in Python whilst preserving order?

(The top answer there also shows how to do this as a list comprehension, which may be somewhat faster than an explicit loop.)
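For reference, the list-comprehension form usually looks something like the sketch below. I am writing it from memory rather than quoting the linked answer, and the function name dedupe_comprehension is my own. It is still the same set-based O(n) idea, just packed into one expression.

```python
def dedupe_comprehension(seq):
    seen = set()
    seen_add = seen.add  # bind the method once to skip repeated attribute lookups
    # seen_add(x) returns None (falsy), so the condition is True exactly when
    # x has not been seen before, and x is recorded in `seen` as a side effect
    return [x for x in seq if not (x in seen or seen_add(x))]


print(dedupe_comprehension([1, 3, 45, 8, 8, 8, 9, 10, 1, 2, 3]))  # [1, 3, 45, 8, 9, 10, 2]
```

Whether this actually beats the explicit loop depends on the Python version and the data, so it is worth timing on your own input.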


You can easily profile your code yourself using the timeit module. For example, I put your code in func1 and mine in func2. Running each 1000 times on a list of 1000 elements (no duplicates) gives:

>>> import timeit
>>> a = list(range(1000))
>>> timeit.timeit('func1(a)', 'from __main__ import func1, a', number=1000)
11.691882133483887
>>> timeit.timeit('func2(a)', 'from __main__ import func2, a', number=1000)
0.3130321502685547

Now with duplicates (only 100 distinct values):

>>> import random
>>> a = [random.randint(0, 99) for _ in range(1000)]
>>> timeit.timeit('func1(a)', 'from __main__ import func1, a', number=1000)
2.5020430088043213
>>> timeit.timeit('func2(a)', 'from __main__ import func2, a', number=1000)
0.08332705497741699
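If you want to repeat the comparison yourself, a self-contained script along these lines should work on Python 3. func1 is the code from the question and func2 is the set-based loop; the variable names unique and dupes and the smaller repeat count (number=20, to keep the run short) are my own choices, so the absolute numbers will not match the ones above.

```python
import random
import timeit


def func1(a):
    # The approach from the question: a.index(x) rescans the list for every element
    return [x for i, x in enumerate(a) if i == a.index(x)]


def func2(a):
    # The set-based loop: one pass, average O(1) membership test per element
    result = []
    seen = set()
    for i in a:
        if i not in seen:
            result.append(i)
            seen.add(i)
    return result


unique = list(range(1000))                             # no duplicates
dupes = [random.randint(0, 99) for _ in range(1000)]   # at most 100 distinct values

for name, data in (("unique", unique), ("dupes", dupes)):
    t1 = timeit.timeit(lambda: func1(data), number=20)
    t2 = timeit.timeit(lambda: func2(data), number=20)
    print(f"{name}: func1 {t1:.4f}s, func2 {t2:.4f}s")
```

Both functions return the same list for the same input, so the only thing being compared is the running time.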


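lst = [1, 3, 45, 8, 8, 8, 9, 10, 1, 2, 3]
dummySet = set()
# dummySet.add(i) returns None; the tuple exists only to run it as a side effect,
# and [0] picks i back out, so the result is [1, 3, 45, 8, 9, 10, 2]
[(i, dummySet.add(i))[0] for i in lst if i not in dummySet]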