Deleting repeats in a list python [duplicate]
Possible Duplicates:
How do you remove duplicates from a list in Python whilst preserving order? In Python, what is the fastest algor开发者_C百科ithm for removing duplicates from a list so that all elements are unique while preserving order?
I was wondering if there was a function which does the following:
Take a list as an argument:
list = [ 3 , 5 , 6 , 4 , 6 , 2 , 7 , 6 , 5 , 3 ]
and deletes all the repeats in the list to obtain:
list = [ 3 , 5 , 6 , 4 , 2 , 7 ]
I know you can convert it into a dictionary and use the fact that dictionaries cannot have repeats but I was wondering if there was a better way of doing it.
Thanks
Please see the Python documentation for three ways to accomplish this. The following is copied from that site. Replace the example 'mylist' with your variable name ('list').
First Example: If you don’t mind reordering the list, sort it and then scan from the end of the list, deleting duplicates as you go:
if mylist:
mylist.sort()
last = mylist[-1]
for i in range(len(mylist)-2, -1, -1):
if last == mylist[i]:
del mylist[i]
else:
last = mylist[i]
Second Example: If all elements of the list may be used as dictionary keys (i.e. they are all hashable) this is often faster:
d = {}
for x in mylist:
d[x] = 1
mylist = list(d.keys())
Third Example: In Python 2.5 and later:
mylist = list(set(mylist))
Even though you said you don't necessarily want to use a dict
, I think an OrderedDict
is a clean solution here.
from collections import OrderedDict
l = [3 ,5 ,6 ,4 ,6 ,2 ,7 ,6 ,5 ,3]
OrderedDict.fromkeys(l).keys()
# [3, 5, 6, 4, 2, 7]
Note that this preserves the original order.
list(set(l))
will not preserve the order. If you want to keep the order then do:
s = set()
result = []
for item in l:
if item not in s:
s.add(item)
result.append(item)
print result
This will run in O(n), where n is the length of the original list.
list(set(list))
works just fine.
First, don't name it list as that shadows the built-in type list. Say, my_list
To solve your problem, the way I've seen most often is list(set(my_list))
set is an unordered container that only has unique elements, and gives (i think) O(1) insertion and checking for membership
As of writing this answer, the only solutions which preserve order are the OrderedDict solution, and Dave's slightly-more-verbose solution.
Here's another way where we abuse side-effects while iterating, which is also more verbose than the OrderedDict solution:
def uniques(iterable):
seen = set()
sideeffect = lambda _: True
return [x for x in iterable
if (not x in seen) and sideeffect(seen.add(x))]
A set would be a better approach than a dictionary terms of O complexity. But both approaches make you loose the ordering (unless you use an ordered dictionary, what augments the complexity again).
As other posters already said, the set solution is not that hard:
l = [ 3 , 5 , 6 , 4 , 6 , 2 , 7 , 6 , 5 , 3 ]
list(set(l))
A way to keep the ordering is:
def uniques(l):
seen = set()
for i in l:
if i not in seen:
seen.add(i)
yield i
Or, in a less readable way:
def uniques(l):
seen = set()
return (seen.add(i) or i for i in l if i not in seen)
You can then use it like this:
l = [ 3 , 5 , 6 , 4 , 6 , 2 , 7 , 6 , 5 , 3 ]
list(uniques(l))
>>> [3, 5, 6, 4, 2, 7]
Here is a snippet from my own collection of handy Python tools - this uses the "abusive side-effect" method that ninjagecko has in his answer. This also takes pains to handle non-hashable values, and to return a sequence of the same type as was passed in:
def unique(seq, keepstr=True):
"""Function to keep only the unique values supplied in a given
sequence, preserving original order."""
# determine what type of return sequence to construct
if isinstance(seq, (list,tuple)):
returnType = type(seq)
elif isinstance(seq, basestring):
returnType = (list, type(seq)('').join)[bool(keepstr)]
else:
# - generators and their ilk should just return a list
returnType = list
try:
seen = set()
return returnType(item for item in seq if not (item in seen or seen.add(item)))
except TypeError:
# sequence items are not of a hashable type, can't use a set for uniqueness
seen = []
return returnType(item for item in seq if not (item in seen or seen.append(item)))
Here are a variety of calls, with sequences/iterators/generators of various types:
from itertools import chain
print unique("ABC")
print unique(list("ABABBAC"))
print unique(range(10))
print unique(chain(reversed(range(5)), range(7)))
print unique(chain(reversed(xrange(5)), xrange(7)))
print unique(i for i in chain(reversed(xrange(5)), xrange(7)) if i % 2)
Prints:
ABC
['A', 'B', 'C']
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[4, 3, 2, 1, 0, 5, 6]
[4, 3, 2, 1, 0, 5, 6]
[3, 1, 5]
精彩评论