
How to find duplicate elements in array using for loop in Python?

I have a list with duplicate elements:

 list_a=[1,2,3,5,6,7,5,2]

 tmp=[]

 for i in list_a:
     if tmp.__contains__(i):
         print i
     else:
         tmp.append(i)

I used the above code to find the duplicate elements in list_a. I don't want to remove any elements from the list.

But I want to use a for loop here. In C/C++ we would normally write something like this, I guess:

 for (int i=0;i<=list_a.length;i++)
     for (int j=i+1;j<=list_a.length;j++)
         if (list_a[i]==list_a[j])
             print list_a[i]

How do we do the same thing in Python?

for i in list_a:
    for j in list_a[1:]:
    ....

I tried the above code, but it gives the wrong result. I don't know how to advance the starting point for j.


Just for information: in Python 2.7+, we can use Counter.

import collections

x=[1, 2, 3, 5, 6, 7, 5, 2]

>>> x
[1, 2, 3, 5, 6, 7, 5, 2]

>>> y=collections.Counter(x)
>>> y
Counter({2: 2, 5: 2, 1: 1, 3: 1, 6: 1, 7: 1})

Unique List

>>> list(y)
[1, 2, 3, 5, 6, 7]

Items found more than once

>>> [i for i in y if y[i]>1]
[2, 5]

Items found only once

>>> [i for i in y if y[i]==1]
[1, 3, 6, 7]
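
For completeness, Counter also has a most_common() method; filtering its (item, count) pairs is another way to pull out the duplicates (a small sketch; the ordering of tied counts may vary):

>>> y.most_common()
[(2, 2), (5, 2), (1, 1), (3, 1), (6, 1), (7, 1)]
>>> [item for item, count in y.most_common() if count > 1]
[2, 5]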


Use the in operator instead of calling __contains__ directly.

What you have almost works (but is O(n**2)):

for i in xrange(len(list_a)):
  for j in xrange(i + 1, len(list_a)):
    if list_a[i] == list_a[j]:
      print "duplicate:", list_a[i]

But it's far easier to use a set (roughly O(n) due to the hash table):

seen = set()
for n in list_a:
  if n in seen:
    print "duplicate:", n
  else:
    seen.add(n)

Or a dict, if you want to track locations of duplicates (also O(n)):

import collections
items = collections.defaultdict(list)
for i, item in enumerate(list_a):
  items[item].append(i)
for item, locs in items.iteritems():
  if len(locs) > 1:
    print "duplicates of", item, "at", locs

Or even just detect a duplicate somewhere (also O(n)):

if len(set(list_a)) != len(list_a):
  print "duplicate"


You could always use a list comprehension:

dups = [x for x in list_a if list_a.count(x) > 1]
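
Note that this lists each duplicate once per occurrence (e.g. [2, 5, 5, 2] for the list above); if each duplicated value should appear only once, the result can be wrapped in set() (a minor variation, assuming the items are hashable):

dups = list(set(x for x in list_a if list_a.count(x) > 1))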


Before Python 2.3, use dict():

>>> lst = [1, 2, 3, 5, 6, 7, 5, 2]
>>> stats = {}
>>> for x in lst: # count occurrences of each item
...     stats[x] = stats.get(x, 0) + 1
...
>>> print stats
{1: 1, 2: 2, 3: 1, 5: 2, 6: 1, 7: 1}
>>> # filter items appearing more than once:
>>> duplicates = [dup for (dup, i) in stats.items() if i > 1]
>>> print duplicates
[2, 5]

So, as a function:

def getDuplicates(iterable):
    """
       Take an iterable and return a generator yielding its duplicate items.
       Items must be hashable.

       e.g :

       >>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
       [2, 5]
    """
    stats = {}
    for x in iterable : 
        stats[x] = stats.get(x, 0) + 1
    return (dup for (dup, i) in stats.items() if i > 1)

With Python 2.3 comes set(), and it's even a built-in after that:

def getDuplicates(iterable):
    """
       Take an iterable and return a generator yielding its duplicate items.
       Items must be hashable.

       e.g :

       >>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
       [2, 5]
    """
    try: # try using built-in set
        found = set() 
    except NameError: # fallback on the sets module
        from sets import Set
        found = Set()

    for x in iterable:
        if x in found: # a set can't contain duplicates
            yield x
        found.add(x) # duplicate won't be added anyway

With Python 2.7 and above, the collections module provides collections.Counter, which does the very same job as the dict-based version, and we can make it shorter (and faster, since it's probably C under the hood) than solution 1:

import collections

def getDuplicates(iterable):
    """
       Take an iterable and return a generator yielding its duplicate items.
       Items must be hashable.

       e.g :

       >>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
       [2, 5]
    """
    return (dup for (dup, i) in collections.Counter(iterable).items() if i > 1)

I'd stick with solution 2.


You can use this function to find duplicates:

def get_duplicates(arr):
    dup_arr = arr[:]
    for i in set(arr):
        dup_arr.remove(i)       
    return list(set(dup_arr))

Examples

print get_duplicates([1,2,3,5,6,7,5,2])

[2, 5]

print get_duplicates([1,2,1,3,4,5,4,4,6,7,8,2])

[1, 2, 4]


If you're looking for one-to-one mapping between your nested loops and Python, this is what you want:

n = len(list_a)
for i in range(n):
    for j in range(i+1, n):
        if list_a[i] == list_a[j]:
            print list_a[i]

The code above is not "Pythonic". I would do it more like this:

seen = set()
for i in list_a:
   if i in seen:
       print i
   else:
       seen.add(i)

Also, don't call __contains__ directly; use in instead (as above).


The following requires the elements of your list to be hashable (not just implementing __eq__). I find it more Pythonic to use a defaultdict (and you get the number of repetitions for free):

import collections
l = [1, 2, 4, 1, 3, 3]
d = collections.defaultdict(int)
for x in l:
   d[x] += 1
print [k for k, v in d.iteritems() if v > 1]
# prints [1, 3]


Using only itertools; this works fine on Python 2.5:

from itertools import groupby
list_a = sorted([1, 2, 3, 5, 6, 7, 5, 2])
result = dict([(r, len(list(grp))) for r, grp in groupby(list_a)])

Result:

{1: 1, 2: 2, 3: 1, 5: 2, 6: 1, 7: 1}
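
If only the duplicated values are needed, the same result dict can be filtered afterwards (a small follow-up sketch):

duplicates = [k for k, v in result.items() if v > 1]
# duplicates == [2, 5]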


It looks like you have a list (list_a) potentially including duplicates, which you would rather keep as it is, and build a de-duplicated list tmp based on list_a. In Python 2.7, you can accomplish this with one line:

tmp = list(set(list_a))

Comparing the lengths of tmp and list_a at this point should clarify if there were indeed duplicate items in list_a. This may help simplify things if you want to go into the loop for additional processing.
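
A minimal sketch of that length comparison, using the list from the question:

list_a = [1, 2, 3, 5, 6, 7, 5, 2]
tmp = list(set(list_a))
if len(tmp) != len(list_a):
    print "list_a contains duplicate items"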


You could just "translate" it line by line.

C++

for (int i=0;i<=list_a.length;i++)
    for (int j=i+1;j<=list_a.length;j++)
        if (list_a[i]==list_a[j])
            print list_a[i]

Python

for i in range(0, len(list_a)):
    for j in range(i + 1, len(list_a)):
        if list_a[i] == list_a[j]:
            print list_a[i]

C++ for loop:

for(int x = start; x < end; ++x)

Python equivalent:

for x in range(start, end):


Just quick and dirty:

list_a=[1,2,3,5,6,7,5,2] 
holding_list=[]

for x in list_a:
    if x in holding_list:
        pass
    else:
        holding_list.append(x)

print holding_list

Output: [1, 2, 3, 5, 6, 7]
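
If the goal is to print the duplicates rather than build the de-duplicated list, the same quick-and-dirty loop can be flipped around (a small variation on the above):

holding_list = []
for x in list_a:
    if x in holding_list:
        print x  # already seen, so it is a duplicate
    else:
        holding_list.append(x)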


Using numpy:

import numpy as np
count,value = np.histogram(list_a,bins=np.hstack((np.unique(list_a),np.inf)))
print 'duplicate value(s) in list_a: ' + ', '.join([str(v) for v in value[count>1]])
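
A shorter numpy-based variant (a sketch, assuming a numpy version where np.unique supports return_counts, i.e. 1.9+):

import numpy as np

values, counts = np.unique(list_a, return_counts=True)
print 'duplicate value(s) in list_a: ' + ', '.join(str(v) for v in values[counts > 1])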


In Python 3, if you have two lists:

def removedup(List1, List2):
    List1_copy = List1[:]
    for i in List1_copy:
        if i in List2:
            List1.remove(i)

List1 = [4,5,6,7]
List2 = [6,7,8,9]
removedup(List1,List2)
print (List1)


Granted, I haven't done tests, but I guess it's going to be hard to beat pandas in speed:

 pd.DataFrame(list_a, columns=["x"]).groupby('x').size().to_dict()
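
To get just the duplicated values from those counts, value_counts() can be filtered in the same spirit (a sketch, not benchmarked either):

import pandas as pd

counts = pd.Series(list_a).value_counts()
duplicates = counts[counts > 1].index.tolist()  # e.g. [2, 5] for the list in the question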


You can use:

b=['E', 'P', 'P', 'E', 'O', 'E']
c={}
for i in b:
    value=0
    for j in b:
        if(i == j):
            value+=1
            c[i]=value
print(c)

Output:

{'E': 3, 'P': 2, 'O': 1}


Find duplicates in the list using loops, conditional logic, logical operators, and list methods

some_list = ['a','b','c','d','e','b','n','n','c','c','h',]

duplicates = []
for values in some_list:
    if some_list.count(values) > 1:
        if values not in duplicates:
            duplicates.append(values)

print("Duplicate Values are : ", duplicates)


Finding the number of repeating elements in a list:

myList = [3, 2, 2, 5, 3, 8, 3, 4, 'a', 'a', 'f', 4, 4, 1, 8, 'D']
listCleaned = set(myList)
for s in listCleaned:
    count = 0
    for i in myList:
        if s == i :
            count += 1
    print(f'total {s} => {count}')


Try like this:

list_a=[1,2,3,5,6,7,5,2]
unique_values = []
duplicates = []

for i in list_a:
    if i not in unique_values:
        unique_values.append(i)
    else:
        found = False
        for x in duplicates:
            if x.get("key") == i:
                x["occurrence"] += 1
                found = True
                break
        if not found:
            duplicates.append({
                "key": i,
                "occurrence": 1
            })
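
For the list in the question, the resulting structures would look something like this:

print(unique_values)  # [1, 2, 3, 5, 6, 7]
print(duplicates)     # [{'key': 5, 'occurrence': 1}, {'key': 2, 'occurrence': 1}]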


some_string= list(input("Enter any string:\n"))
count={}
dup_count={}
for i in some_string:
    if i not in count:
        count[i]=1
    else:
        count[i]+=1
        dup_count[i]=count[i]
print("Duplicates of given string are below:\n",dup_count)


A little bit more Pythonic implementation (not the most, of course), but in the spirit of your C code could be:

for i, elem in enumerate(seq):
    if elem in seq[i+1:]:
        print elem

Edit: yes, it prints the elements more than once if there are more than 2 repetitions, but that's what the OP's C pseudo-code does too.
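
If each duplicated element should only be reported once, one small variation (same O(n**2) cost) is to also check that it has not already appeared earlier in the sequence:

for i, elem in enumerate(seq):
    if elem in seq[i+1:] and elem not in seq[:i]:
        print elem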
