Looking for more pythonic list comparison solution

2023-03-22 01:55 问答作者：

Ok so I have two lists:

x = [1, 2, 3, 4]
y = [1, 1, 2, 5, 6]

I compare them in such a way so I get the following output:

x = [3, 4]
y = [1, 5, 6]

The basic is idea to go through each list and compare them. If they have an element in common remove that element. But only one of t开发者_JAVA技巧hat element not all of them. If they don't have an element in common leave it. Two identical lists would become x = [], y = []

Here is my rather hacked up and pretty lame solution. I hoping other can recommend a better and / or more pythonic way of doing this. 3 loops seems excessive...

    done = True

    while not done:
        done = False
        for x in xlist:
            for y in ylist:
                if x == y:
                    xlist.remove(x)
                    ylist.remove(y)
                    done = False
        print xlist, ylist

Thanks as always for taking the time to read this question. XOXO

It's possible that the data structure you are looking for is the multiset (or "bag"), and if so, a good way to implement it in Python is to use collections.Counter:

>>> from collections import Counter
>>> x = Counter([1, 2, 3, 4])
>>> y = Counter([1, 1, 2, 5, 6])
>>> x - y
Counter({3: 1, 4: 1})
>>> y - x
Counter({1: 1, 5: 1, 6: 1})

If you want to convert the Counter objects back to lists with multiplicity, you can use the elements method:

>>> list((x - y).elements())
[3, 4]
>>> list((y - x).elements())
[1, 5, 6]

If you don't care about order, use collections.Counter to do it in one line:

>>> Counter(x)-Counter(y)
Counter({3: 1, 4: 1})

>>> Counter(y)-Counter(x)
Counter({1: 1, 5: 1, 6: 1})

If you care about order, you can probably iterate through your lists grabbing elements from the above dictionaries:

def prune(seq, toPrune):
    """Prunes elements from front of seq in O(N) time"""
    remainder = Counter(seq)-Counter(toPrune)
    R = []
    for x in reversed(seq):
        if remainder.get(x):
            remainder[x] -= 1
            R.insert(0,x)
    return R

Demo:

>>> prune(x,y)
[3, 4]
>>> prune(y,x)
[1, 1, 5, 6]

To build on Gareth's answer:

>>> a = Counter([1, 2, 3, 4])
>>> b = Counter([1, 1, 2, 5, 6])
>>> (a - b).elements()
[3, 4]
>>> (b - a).elements()
[1, 5, 6]

Benchmark code:

from collections import Counter
from collections import defaultdict
import random

# short lists
#a = [1, 2, 3, 4, 7, 8, 9]
#b = [1, 1, 2, 5, 6, 8, 8, 10]

# long lists
a = []
b = []

for i in range(0, 1000):
    q = random.choice((1, 2, 3, 4))
    if q == 1:
        a.append(i)
    elif q == 2:
        b.append(i)
    elif q == 3:
        a.append(i)
        b.append(i)
    else:
        a.append(i)
        b.append(i)
        b.append(i)

# Modifies the lists in-place! Naughty! And it doesn't actually work, to boot.
def original(xlist, ylist):
    done = False

    while not done:
        done = True
        for x in xlist:
            for y in ylist:
                if x == y:
                    xlist.remove(x)
                    ylist.remove(y)
                    done = False
    return xlist, ylist # not strictly necessary, see above


def counter(xlist, ylist):
    x = Counter(xlist)
    y = Counter(ylist)
    return ((x-y).elements(), (y-x).elements())


def nasty(xlist, ylist):
    x = sum(([i]*(xlist.count(i)-ylist.count(i)) for i in set(xlist)),[])
    y = sum(([i]*(ylist.count(i)-xlist.count(i)) for i in set(ylist)),[])

    return x, y


def gnibbler(xlist, ylist):
    d = defaultdict(int)
    for i in xlist: d[i] += 1
    for i in ylist: d[i] -= 1
    return [k for k,v in d.items() for i in range(v)], [k for k,v in d.items() for i in range(-v)]

# substitute algorithm to test in the call
for x in range(0, 100000):
    original(list(a), list(b))

Running the Insufficiently Rigorous Benchmarks[tm] (short lists are the original ones, long lists are randomly generated lists approximately 1000 entries long with a mix of matches and repeats, times given in multipliers of the Original algorithm):

    100K iterations, short lists    1K iterations, long lists
Original     1.0                           1.0
Counter      9.3                           0.06
Nasty        2.9                           1.4
Gnibbler     2.4                           0.02

Note 1: The creation of the Counter object seems to overshadow the actual algorithm at small list sizes.

Note 2: Original and gnibbler are the same at list lengths of approximately 35, above which gnibbler (and Counter) are faster.

Just using collections.defaultdict so will work on Python2.5+

>>> x = [1, 2, 3, 4]
>>> y = [1, 1, 2, 5, 6]
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for i in x:
...  d[i] += 1
... 
>>> for i in y:
...  d[i] -= 1
... 
>>> [k for k,v in d.items() for i in range(v)]
[3, 4]
>>> [k for k,v in d.items() for i in range(-v)]
[1, 5, 6]

I find this is better than range (or xrange) if the number repetitions get large

>>> from itertools import repeat
>>> [k for k,v in d.items() for i in repeat(None, v)]

Quite nasty :P

a = sum(([i]*(x.count(i)-y.count(i)) for i in set(x)),[])
b = sum(([i]*(y.count(i)-x.count(i)) for i in set(y)),[])

x,y = a,b

This is simple if you dont care about the duplicates:

>>> x=[1,2,3,4]
>>> y=[1,1,2,5,6]
>>> list(set(x).difference(set(y)))
[3, 4]
>>> list(set(y).difference(set(x)))
[5, 6]

继续阅读：list python

Looking for more pythonic list comparison solution

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？