
Compare equal length lists to find positions that share the same element

I want to compare a list of lists that have the same length, but differ in their content. My script should return only the positions that share exactly the same element (in all lists). For example:

l = [[1,2,3,4,5,6,7,8],[9,8,8,4,3,4,5,7,8],[5,6,7,4,9,9,9,8],[0,0,1,4,7,6,3,8]]

and as a result I should get a list of positions p = [3,7], since all lists have '4' and '8' at positions 3 and 7, respectively.

These elements can be strings as well, I'm just giving an example with integers. Thanks for any help!


l = [[1,2,3,4,5,6,7,8],[9,8,8,4,3,4,5,7,8],[5,6,7,4,9,9,9,8],[0,0,1,4,7,6,3,8]]

p = [i for i, j in enumerate(zip(*l)) if all(j[0]==k for k in j[1:])]

# p == [3], because the second list in your example has nine elements instead of eight (probably a typo), so its final 8 sits at position 8 rather than 7.

This is the one-liner (list comprehension) form of the following, more verbose loop:

p = []
for i, j in enumerate(zip(*l)):
    if all(j[0]==k for k in j[1:]):
        p.append(i)

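zip(*l) gives you the following (on Python 3, wrap it in list() to see the tuples). zip stops at the shortest list, so the extra ninth element of the second list is dropped: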

[(1, 9, 5, 0),
 (2, 8, 6, 0),
 (3, 8, 7, 1),
 (4, 4, 4, 4),
 (5, 3, 9, 7),
 (6, 4, 9, 6),
 (7, 5, 9, 3),
 (8, 7, 8, 8)]

enumerate() pairs each tuple in that list with its index: 0, 1, 2, ...

all(j[0]==k for k in j[1:]) compares the first element of the tuple with each remaining element and returns True if all of them are equal, False otherwise. It returns False as soon as it finds a different element, so it does not scan the rest of the tuple.
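As a minimal sketch of the steps above, the same logic can be wrapped in a function. The name common_positions is hypothetical, and the data is an assumed correction of the question's list (the stray 7 dropped from the second list so that all four have eight elements); the original post does not say which element was the typo.

```python
def common_positions(lists):
    """Return the indices at which every list holds the same element."""
    positions = []
    # zip(*lists) yields one tuple per position: (lists[0][i], lists[1][i], ...)
    for i, column in enumerate(zip(*lists)):
        first = column[0]
        # all() stops at the first mismatch it sees
        if all(first == other for other in column[1:]):
            positions.append(i)
    return positions

# Assumed correction of the question's data: second list trimmed to 8 elements.
rows = [[1, 2, 3, 4, 5, 6, 7, 8],
        [9, 8, 8, 4, 3, 4, 5, 8],
        [5, 6, 7, 4, 9, 9, 9, 8],
        [0, 0, 1, 4, 7, 6, 3, 8]]

print(common_positions(rows))  # [3, 7]
```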


I liked eumiro's solution, but I did it with a set:

p = [i for i, j in enumerate(zip(*l)) if len(set(j)) == 1]
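Since the question mentions that the elements may be strings, here is a small sketch of the set version on string data. The function name and the sample lists are made up for illustration. One caveat worth stating: set() needs hashable elements, so this variant raises TypeError if the inner lists contain unhashable items such as other lists.

```python
def common_positions_set(lists):
    # A column whose elements are all equal collapses to a one-element set.
    return [i for i, column in enumerate(zip(*lists)) if len(set(column)) == 1]

# Hypothetical sample data with strings instead of integers.
words = [["a", "b", "c", "d"],
         ["x", "b", "y", "d"],
         ["a", "b", "z", "d"]]

print(common_positions_set(words))  # [1, 3]
```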


l = [[1,2,3,4,5,6,7,8],[9,8,8,4,3,4,5,7,8],[5,6,7,4,9,9,9,8],[0,0,1,4,7,6,3,8]]
r = []

for i in range(len(l[0])):
    e = l[0][i]
    same = True
    for j in range(1, len(l)):
        if e != l[j][i]:
            same = False
            break
    if same:
        r.append(i)

print(r)

This prints only [3], because l[1] does not have 8 at position 7: it has one extra element.
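Because a length mismatch like this is easy to miss, one option is to check the lengths before comparing. The following is only a sketch: the function name and the choice of ValueError are assumptions, not something the question asked for.

```python
def common_positions_checked(lists):
    """Compare position by position, but refuse lists of differing length."""
    lengths = set(len(row) for row in lists)
    if len(lengths) > 1:
        raise ValueError("lists differ in length: %s" % sorted(lengths))
    return [i for i, column in enumerate(zip(*lists))
            if all(column[0] == other for other in column[1:])]

# The list exactly as given in the question (second list has 9 elements).
l = [[1,2,3,4,5,6,7,8],[9,8,8,4,3,4,5,7,8],[5,6,7,4,9,9,9,8],[0,0,1,4,7,6,3,8]]

try:
    common_positions_checked(l)
except ValueError as exc:
    print(exc)  # lists differ in length: [8, 9]
```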


li = [[1,2,3,4,5,6,7,8],[9,8,8,4,3,6,5,8],[5,6,7,4,9,9,9,8],[0,0,1,4,7,6,3,8]]

first = li[0]
r = range(len(first))
for current in li[1:]:
    r = [ i for i in r if current[i]==first[i]]

print([first[i] for i in r])

result

[4, 8]
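Note that the loop above prints the shared elements, while the indices left in r are the positions the question asks for. A sketch that returns the positions, using a hypothetical function name and the same li:

```python
def filter_positions(lists):
    first = lists[0]
    # Start with every index, then drop those where a later list disagrees.
    positions = list(range(len(first)))
    for current in lists[1:]:
        positions = [i for i in positions if current[i] == first[i]]
    return positions

li = [[1,2,3,4,5,6,7,8],[9,8,8,4,3,6,5,8],[5,6,7,4,9,9,9,8],[0,0,1,4,7,6,3,8]]

positions = filter_positions(li)
print(positions)                      # [3, 7]
print([li[0][i] for i in positions])  # [4, 8]
```

Each pass only re-checks indices that survived the previous list, so the work shrinks as mismatches are found.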


Comparing execution times:

from time import clock

li = [[1,2,3,4,5,6,7,8,9,10],
      [9,8,8,4,5,6,5,8,9,13],
      [5,6,7,4,9,9,9,8,9,12],
      [0,0,1,4,7,6,3,8,9,5]]

n = 10000

te = clock()
for turn in xrange(n):
    first = li[0]
    r = range(len(first))
    for current in li[1:]:
        r = [ i for i in r if current[i]==first[i]]
    x = [first[i] for i in r]
t1 = clock()-te
print 't1 =',t1
print x


te = clock()
for turn in xrange(n):
    y = [j[0] for i, j in enumerate(zip(*li)) if all(j[0]==k for k in j[1:])] 
t2 = clock()-te
print 't2 =',t2
print y

print 't2/t1 =',t2/t1
print

result

t1 = 0.176347273187
[4, 8, 9]
t2 = 0.579408755442
[4, 8, 9]
t2/t1 = 3.28561221827


With longer lists:

li = [[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,2,22,26,24,25],
      [9,8,8,4,5,6,5,8,9,13,18,12,15,14,15,15,4,16,19,20,2,158,35,24,13],
      [5,6,7,4,9,9,9,8,9,12,45,12,4,19,15,20,24,18,19,20,2,58,23,24,25],
      [0,0,1,4,7,6,3,8,9,5,12,12,12,15,15,15,5,3,14,20,9,18,28,24,14]]

result

t1 = 0.343173188632
[4, 8, 9, 12, 15, 20, 24]
t2 = 1.21259110432
[4, 8, 9, 12, 15, 20, 24]
t2/t1 = 3.53346690385
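The timing script above is Python 2 (print statements, xrange), and time.clock was removed in Python 3.8. For anyone repeating the comparison on Python 3, here is a sketch using the standard timeit module. The function names are hypothetical, and timings vary by machine and interpreter version, so no figures are shown here.

```python
import timeit

li = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
      [9, 8, 8, 4, 5, 6, 5, 8, 9, 13],
      [5, 6, 7, 4, 9, 9, 9, 8, 9, 12],
      [0, 0, 1, 4, 7, 6, 3, 8, 9, 5]]

def by_filtering(lists):
    # Successively narrow the list of candidate indices.
    first = lists[0]
    r = range(len(first))
    for current in lists[1:]:
        r = [i for i in r if current[i] == first[i]]
    return [first[i] for i in r]

def by_zip_all(lists):
    # Check each column with zip() and all().
    return [j[0] for j in zip(*lists) if all(j[0] == k for k in j[1:])]

n = 10000
t1 = timeit.timeit(lambda: by_filtering(li), number=n)
t2 = timeit.timeit(lambda: by_zip_all(li), number=n)
print('t1 =', t1)
print('t2 =', t2)
print('t2/t1 =', t2 / t1)
```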