merging n sorted lists of tuples in python

2023-02-11 14:02 问答作者：

I have n lists (n<10) of tuples in the format [(ListID, [(index,value),(index, value),...)] and want to sort them by index to get to following outcome

Example Input:
[('A',[(0.12, 'how'),(0.26,'are'),(0.7, 'you'),(0.9,'mike'),(1.9, "I'm fine too")]),
('B',[(1.23, 'fine'),(1.50, 'thanks'),(1.6,'and you')]),
('C',[(2.12,'good'),(2.24,'morning'),(3.13,'guys')])]

Desired Output:
[('A', ( 0.12, 'how')),
('A', ( 0.26, 'are')),
('A', ( 0.7, 'you')),
('A', ( 0.9, 'mike')),
('B',(1.23, 'fine')),
('B',(1.50, 'thanks')),
('B',(1.6,'and you')),
('A', (1.9, "I'm fine too")),
('C',(2.12,'good')),
('C',(2.24,'morning')),
('C',(3.13,'guys'))]

I know the code is ugly, especially those indices item[0][-1][1], but can somebody tell me what am I doing wrong?

content = []    
max = 0.0
first = True 
Done = False
finished = []
while not Done:
    for item in flow:
        if len(finished) == 4:
            Done = True
            break
        if开发者_StackOverflow社区 len(item[1]) == 0:
            if item[0] not in finished:
                finished.append(item[0])
            continue
        if first == True:
            max = item[1][-1][0]
            content.append((item[0], item[1].pop()))
            first = False 
            continue
        if item[1][-1][0] > max:
            max = item[1][-1][0]
            content.append((item[0], item[1].pop()))
            content = sorted(content, key=itemgetter(1))    

    first = True

UPDATE: thank you everybody

>>> from operator import itemgetter
>>> import pprint
>>> pprint.pprint(sorted(((i,k) for i,j in INPUT for k in j), key=itemgetter(1)))
[('A', (0.12, 'how')),
 ('A', (0.26000000000000001, 'are')),
 ('A', (0.69999999999999996, 'you')),
 ('A', (0.90000000000000002, 'mike')),
 ('B', (1.23, 'fine')),
 ('B', (1.5, 'thanks')),
 ('B', (1.6000000000000001, 'and you')),
 ('A', (1.8999999999999999, "I'm fine")),
 ('C', (2.1200000000000001, 'good')),
 ('C', (2.2400000000000002, 'morning')),
 ('C', (3.1299999999999999, 'guys'))]

There are two main things going on here

[(i,k) for i,j in INPUT for k in j]

takes transforms the INPUT to this struture

[('A', (0.12, 'how')),
 ('A', (0.26, 'are')),
 ('A', (0.7, 'you')),
 ('A', (0.9, 'mike')),
 ('A', (1.9, "I'm fine")),
 ('B', (1.23, 'fine')),
 ('B', (1.5, 'thanks')),
 ('B', (1.6, 'and you')),
 ('C', (2.12, 'good')),
 ('C', (2.24, 'morning')),
 ('C', (3.13, 'guys'))]

and

sorted(L, key=itemgetter(1))

sorts L buy item[1] of each element. This is actually (0.12, 'how'), (0.27, 'are') ... but the normal way python sorts tuples is from left to right, so we don't need to do extra work to strip the word from the tuple

(OK, the sample data makes the problem description much clearer. Answer revised accordingly)

Step 1: clarify your problem description by reverse engineering your current solution.

There are 4 different data sets labelled A, B, C and D
These data sets are contained in a series of 2-tuples of the form (ListID, elements)
Each "elements" entry is itself a list of 2-tuples of the form (index, value)
An empty elements entry indicates the end of a data set
The goal is to merge these data sets into a single sorted list of 2-tuples (ListID, (index, value))

Step 2: transform the input data to create individual records of the desired form.

Generators are built for this kind of thing, so it makes sense to define one.

def get_data(flow, num_data_sets=4):
    finished = set()
    for list_id, elements in flow:
        if list_id in finished:
            continue
        if not elements:
            finished.add(list_id)
            if len(finished) == num_data_sets:
                break
            continue
        for element in elements:
            yield list_id, element

Step 3: use sorted to produce the desired ordered list

content = sorted(get_data(flow))

Sample usage:

# get_data defined via copy/paste of source code above
# ref_data taken from the revised question
>>> demo_data = [
...   ('A', [(1, 2), (3, 4)]),
...   ('B', [(7, 8), (9, 10)]),
...   ('A', [(0, 0)]),
...   ('C', []), # Finish early
...   ('C', [('ignored', 'entry')])
... ]
>>> content = sorted(get_data(demo_data))
>>> print '\n'.join(map(str, content))
('A', 0, 0)
('A', 1, 2)
('A', 3, 4)
('B', 7, 8)
('B', 9, 10)
>>> content = sorted(get_data(ref_data), key=itemgetter(1))
>>> print '\n'.join(map(str, content))
('A', 0.12, 'how')
('A', 0.26, 'are')
('A', 0.7, 'you')
('A', 0.9, 'mike')
('B', 1.23, 'fine')
('B', 1.5, 'thanks')
('B', 1.6, 'and you')
('A', 1.9, "I'm fine too")
('C', 2.12, 'good')
('C', 2.24, 'morning')
('C', 3.13, 'guys')

Your solution ends up being messy and hard to read for two main reasons:

Failing to use a generator means you aren't gaining the full benefit of the builtin sorted function
By using indexing instead of tuple unpacking you make it very hard to keep track of what is what

data = [(x,id) for (id, xs) in data for x in xs]
data.sort()
for xs,id in data:
    print id,xs


A (0.12, 'how')
A (0.26000000000000001, 'are')
A (0.69999999999999996, 'you')
A (0.90000000000000002, 'mike')
B (1.23, 'fine')
B (1.5, 'thanks')
B (1.6000000000000001, 'and you')
A (1.8999999999999999, "I'm fine too")
C (2.1200000000000001, 'good')
C (2.2400000000000002, 'morning')
C (3.1299999999999999, 'guys')

Your input:

l = [('A',
    [(0.12, 'how'),
    (0.26000000000000001, 'are'),
    (0.69999999999999996, 'you'),
    (0.90000000000000002, 'mike'),
    (1.8999999999999999, "I'm fine too")]),
    ('B', [(1.23, 'fine'), (1.5, 'thanks'), (1.6000000000000001, 'and you')]),
    ('C',
    [(2.1200000000000001, 'good'),
    (2.2400000000000002, 'morning'),
    (3.1299999999999999, 'guys')])]

Convert (and print):

newlist = []
for alpha, tuplelist in l:
    for tup in tuplelist:
        newlist.append((alpha,tup))

from operator import itemgetter
sorted(newlist,key=itemgetter(1))
print newlist

Check!

[('A', (0.12, 'how')),
 ('A', (0.26000000000000001, 'are')),
 ('A', (0.69999999999999996, 'you')),
 ('A', (0.90000000000000002, 'mike')),
 ('B', (1.23, 'fine')),
 ('B', (1.5, 'thanks')),
 ('B', (1.6000000000000001, 'and you')),
 ('A', (1.8999999999999999, "I'm fine too")),
 ('C', (2.1200000000000001, 'good')),
 ('C', (2.2400000000000002, 'morning')),
 ('C', (3.1299999999999999, 'guys'))]

You can of course do it within the list comprehension, but you still use 2 for loops and 1 inbuilt sorted function. Might as well make it verbose and readable then.

继续阅读：algorithm merge python sorting tuples

merging n sorted lists of tuples in python

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？