merging n sorted lists of tuples in python
I have n lists (n<10) of tuples in the format [(ListID, [(index,value),(index, value),...)] and want to sort them by index to get to following outcome
Example Input:
[('A',[(0.12, 'how'),(0.26,'are'),(0.7, 'you'),(0.9,'mike'),(1.9, "I'm fine too")]),
('B',[(1.23, 'fine'),(1.50, 'thanks'),(1.6,'and you')]),
('C',[(2.12,'good'),(2.24,'morning'),(3.13,'guys')])]
Desired Output:
[('A', ( 0.12, 'how')),
('A', ( 0.26, 'are')),
('A', ( 0.7, 'you')),
('A', ( 0.9, 'mike')),
('B',(1.23, 'fine')),
('B',(1.50, 'thanks')),
('B',(1.6,'and you')),
('A', (1.9, "I'm fine too")),
('C',(2.12,'good')),
('C',(2.24,'morning')),
('C',(3.13,'guys'))]
I know the code is ugly, especially those indices item[0][-1][1], but can somebody tell me what am I doing wrong?
content = []
max = 0.0
first = True
Done = False
finished = []
while not Done:
for item in flow:
if len(finished) == 4:
Done = True
break
if开发者_StackOverflow社区 len(item[1]) == 0:
if item[0] not in finished:
finished.append(item[0])
continue
if first == True:
max = item[1][-1][0]
content.append((item[0], item[1].pop()))
first = False
continue
if item[1][-1][0] > max:
max = item[1][-1][0]
content.append((item[0], item[1].pop()))
content = sorted(content, key=itemgetter(1))
first = True
UPDATE: thank you everybody
>>> from operator import itemgetter
>>> import pprint
>>> pprint.pprint(sorted(((i,k) for i,j in INPUT for k in j), key=itemgetter(1)))
[('A', (0.12, 'how')),
('A', (0.26000000000000001, 'are')),
('A', (0.69999999999999996, 'you')),
('A', (0.90000000000000002, 'mike')),
('B', (1.23, 'fine')),
('B', (1.5, 'thanks')),
('B', (1.6000000000000001, 'and you')),
('A', (1.8999999999999999, "I'm fine")),
('C', (2.1200000000000001, 'good')),
('C', (2.2400000000000002, 'morning')),
('C', (3.1299999999999999, 'guys'))]
There are two main things going on here
[(i,k) for i,j in INPUT for k in j]
takes transforms the INPUT to this struture
[('A', (0.12, 'how')),
('A', (0.26, 'are')),
('A', (0.7, 'you')),
('A', (0.9, 'mike')),
('A', (1.9, "I'm fine")),
('B', (1.23, 'fine')),
('B', (1.5, 'thanks')),
('B', (1.6, 'and you')),
('C', (2.12, 'good')),
('C', (2.24, 'morning')),
('C', (3.13, 'guys'))]
and
sorted(L, key=itemgetter(1))
sorts L buy item[1] of each element. This is actually (0.12, 'how'), (0.27, 'are') ... but the normal way python sorts tuples is from left to right, so we don't need to do extra work to strip the word from the tuple
(OK, the sample data makes the problem description much clearer. Answer revised accordingly)
Step 1: clarify your problem description by reverse engineering your current solution.
- There are 4 different data sets labelled A, B, C and D
- These data sets are contained in a series of 2-tuples of the form (ListID, elements)
- Each "elements" entry is itself a list of 2-tuples of the form (index, value)
- An empty elements entry indicates the end of a data set
- The goal is to merge these data sets into a single sorted list of 2-tuples (ListID, (index, value))
Step 2: transform the input data to create individual records of the desired form.
Generators are built for this kind of thing, so it makes sense to define one.
def get_data(flow, num_data_sets=4):
finished = set()
for list_id, elements in flow:
if list_id in finished:
continue
if not elements:
finished.add(list_id)
if len(finished) == num_data_sets:
break
continue
for element in elements:
yield list_id, element
Step 3: use sorted
to produce the desired ordered list
content = sorted(get_data(flow))
Sample usage:
# get_data defined via copy/paste of source code above
# ref_data taken from the revised question
>>> demo_data = [
... ('A', [(1, 2), (3, 4)]),
... ('B', [(7, 8), (9, 10)]),
... ('A', [(0, 0)]),
... ('C', []), # Finish early
... ('C', [('ignored', 'entry')])
... ]
>>> content = sorted(get_data(demo_data))
>>> print '\n'.join(map(str, content))
('A', 0, 0)
('A', 1, 2)
('A', 3, 4)
('B', 7, 8)
('B', 9, 10)
>>> content = sorted(get_data(ref_data), key=itemgetter(1))
>>> print '\n'.join(map(str, content))
('A', 0.12, 'how')
('A', 0.26, 'are')
('A', 0.7, 'you')
('A', 0.9, 'mike')
('B', 1.23, 'fine')
('B', 1.5, 'thanks')
('B', 1.6, 'and you')
('A', 1.9, "I'm fine too")
('C', 2.12, 'good')
('C', 2.24, 'morning')
('C', 3.13, 'guys')
Your solution ends up being messy and hard to read for two main reasons:
- Failing to use a generator means you aren't gaining the full benefit of the builtin sorted function
- By using indexing instead of tuple unpacking you make it very hard to keep track of what is what
data = [(x,id) for (id, xs) in data for x in xs]
data.sort()
for xs,id in data:
print id,xs
A (0.12, 'how')
A (0.26000000000000001, 'are')
A (0.69999999999999996, 'you')
A (0.90000000000000002, 'mike')
B (1.23, 'fine')
B (1.5, 'thanks')
B (1.6000000000000001, 'and you')
A (1.8999999999999999, "I'm fine too")
C (2.1200000000000001, 'good')
C (2.2400000000000002, 'morning')
C (3.1299999999999999, 'guys')
Your input:
l = [('A',
[(0.12, 'how'),
(0.26000000000000001, 'are'),
(0.69999999999999996, 'you'),
(0.90000000000000002, 'mike'),
(1.8999999999999999, "I'm fine too")]),
('B', [(1.23, 'fine'), (1.5, 'thanks'), (1.6000000000000001, 'and you')]),
('C',
[(2.1200000000000001, 'good'),
(2.2400000000000002, 'morning'),
(3.1299999999999999, 'guys')])]
Convert (and print):
newlist = []
for alpha, tuplelist in l:
for tup in tuplelist:
newlist.append((alpha,tup))
from operator import itemgetter
sorted(newlist,key=itemgetter(1))
print newlist
Check!
[('A', (0.12, 'how')),
('A', (0.26000000000000001, 'are')),
('A', (0.69999999999999996, 'you')),
('A', (0.90000000000000002, 'mike')),
('B', (1.23, 'fine')),
('B', (1.5, 'thanks')),
('B', (1.6000000000000001, 'and you')),
('A', (1.8999999999999999, "I'm fine too")),
('C', (2.1200000000000001, 'good')),
('C', (2.2400000000000002, 'morning')),
('C', (3.1299999999999999, 'guys'))]
You can of course do it within the list comprehension, but you still use 2 for
loops and 1 inbuilt sorted
function. Might as well make it verbose and readable then.
精彩评论