join two lists of dictionaries on a single key
Given n
lists with m
dictionaries as their elements, I would like to produce a new list, with a joined set of dictionaries. Each dictionary is guaranteed to have a key called "index", but could have an arbitrary set of keys beyond that. The non-index keys will never overlap across lists. For example, imagine the following two lists:
l1 = [{"index":1, "b":2}, {"index":2, "b":3}, {"index":3, "green":"eggs"}]
l2 = [{"index":1, "c":4}, {"index":2, "c":5}]
("b"
would never appear in l2
, since it appeared in l1
, and similarly, "c"
would never appear in l1
, since it appeared in l2
开发者_开发技巧)
I would like to produce a joined list:
l3 = [{"index":1, "b":2, "c":4},
{"index":2, "b":3, "c":5},
{"index":3, "green":"eggs"}]
What is the most efficient way to do this in Python?
from collections import defaultdict
l1 = [{"index":1, "b":2}, {"index":2, "b":3}, {"index":3, "green":"eggs"}]
l2 = [{"index":1, "c":4}, {"index":2, "c":5}]
d = defaultdict(dict)
for l in (l1, l2):
for elem in l:
d[elem['index']].update(elem)
l3 = d.values()
# l3 is now:
[{'b': 2, 'c': 4, 'index': 1},
{'b': 3, 'c': 5, 'index': 2},
{'green': 'eggs', 'index': 3}]
EDIT: Since l3
is not guaranteed to be sorted (.values()
returns items in no specific order), you can do as @user560833 suggests:
from operator import itemgetter
...
l3 = sorted(d.values(), key=itemgetter("index"))
In python 3.5 or higher, you can merge dictionaries in a single statement.
So for python 3.5 or higher, a quick solution would be:
from itertools import zip_longest
l3 = [{**u, **v} for u, v in zip_longest(l1, l2, fillvalue={})]
print(l3)
#[
# {'index': 1, 'b': 2, 'c': 4},
# {'index': 2, 'b': 3, 'c': 5},
# {'index': 3, 'green': 'eggs'}
#]
However if the two lists were the same size, you could simply use zip:
l3 = [{**u, **v} for u, v in zip(l1, l2)]
Note: This assumes that the lists are sorted the same way by index
, which is stated by OP to not be the case in general.
In order to generalize for that case, one way is to create a custom zip-longest type function which yields values from the two lists only if they match on a key.
For instance:
def sortedZipLongest(l1, l2, key, fillvalue={}):
l1 = iter(sorted(l1, key=lambda x: x[key]))
l2 = iter(sorted(l2, key=lambda x: x[key]))
u = next(l1, None)
v = next(l2, None)
while (u is not None) or (v is not None):
if u is None:
yield fillvalue, v
v = next(l2, None)
elif v is None:
yield u, fillvalue
u = next(l1, None)
elif u.get(key) == v.get(key):
yield u, v
u = next(l1, None)
v = next(l2, None)
elif u.get(key) < v.get(key):
yield u, fillvalue
u = next(l1, None)
else:
yield fillvalue, v
v = next(l2, None)
Now if you had the following out of order lists:
l1 = [{"index":1, "b":2}, {"index":2, "b":3}, {"index":3, "green":"eggs"},
{"index":4, "b": 4}]
l2 = [{"index":1, "c":4}, {"index":2, "c":5}, {"index":0, "green": "ham"},
{"index":4, "green": "ham"}]
Using the sortedZipLongest
function instead of itertools.zip_longest
:
l3 = [{**u, **v} for u, v in sortedZipLongest(l1, l2, key="index", fillvalue={})]
print(l3)
#[{'index': 0, 'green': 'ham'},
# {'index': 1, 'b': 2, 'c': 4},
# {'index': 2, 'b': 3, 'c': 5},
# {'index': 3, 'green': 'eggs'},
# {'index': 4, 'b': 4, 'green': 'ham'}]
Whereas original method would produce the incorrect answer:
l3 = [{**u, **v} for u, v in zip_longest(l1, l2, fillvalue={})]
print(l3)
#[{'index': 1, 'b': 2, 'c': 4},
# {'index': 2, 'b': 3, 'c': 5},
# {'index': 0, 'green': 'ham'},
# {'index': 4, 'b': 4, 'green': 'ham'}]
Here's a one-liner that does this:
[dict(sum([z.items() for z in z2],[])) for z2 in [[x3 for x3 in l1+l2 if x3['index']==key] for key in set([x1['index'] for x1 in l1]+[x2['index'] for x2 in l2])]]
Not quite as elegant as a list-comprehension. I don't think the result is guaranteed to necessarily be sorted the way you want either.
Expanding the one-liner:
[
dict(sum([z.items() for z in z2],[]))
for z2 in [
[
x3 for x3 in l1+l2 if x3['index']==key
] for key in set(
[x1['index'] for x1 in l1]+[x2['index'] for x2 in l2]
)
]
]
The set expression on the 6th line gets all the unique index values from both lists. The list comprehension around that (lines 3-9) creates a list of lists where each inner list is a combined list of dictionaries for that index/key with a particular index value. The outermost list comprehension creates a single list of tuple-pairs for each key and converts it back to a list of dictionaries.
精彩评论