join two lists of dictionaries on a single key

2023-02-21 06:33

Given n lists with m dictionaries as their elements, I would like to produce a new list, with a joined set of dictionaries. Each dictionary is guaranteed to have a key called "index", but could have an arbitrary set of keys beyond that. The non-index keys will never overlap across lists. For example, imagine the following two lists:

l1 = [{"index":1, "b":2}, {"index":2, "b":3}, {"index":3, "green":"eggs"}]
l2 = [{"index":1, "c":4}, {"index":2, "c":5}]

("b" would never appear in l2, since it appeared in l1, and similarly, "c" would never appear in l1, since it appeared in l2.)

I would like to produce a joined list:

l3 = [{"index":1, "b":2, "c":4}, 
      {"index":2, "b":3, "c":5}, 
      {"index":3, "green":"eggs"}]

What is the most efficient way to do this in Python?

from collections import defaultdict

l1 = [{"index":1, "b":2}, {"index":2, "b":3}, {"index":3, "green":"eggs"}]
l2 = [{"index":1, "c":4}, {"index":2, "c":5}]

d = defaultdict(dict)
for l in (l1, l2):
    for elem in l:
        d[elem['index']].update(elem)
l3 = list(d.values())  # in Python 3, .values() returns a view, so copy it into a list

# l3 is now:

[{'b': 2, 'c': 4, 'index': 1},
 {'b': 3, 'c': 5, 'index': 2},
 {'green': 'eggs', 'index': 3}]

EDIT: Since l3 is not guaranteed to be sorted by index (before Python 3.7, .values() returns items in no specific order; from 3.7 on, it returns them in the order each index was first seen), you can do as @user560833 suggests:

from operator import itemgetter

...

l3 = sorted(d.values(), key=itemgetter("index"))
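Since the question mentions n lists rather than just two, the same idea can be wrapped in a small helper. This is only a sketch: join_on_key, its key parameter and the third list l4 are names made up for illustration, and it assumes every dictionary contains the join key.

```python
from collections import defaultdict
from operator import itemgetter


def join_on_key(*lists, key="index"):
    """Merge dictionaries that share the same value for `key`.

    Hypothetical helper, not part of the original answer.
    Assumes every dictionary contains `key`.
    """
    merged = defaultdict(dict)
    for lst in lists:
        for elem in lst:
            # All dictionaries with the same key value end up in one dict.
            merged[elem[key]].update(elem)
    # Sort explicitly so the result does not depend on input order.
    return sorted(merged.values(), key=itemgetter(key))


l1 = [{"index": 1, "b": 2}, {"index": 2, "b": 3}, {"index": 3, "green": "eggs"}]
l2 = [{"index": 1, "c": 4}, {"index": 2, "c": 5}]
l4 = [{"index": 3, "d": 6}]  # hypothetical third list, for illustration only

l3 = join_on_key(l1, l2, l4)
```

Each dictionary is visited once, so the grouping step is linear in the total number of dictionaries; only the final sort adds an O(k log k) term for k distinct index values.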

In Python 3.5 or higher, you can merge dictionaries in a single expression using dictionary unpacking ({**u, **v}).
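As a small illustration of what that unpacking does (the variable names here are just placeholders): it builds a new dictionary, leaves the inputs untouched, and lets the right-hand dictionary win when a key appears in both.

```python
u = {"index": 1, "b": 2}
v = {"index": 1, "c": 4}

# Builds a new dict; u and v themselves are not modified.
merged = {**u, **v}

# When a key appears on both sides, the right-hand value wins.
overridden = {**{"index": 1, "x": "left"}, **{"x": "right"}}
```

Because the non-index keys never overlap in the question, the only shared key is "index", and it has the same value on both sides whenever the pairing is correct.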

So for Python 3.5 or higher, a quick solution would be:

from itertools import zip_longest

l3 = [{**u, **v} for u, v in zip_longest(l1, l2, fillvalue={})]

print(l3)
#[
#    {'index': 1, 'b': 2, 'c': 4}, 
#    {'index': 2, 'b': 3, 'c': 5}, 
#    {'index': 3, 'green': 'eggs'}
#]

However, if the two lists were the same size, you could simply use zip:

l3 = [{**u, **v} for u, v in zip(l1, l2)]

Note: This assumes that both lists are sorted by index and hold matching entries at each position, which the OP states is not the case in general.

To handle that case, one option is a custom zip_longest-style function that pairs values from the two lists only when they match on a key.

For instance:

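def sortedZipLongest(l1, l2, key, fillvalue={}):
    # Sort both lists by the join key, then walk them in step.
    # Assumes each key value appears at most once per list.
    l1 = iter(sorted(l1, key=lambda x: x[key]))
    l2 = iter(sorted(l2, key=lambda x: x[key]))
    u = next(l1, None)
    v = next(l2, None)

    while (u is not None) or (v is not None):
        if u is None:
            # l1 is exhausted: emit what is left of l2
            yield fillvalue, v
            v = next(l2, None)
        elif v is None:
            # l2 is exhausted: emit what is left of l1
            yield u, fillvalue
            u = next(l1, None)
        elif u[key] == v[key]:
            # keys match: pair the two dictionaries and advance both
            yield u, v
            u = next(l1, None)
            v = next(l2, None)
        elif u[key] < v[key]:
            # u has no partner in l2
            yield u, fillvalue
            u = next(l1, None)
        else:
            # v has no partner in l1
            yield fillvalue, v
            v = next(l2, None)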

Now, if you had the following out-of-order lists:

l1 = [{"index":1, "b":2}, {"index":2, "b":3}, {"index":3, "green":"eggs"}, 
      {"index":4, "b": 4}]
l2 = [{"index":1, "c":4}, {"index":2, "c":5}, {"index":0, "green": "ham"}, 
      {"index":4, "green": "ham"}]

Using the sortedZipLongest function instead of itertools.zip_longest:

l3 = [{**u, **v} for u, v in sortedZipLongest(l1, l2, key="index", fillvalue={})]
print(l3)
#[{'index': 0, 'green': 'ham'},
# {'index': 1, 'b': 2, 'c': 4},
# {'index': 2, 'b': 3, 'c': 5},
# {'index': 3, 'green': 'eggs'},
# {'index': 4, 'b': 4, 'green': 'ham'}]

Whereas the original method would produce an incorrect answer:

l3 = [{**u, **v} for u, v in zip_longest(l1, l2, fillvalue={})]
print(l3)
#[{'index': 1, 'b': 2, 'c': 4},
# {'index': 2, 'b': 3, 'c': 5},
# {'index': 0, 'green': 'ham'},
# {'index': 4, 'b': 4, 'green': 'ham'}]
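If a hand-written merge loop feels like too much, a similar sort-then-pair idea can be sketched with itertools.groupby. This is an alternative technique, not the method above: sort_group_join is a hypothetical name, and the sketch assumes every dictionary contains the key and that key values are mutually comparable.

```python
from itertools import chain, groupby
from operator import itemgetter


def sort_group_join(*lists, key="index"):
    """Sort all dictionaries by `key`, then merge each run of equal keys.

    Hypothetical helper for illustration; assumes every dict has `key`.
    """
    get_key = itemgetter(key)
    # sorted() is stable, so dictionaries with equal keys keep list order.
    ordered = sorted(chain.from_iterable(lists), key=get_key)
    joined = []
    for _, group in groupby(ordered, key=get_key):
        row = {}
        for d in group:
            row.update(d)  # later lists overwrite earlier ones on clashes
        joined.append(row)
    return joined


l1 = [{"index": 1, "b": 2}, {"index": 2, "b": 3}, {"index": 3, "green": "eggs"},
      {"index": 4, "b": 4}]
l2 = [{"index": 1, "c": 4}, {"index": 2, "c": 5}, {"index": 0, "green": "ham"},
      {"index": 4, "green": "ham"}]

l3 = sort_group_join(l1, l2)
```

Unlike the two-list generator, this version accepts any number of lists and also merges repeated keys within a single list, at the cost of sorting everything in one go.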

Here's a one-liner that does this:

[dict(sum([list(z.items()) for z in z2],[])) for z2 in [[x3 for x3 in l1+l2 if x3['index']==key] for key in set([x1['index'] for x1 in l1]+[x2['index'] for x2 in l2])]]

The list() call is needed in Python 3, where items() returns a view rather than a list. This is not the most elegant approach, and it rescans l1+l2 once for every distinct index value. I don't think the result is guaranteed to be sorted the way you want either, since it follows the iteration order of the set.

Expanding the one-liner:

[
    dict(sum([list(z.items()) for z in z2],[]))
    for z2 in [
        [
            x3 for x3 in l1+l2 if x3['index']==key
        ] for key in set(
            [x1['index'] for x1 in l1]+[x2['index'] for x2 in l2]
        )
    ]
]

The set expression on the 6th line collects all the unique index values from both lists. The list comprehension around it (lines 3-9) creates a list of lists, where each inner list holds the dictionaries from either list that share one particular index value. The outermost list comprehension flattens each inner list into a single list of (key, value) tuples and converts it back into a dictionary, giving one merged dictionary per index value.
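To make those steps easier to follow, here is the same computation split into named intermediate values, using the lists from the question. The names keys, groups and pairs are only illustrative, and items() is wrapped in list() because Python 3 requires that for the sum to work.

```python
l1 = [{"index": 1, "b": 2}, {"index": 2, "b": 3}, {"index": 3, "green": "eggs"}]
l2 = [{"index": 1, "c": 4}, {"index": 2, "c": 5}]

# 1. Unique index values across both lists.
keys = set([x1['index'] for x1 in l1] + [x2['index'] for x2 in l2])

# 2. For each index value, the dictionaries from either list that carry it.
groups = [[x3 for x3 in l1 + l2 if x3['index'] == key] for key in keys]

# 3. Flatten each group into one list of (key, value) tuples.
pairs = [sum([list(z.items()) for z in z2], []) for z2 in groups]

# 4. Turn each list of tuples back into a single dictionary.
l3 = [dict(p) for p in pairs]

# The order follows the set's iteration order, so sort if order matters.
l3_sorted = sorted(l3, key=lambda d: d['index'])
```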
