开发者

Best / most pythonic way to get an ordered list of unique items

I have one or more unordered sequences of (immutable, hashable) objects with possible duplicates and I want to get a sorted sequence of all those objects without duplicates.

Right now I'm using a set to quickly gather all the elements discarding duplicates, convert it to a list and then sort that:

result = set()
for s in sequences:
    result = result.union(s)
result = list(result)
result.sort()
return result

It works but I wouldn't call it "pretty". 开发者_StackOverflow社区Is there a better way?


This should work:

sorted(set(itertools.chain.from_iterable(sequences)))


I like your code just fine. It is straightforward and easy to understand.

We can shorten it just a little bit by chaining off the list():

result = set()
for s in sequences:
    result = result.union(s)
return sorted(result)

I really have no desire to try to boil it down beyond that, but you could do it with reduce():

result = reduce(lambda s, x: s.union(x), sequences, set())
return sorted(result)

Personally, I think this is harder to understand than the above, but people steeped in functional programming might prefer it.

EDIT: @agf is much better at this reduce() stuff than I am. From the comments below:

return sorted(reduce(set().union, sequences))

I had no idea this would work. If I correctly understand how this works, we are giving reduce() a callable which is really a method function on one instance of a set() (call it x for the sake of discussion, but note that I am not saying that Python will bind the name x with this object). Then reduce() will feed this function the first two iterables from sequences, returning x, the instance whose method function we are using. Then reduce() will repeatedly call the .union() method and ask it to take the union of x and the next iterable from sequences. Since the .union() method is likely smart enough to notice that it is being asked to take the union with its own instance and not bother to do any work, it should be just as fast to call x.union(x, some_iterable) as to just call x.union(some_iterable). Finally, reduce() will return x, and we have the set we want.

This is a bit tricky for my personal taste. I had to think this through to understand it, while the itertools.chain() solution made sense to me right away.

EDIT: @agf made it less tricky:

return sorted(reduce(set.union, sequences, set()))

What this is doing is much simpler to understand! If we call the instance returned by set() by the name of x again (and just like above with the understanding that I am not claiming that Python will bind the name x with this instance); and if we use the name n to refer to each "next" value from sequences; then reduce() will be repeatedly calling set.union(x, n). And of course this is exactly the same thing as x.union(n). IMHO if you want a reduce() solution, this is the best one.

--

If you want it to be fast, ask yourself: is there any way we can apply itertools to this? There is a pretty good way:

from itertools import chain
return sorted(set(chain(*sequences)))

itertools.chain() called with *sequences serves to "flatten" the list of lists into a single iterable. It's a little bit tricky, but only a little bit, and it's a common idiom.

EDIT: As @Jbernardo wrote in the most popular answer, and as @agf observes in comments, itertools.chain() returns an object that has a .from_iterable() method, and the documentation says it evaluates an iterable lazily. The * notation forces building a list, which may consume considerable memory if the iterable is a long sequence. In fact, you could have a never-ending generator, and with itertools.chain().from_iterable() you would be able to pull values from it for as long as you want to run your program, while the * notation would just run out of memory.

As @Jbernardo wrote:

sorted(set(itertools.chain.from_iterable(sequences)))

This is the best answer, and I already upvoted it.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜