Why does x,y = zip(*zip(a,b)) work in Python?
OK I love Python's zip() function. Use it all the time, it's brilliant. Every now and again I want to do the opposite of zip(), think "I used to know how to do that", then google python unzip, then remember that one uses this magical * to unzip a zipped list of tuples. Like this:
x = [1,2,3]
y = [4,5,6]
zipped = zip(x,y)
unzipped_x, unzipped_y = zip(*zipped)
unzipped_x
Out[30]: (1, 2, 3)
unzipped_y
Out[31]: (4, 5, 6)
What on earth is going on? What is that magical asterisk doing? Where else can it be applied and what other amazing awesome things in Python are so mysterious and hard to google?
The asterisk in Python is documented in the Python tutorial, under Unpacking Argument Lists.
The asterisk performs apply (as it's known in Lisp and Scheme). Basically, it takes your list, and calls the function with that list's contents as arguments.
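To tie this back to zip, here is a minimal sketch (Python 3; the variable names are illustrative and not from the question) showing that zip(*pairs) is the same call as writing each pair out as its own argument:

```python
pairs = [(1, 4), (2, 5), (3, 6)]  # what zip([1, 2, 3], [4, 5, 6]) produces

# The star spreads the list into separate positional arguments...
starred = list(zip(*pairs))
# ...so it is the same call as writing the arguments out by hand.
spelled_out = list(zip(pairs[0], pairs[1], pairs[2]))

print(starred)      # [(1, 2, 3), (4, 5, 6)]
print(spelled_out)  # [(1, 2, 3), (4, 5, 6)]
```

Seen this way, "unzipping" is just zipping again, with the rows of the first result used as the inputs of the second.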
It's also useful for multiple args:
def foo(*args):
    print(args)
foo(1, 2, 3)  # (1, 2, 3)
# also legal
t = (1, 2, 3)
foo(*t)  # (1, 2, 3)
And, you can use double asterisk for keyword arguments and dictionaries:
def foo(**kwargs):
    print(kwargs)
foo(a=1, b=2)  # {'a': 1, 'b': 2}
# also legal
d = {"a": 1, "b": 2}
foo(**d)  # {'a': 1, 'b': 2}
And of course, you can combine these:
def foo(*args, **kwargs):
    print(args, kwargs)
foo(1, 2, a=3, b=4)  # (1, 2) {'a': 3, 'b': 4}
Pretty neat and useful stuff.
It doesn't always work:
>>> x = []
>>> y = []
>>> zipped = zip(x, y)
>>> unzipped_x, unzipped_y = zip(*zipped)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: need more than 0 values to unpack
Oops! I think it needs a skull to scare it into working:
>>> unzipped_x, unzipped_y = zip(*zipped) or ([], [])
>>> unzipped_x
[]
>>> unzipped_y
[]
In Python 3, I think you need
>>> unzipped_x, unzipped_y = tuple(zip(*zipped)) or ([], [])
since zip now returns an iterator, and an iterator object is always truthy, even when it has nothing to yield.
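If the one-liner feels too clever, the same idea can be written out as a small function. This is only a sketch for Python 3: unzip_pairs is a hypothetical name, and it returns empty tuples rather than empty lists for the empty case.

```python
def unzip_pairs(pairs):
    """Hypothetical helper: split an iterable of 2-tuples into two tuples."""
    columns = tuple(zip(*pairs))
    # With no pairs at all, zip(*pairs) yields nothing, so fall back explicitly.
    if not columns:
        return (), ()
    first, second = columns
    return first, second


print(unzip_pairs(zip([1, 2, 3], [4, 5, 6])))  # ((1, 2, 3), (4, 5, 6))
print(unzip_pairs(zip([], [])))                # ((), ())
```

Materializing the result with tuple() first is what makes the emptiness check possible, since the zip object itself cannot be tested that way.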
I'm extremely new to Python so this just recently tripped me up, but it had to do more with how the example was presented and what was emphasized.
What gave me problems with understanding the zip example was the asymmetry in how the return value of each zip call is handled. When zip is called the first time, the return value is assigned to a single variable, which then refers to the whole list of tuples. In the second call, the example relies on Python's ability to automatically unpack a returned list (or any other iterable) into multiple variables, each one receiving an individual tuple. If someone isn't familiar with how that works in Python, it's easy to get lost as to what's actually happening.
>>> x = [1, 2, 3]
>>> y = "abc"
>>> zipped = zip(x, y)
>>> zipped
[(1, 'a'), (2, 'b'), (3, 'c')]
>>> z1, z2, z3 = zip(x, y)
>>> z1
(1, 'a')
>>> z2
(2, 'b')
>>> z3
(3, 'c')
>>> rezipped = zip(*zipped)
>>> rezipped
[(1, 2, 3), ('a', 'b', 'c')]
>>> rezipped2 = zip(z1, z2, z3)
>>> rezipped == rezipped2
True
(x, y) == tuple(zip(*zip(x,y))) is true if and only if the two following statements are true:
- x and y have the same length
- x and y are tuples
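A quick way to check these two conditions is to run the comparison on a few inputs. The sketch below assumes Python 3, and round_trips is a hypothetical helper name. The last call is an edge case the rule above does not mention: with two empty tuples nothing comes back at all, so the comparison is false.

```python
def round_trips(x, y):
    """Hypothetical helper: does zip(*zip(x, y)) give back (x, y)?"""
    return (x, y) == tuple(zip(*zip(x, y)))


print(round_trips((1, 2, 3), (4, 5, 6)))     # True: tuples of equal length
print(round_trips([1, 2, 3], [4, 5, 6]))     # False: lists come back as tuples
print(round_trips((1, 2, 3), (4, 5, 6, 7)))  # False: the 7 is dropped
print(round_trips((), ()))                   # False: nothing comes back at all
```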
One good way to understand what's going on is to print at each step:
x = [1, 2, 3]
y = ["a", "b", "c", "d"]
print("1) x, y = ", x, y)
print("2) zip(x, y) = ", list(zip(x, y)))
print("3) *zip(x, y) = ", *zip(x, y))
print("4) zip(*zip(x,y)) = ", list(zip(*zip(x,y))))
Which outputs:
1) x, y = [1, 2, 3] ['a', 'b', 'c', 'd']
2) zip(x, y) = [(1, 'a'), (2, 'b'), (3, 'c')]
3) *zip(x, y) = (1, 'a') (2, 'b') (3, 'c')
4) zip(*zip(x,y)) = [(1, 2, 3), ('a', 'b', 'c')]
Basically this is what happens:
- Items from x and y are paired according to their respective indexes.
- Pairs are unpacked to 3 different objects (tuples).
- Pairs are passed to zip, which will again pair every item based on indexes:
  - first items from all inputs are paired: (1, 2, 3)
  - second items from all inputs are paired: ('a', 'b', 'c')
Now you can understand why (x, y) == tuple(zip(*zip(x,y))) is false in this case:
- since y is longer than x, the first zip operation removed the extra item from y (as it couldn't be paired); this change carries over to the second zipping operation
- types differ: at the start we had two lists, now we have two tuples, as zip pairs items in tuples and not in lists
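If lists are wanted back, the tuples can be converted after the second zip. This is a minimal sketch for Python 3, where new_x and new_y are illustrative names. It repairs the type difference only; the item dropped from y is still gone.

```python
x = [1, 2, 3]
y = ["a", "b", "c", "d"]

# Convert each tuple produced by the second zip back into a list.
new_x, new_y = map(list, zip(*zip(x, y)))

print(new_x)                 # [1, 2, 3]
print(new_y)                 # ['a', 'b', 'c']
print(new_x == x)            # True
print(new_y == y[:len(x)])   # True: y was truncated to the length of x
```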
If you're not 100% certain you understand how zip works, I wrote an answer to this question here: Unzipping and the * operator
Addendum to @bcherry's answer:
>>> def f(a2,a1):
...     print(a2, a1)
...
>>> d = {'a1': 111, 'a2': 222}
>>> f(**d)
222 111
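So ** works not just with functions that declare **kwargs, but with ordinary named parameters too (the ones you would normally pass positionally): each dictionary key is matched to the parameter with the same name, regardless of order.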
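Because the matching is done by name, the dictionary keys have to line up with the function's parameters. A small sketch (Python 3) that reuses f from above but returns the values instead of printing them, so the result is easy to inspect:

```python
def f(a2, a1):
    return a2, a1


print(f(**{"a1": 111, "a2": 222}))  # (222, 111)

# A key with no matching parameter is rejected with a TypeError.
try:
    f(**{"a1": 111, "a2": 222, "a3": 333})
except TypeError as exc:
    print("TypeError:", exc)
```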