Python: variable-length tuples
[Python 3.1]
I'm following up on the design concept that tuples should be of known length (see this comment), and that unknown-length tuples should be replaced with lists in most circumstances. My question is: under what circumstances should I deviate from that rule?
For example, I understand that tuples are faster to create from string and numeric literals than lists (see another comment). So, if I have performance-critical code where there are numerous calculations such as sumproduct(tuple1, tuple2), should I redefine them to work on lists despite a performance hit? (sumproduct((x, y, z), (a, b, c)) is defined as x * a + y * b + z * c, and its arguments have unspecified but equal lengths.)
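For reference, a minimal sketch of what such a sumproduct might look like. The body is an assumption based only on the definition given above (the length check is an illustrative addition), and it accepts tuples and lists alike:

```python
def sumproduct(xs, ys):
    """Return the sum of pairwise products of two equal-length sequences."""
    # The explicit length check is an assumption; zip() alone would
    # silently truncate to the shorter argument.
    if len(xs) != len(ys):
        raise ValueError("arguments must have equal lengths")
    return sum(x * y for x, y in zip(xs, ys))


print(sumproduct((1, 2, 3), (4, 5, 6)))  # 1*4 + 2*5 + 3*6 = 32
print(sumproduct([1, 2, 3], [4, 5, 6]))  # same result with lists
```

Because the function only iterates over its arguments, the choice between tuple and list does not affect how it is written.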
And what about the tuple that is automatically built by Python when using def f(*x)? I assume it's not something I should coerce to list every time I use it.
Btw, is (x, y, z) faster to create than [x, y, z] (for variables rather than literals)?
In my mind, the only interesting distinction between tuples and lists is that lists are mutable and tuples are not. The other distinctions that people mention seem completely artificial to me: tuples are like structs and lists are like arrays (this is where the "tuples should be a known length" comes from). But how is struct-ness aligned with immutability? It isn't.
The only distinction that matters is the distinction the language makes: mutability. If you need to modify the object, definitely use a list. If you need to hash the object (as a key in a dict, or an element of a set), then you need it to be immutable, so use a tuple. That's it.
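A small sketch of the hashing point, using a hypothetical dict named distances with made-up values: a tuple works as a key, while a list raises TypeError.

```python
# Hypothetical example data; the names and values are illustrative only.
distances = {}

# A tuple is hashable, so it can be used as a dict key.
distances[(0, 0)] = 0.0

# A list is mutable and therefore unhashable; using it as a key fails.
try:
    distances[[1, 1]] = 1.4
except TypeError as exc:
    error = str(exc)
    print("list as key failed:", error)

print(distances)
```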
I always use the most appropriate data structure for the job and do not really worry about whether a tuple would save me half a millisecond here or there. Pre-obfuscating your code does not usually pay off in the end. If the code runs too slowly, you can always profile it later and change the .01% of code where it really matters.
All the things you are talking about are tied to the implementation of the Python version and the hardware it is running on. You can always time those things yourself to see what they would be on your machine.
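One way to do that timing is with the standard timeit module. This is only a sketch: the variable values and the iteration count are arbitrary choices, and the numbers it prints will differ between Python versions and machines.

```python
import timeit

# Arbitrary values; only the construction cost is being measured.
setup = "x, y, z = 1.5, 2.5, 3.5"
number = 100000  # arbitrary iteration count

# Building from variables
tuple_time = timeit.timeit("(x, y, z)", setup=setup, number=number)
list_time = timeit.timeit("[x, y, z]", setup=setup, number=number)

# Building from literals
literal_tuple_time = timeit.timeit("(1, 2, 3)", number=number)
literal_list_time = timeit.timeit("[1, 2, 3]", number=number)

print("tuple from variables:", tuple_time)
print("list from variables: ", list_time)
print("tuple from literals: ", literal_tuple_time)
print("list from literals:  ", literal_list_time)
```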
A common example of this is the belief that 'immutable strings are slow to concatenate' in Python. This was true about 10 years ago, and then the implementation was changed in 2.4 or 2.5. If you do your own tests, string concatenation now runs faster than the list-based approach, but people are still convinced of this today and use silly constructs that actually run slower!
under what circumstances should I deviate from that [tuples should be of known length] rule?
None.
It's a matter of meaning. If an object has meaning based on a fixed number of elements, then it's a tuple. (x,y) coordinates, (c,m,y,k) colors, (lat, lon) position, etc., etc.
A tuple has a fixed number of elements based on the problem domain in general and the specifics of the problem at hand.
Designing a tuple with an indefinite number of elements makes little sense. When do we switch from (x,y) to (x,y,z) and then to (x,y,z,w) coordinates? Not by simply concatenating a value as if it were a list. If we're moving from 2-d to 3-d coordinates, there's usually some pretty fancy math to map the coordinate systems, not appending an element to a list.
What does it mean to move from (r,g,b) colors to something else? What is the 4th color in the rgb system? For that matter, what's the fifth color in the cmyk system?
Tuples do not change size.
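As a rough illustration of the difference (the names and values here are made up): a fixed-size tuple is unpacked by position because each slot has a meaning, while a list simply grows.

```python
# Illustrative values only.
position = (48.8566, 2.3522)  # (lat, lon): exactly two elements by definition
lat, lon = position           # each slot has a fixed meaning

readings = [20.1, 20.4]       # a collection of like items of no fixed size
readings.append(19.8)         # growing it does not change what it means

print(lat, lon)
print(readings)
```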
*args is a tuple because it is immutable. Yes, it has an indefinite number of arguments, but it's a rare counter-example to tuples of known, defined sizes.
What to do about an indefinite-length tuple? This counter-example is so profound that we have two choices.
Reject the very idea that tuples are fixed-length and constrained by the problem. The very idea of (x,y) coordinates and (r,g,b) colors is utterly worthless and wrong because of this counter-example. Fixed-length tuples? Never.
Always convert all *args to lists to always have a fussy level of unthinking conformance to a design principle. Convert to lists? Always.
I love all or nothing choices, since they make software engineering so simplistic and unthinking.
Perhaps, in these corner cases, there's a tiny scrap of "this requires thinking" here. A tiny scrap.
Yes, *args is a tuple. Yes, it's of indefinite length. Yes, it's a counter-example where "fixed by the problem domain" is trumped by "simply immutable".
This leads us to the third choice in the case where a sequence is immutable for a different reason. You'll never mutate it, so it's okay to be a tuple of indefinite size. In the even-more-rare case where you're popping values off *args because you're treating it like a stack or a queue, then you might want to make a list out of it. But we can't pre-solve all possible problems.
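A short sketch of both cases, with hypothetical function names describe and drain: *args arrives as a tuple and is usually fine as-is, and is copied into a list only when it has to be consumed like a stack.

```python
def describe(*args):
    """Report the type and length of the packed arguments."""
    # args is a tuple built by Python; reading it needs no conversion.
    return type(args).__name__, len(args)


def drain(*args):
    """Consume the arguments like a stack, last one first."""
    stack = list(args)  # copy to a list only because we need pop()
    popped = []
    while stack:
        popped.append(stack.pop())
    return popped


print(describe(1, 2, 3))
print(drain(1, 2, 3))
```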
Sometimes Thinking Is Required.
When you're doing design, you design a tuple for a reason. To impose a meaningful structure on your data. Fixed-length number of elements? Tuple. Variable number of elements (i.e., mutable)? List.
In this case, you should probably consider using numpy and numpy arrays.
There is some overhead converting to and from numpy arrays, but if you are doing a bunch of calculations it will be much faster.