Python efficiency: lists vs. tuples
I have a medium-amount of base objects.
These base objects will be put in collections, and these collections will be munged around: sorted, truncated, etc.
Unfortunately, the n is large enough that memory consumption is slightly worrisome, and speed is getting concerning.
My understanding is that tuples are slightly more memory-efficient, since they are deduplicated.
Anyway,开发者_C百科 I would like to know what the cpu/memory tradeoffs of lists vs. tuples are in Python 2.6/2.7.
If you have a tuple and a list with the same elements, the tuple takes less space. Since tuples are immutable, you can't sort them, add to them, etc. I recommend watching this talk by Alex Gaynor for a quick intro on when to choose what datastructure in Python.
UPDATE: Thinking about it some more, you may want to look into optimizing the space usage of your objects, e.g., via __slots__
or using namedtuple
instances as proxies instead of the actual objects. This would likely lead to much bigger savings, since you have N of them and (presumbaly) only a few collections in which they appear. namedtuple
in particular is super awesome; check out Raymond Hettinger's talk.
As others mentioned tuples are immutable. Sorting a tuple (e.g. sorted(mytuple)
) returns a list, which you would then have to cast back to a tuple.
To sort a tuple (and keep it a tuple) you'd have to do this:
mytuple = (3,2,1)
mysortedtuple = tuple(sorted(mytuple))
To sort a list you'd have to do this:
mylist = [3,2,1]
mylist.sort()
Because you're not casting and re-casting, the latter, in this instance, is more efficient.
Don't get hung up on using tuples over lists unless you have a good justification. If you need sorted data, tuples are not the way to go unless they are created that way in the first place. Tuples excel when the data they contain DOES NOT CHANGE, such as with configuration settings that are loaded at run-time, or data that has already been processed.
Considering that you mentioned you are processing a large dataset, you might want to look at using a functional programming style by way of generators and iterators over lists and tuples. This way you're not shuttling around and creating new containers, but just chaining iteration operations to get to the end result.
Further reading:
- Python's itertools
- Python functional programming HOWTO
What is the (average, min, max) number of base objects in a collection?
Tuples are "deduplicated" and lists are not? What do you think that "deduplicated" means in this context?
Lists do take up more memory than tuples, because extra memory is allocated on the presumption that a list is going to grow and you definitely don't want to realloc() memory each time you do large_list.append(). However on a 32-bit machine, the amortised cost of an extra list element is 4 bytes for a pointer, N bytes for the element itself, and no more than another 4 bytes for the extra memory. N is 16 bytes for a float. That means a list of floats takes up to 24 bytes per extra float, compared with 20 bytes for a tuple. A "base object" with N==100 gives a comparison of 108 versus 104. If a based object is referred to in two collections, then 58 versus 54. How big is your N?
Advice: Leave your collections as lists. Concentrate on:
ensuring your base objects are memory-efficient
use generators and itertools goodies instead of temporary lists where possible
if you can't avoid having temporary lists, ensure that they are thrown away immmediately they are not needed any more i.e. don't wait till the creating method returns; use explicit
del
as soon as possible.
In addition to all these suggestions, you may find that numpy will fill your needs. If your objects are something that numpy handles by default (ints, native C types, etc) then that would be ideal. You can use a numpy array with custom objects as well, but that might be more work than it's worth.
You can't use them the same way. Tuples are immutable and don't support appending, sorting, etc (calling sorted
on a tuple yields a list, and so on). Tuples are totally different from lists, so any performance comparison is meaningless.
You cannot sort an immutable object - i.e. when sorting a tuple you'll always create a new one.
There are at least two existing questions that are similar enough to yours that the answers (or links within them) may be useful to you. To summarize: let the features of the type (mutable vs. immutable, heterogeneous vs. homogeneous) rather than performance guide your decision, because the performance/efficiency differences are minimal.
What's the difference between list and tuples in Python?
What are differences between List, Dictionary and Tuple in Python?
精彩评论