Speed vs security vs compatibility over methods to do string concatenation in Python
Similar questions have been brought up (with a good speed comparison there) on this same subject. Hopefully this question is different and updated to Python 2.6 and 3.0.
So far I believe the fastest and most compatible method (across different Python versions) is the plain, simple + sign:
text = "whatever" + " you " + SAY
But I keep hearing and reading it's not secure and / or advisable.
I'm not even sure how many methods there are to manipulate strings! I could count only about 4: there's interpolation and all its sub-options, such as % and format, and then there are the simple ones, join and +.
Finally, the new approach to string formatting, which is with format, is certainly not good for backwards compatibility, while at the same time making % not good for forward compatibility. But should it be used for every string manipulation, including every concatenation, whenever we restrict ourselves to 3.x only?
Well, maybe this is more of a wiki than a question, but I do wish to have an answer on which is the proper usage of each string manipulation method. And which one could be generally used with each focus in mind (best all around for compatibility, for speed and for security).
Thanks.
edit: I'm not sure I should accept an answer if I don't feel it really answers the question... But my point is that all 3 of them together do a proper job.
Daniel's most voted answer is actually the one I'd prefer for accepting, if not for the "note". I highly disagree with "concatenation is strictly using the + operator to concatenate strings" because, for one, join does string concatenation as well, and we can build any arbitrary library for that.
All 3 current answers are valuable and I'd rather have one answer mixing them all. Since nobody has volunteered to do that, I guess by choosing the least voted one (but fairly broader than THC4k's, which is more like a large and very welcome comment) I can draw attention to the others as well.
As a note: Really this is all about string construction and not concatenation, per se, as concatenation is strictly using the + operator to concatenate strings together one after the other.
+ (concatenation) - generally inefficient but can be easier to read for some people; only use it when readability is the priority and performance is not (simple scripts, throwaway scripts, non-performance-intensive code)
join (building a string from a sequence of strings) - use this when you have a sequence of strings that you need to join using a common character (or no character at all, if you want to use the empty string '' to join on)
% and format (interpolation) - basically every other operation should use whichever one of these is appropriate; choose the operator/function based on which version of Python you want to support for the lifetime of the code (use % for 2.x and format for 3.x)
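As a rough illustration of the four approaches listed above, here is a minimal sketch. The variable names and sample values are made up for the example and are not from the question.

```python
# Illustrative values only; these names are assumptions for the sketch.
name = "world"
parts = ["a", "b", "c"]

# + : plain concatenation of a small, fixed number of strings
greeting_plus = "hello " + name

# join : build one string from a sequence, using a chosen separator
joined = ", ".join(parts)
joined_empty = "".join(parts)

# % : printf-style interpolation (available in 2.x and 3.x)
greeting_percent = "hello %s" % name

# format : str.format interpolation (available from 2.6 onwards)
greeting_format = "hello {0}".format(name)

print(greeting_plus)
print(joined)
print(joined_empty)
print(greeting_percent)
print(greeting_format)
```

All three greeting variants produce the same string here, so the choice between them is mostly about readability and which Python versions the code has to support.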
The problem with + for strings is the same as in many other languages: each time you extend the string, it is copied. So to construct a single string from 100 substrings, Python copies the growing string at each of the 99 steps.
And that takes some time:
# join 100 pretty short strings
python -m timeit -s "s = ['pretty short'] * 100" "t = ''.join(s)"
100000 loops, best of 3: 4.18 usec per loop
# same thing, about 7 times slower
python -m timeit -s "s = ['pretty short'] * 100" "t = ''" "for x in s:" " t+=x"
10000 loops, best of 3: 30 usec per loop
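The same comparison can probably be reproduced from inside a script with the standard timeit module. This is only a sketch: the loop count is an arbitrary choice, and the ratio will differ between machines and Python versions (CPython can sometimes optimize += on strings in place, which narrows the gap).

```python
import timeit

# Same setup as the command-line runs: 100 short strings.
setup = "s = ['pretty short'] * 100"

join_stmt = "t = ''.join(s)"
loop_stmt = "t = ''\nfor x in s:\n    t += x"

# number=10000 is an arbitrary choice to keep the run short;
# raise it for more stable figures.
join_time = timeit.timeit(join_stmt, setup=setup, number=10000)
loop_time = timeit.timeit(loop_stmt, setup=setup, number=10000)

print("join: %.4f s for 10000 runs" % join_time)
print("+=:   %.4f s for 10000 runs" % loop_time)
```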
Using + is OK, but not if it's automated:
a + small + number + of + strings + "is pretty fast"
but this can be very slow:
s = ''
for line in anything:
    s += line
Use this instead:
s = ''.join(anything)
There are pros and cons to using + vs '%s' % line. Using + will fail here:
s = 'Error - unexpected string' + 42
Whether you want it to throw an exception, or silently do something unusual depends on your use.
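To make the difference concrete, here is a small sketch of what each approach does with a non-string value. The variable names are made up for the example.

```python
value = 42

# + refuses to mix str and int and raises TypeError.
try:
    message_plus = 'Error - unexpected string' + value
except TypeError as exc:
    message_plus = None
    print("+ failed: %s" % exc)

# % and format convert the value to a string for you.
message_percent = 'Error - unexpected string %s' % value
message_format = 'Error - unexpected string {0}'.format(value)

# With +, the conversion has to be explicit.
message_explicit = 'Error - unexpected string ' + str(value)

print(message_percent)
print(message_format)
print(message_explicit)
```

The exception from + can be useful when a non-string value signals a bug, while the implicit conversion of % and format is convenient for messages and logging.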