smartest way to join two lists into a formatted string
Let's say I have two lists of the same length:
a = ['a1', 'a2', 'a3']
b = ['b1', 'b2', 'b3']
and I want to produce the following string:
c = 'a1=b1, a2=b2, a3=b3'
What is the best way to achieve this?
I have the following implementations:
import timeit
a = [str(f) for f in range(500)]
b = [str(f) for f in range(500)]
def func1():
    return ', '.join([aa + '=' + bb for aa in a for bb in b if a.index(aa) == b.index(bb)])
def func2():
    parts = []
    for i in range(len(a)):
        parts.append('%s=%s' % (a[i], b[i]))
    return ', '.join(parts)
t = timeit.Timer(setup='from __main__ import func1', stmt='func1()')
print 'func1 =', t.timeit(10)
t = timeit.Timer(setup='from __main__ import func2', stmt='func2()')
print 'func2 =', t.timeit(10)
and the output is:
func1 = 32.4704790115
func2 = 0.00529003143311
Is there a better way, or a different trade-off to consider?
This implementation is, on my system, faster than either of your two functions and still more compact.
c = ', '.join('%s=%s' % t for t in zip(a, b))
Thanks to @JBernardo for the suggested improvement.
In more recent syntax, str.format is more appropriate:
c = ', '.join('{}={}'.format(*t) for t in zip(a, b))
This produces largely the same output, though it can accept any object with a __str__ method, so two lists of integers would still work here.
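For instance, a quick sketch with two integer lists (the variable names here are just illustrative):

```python
a = [1, 2, 3]
b = [4, 5, 6]
# str.format stringifies each element, so no explicit str() calls are needed
c = ', '.join('{}={}'.format(x, y) for x, y in zip(a, b))
print(c)  # 1=4, 2=5, 3=6
```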
a = ['a1', 'a2', 'a3']
b = ['b1', 'b2', 'b3']
pat = '%s=%%s, %s=%%s, %s=%%s'
print pat % tuple(a) % tuple(b)
gives a1=b1, a2=b2, a3=b3.
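The two-stage substitution can be unpacked step by step (a sketch of the same idea; the intermediate names are illustrative):

```python
a = ['a1', 'a2', 'a3']
b = ['b1', 'b2', 'b3']
pat = '%s=%%s, %s=%%s, %s=%%s'
# First pass fills in the a-values and turns each escaped %%s into a plain %s.
stage1 = pat % tuple(a)     # 'a1=%s, a2=%s, a3=%s'
# Second pass fills in the b-values.
result = stage1 % tuple(b)  # 'a1=b1, a2=b2, a3=b3'
print(result)
```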
Then:
from timeit import Timer
from itertools import izip
n = 300
a = [str(f) for f in range(n)]
b = [str(f) for f in range(n)]
def func1():
    return ', '.join([aa + '=' + bb for aa in a for bb in b if a.index(aa) == b.index(bb)])
def func2():
    parts = []
    for i in range(len(a)):
        parts.append('%s=%s' % (a[i], b[i]))
    return ', '.join(parts)
def func3():
    return ', '.join('%s=%s' % t for t in zip(a, b))
def func4():
    return ', '.join('%s=%s' % t for t in izip(a, b))
def func5():
    # note: this variant leaves a trailing ', ' on the result
    pat = n * '%s=%%s, '
    return pat % tuple(a) % tuple(b)
d = dict(zip((1, 2, 3, 4, 5), ('heavy', 'append', 'zip', 'izip', '% formatting')))
for i in xrange(1, 6):
    t = Timer(setup='from __main__ import func%d' % i, stmt='func%d()' % i)
    print 'func%d = %s %s' % (i, t.timeit(10), d[i])
Result:
func1 = 16.2272833558 heavy
func2 = 0.00410247671143 append
func3 = 0.00349569568199 zip
func4 = 0.00301686387516 izip
func5 = 0.00157338432678 % formatting
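On Python 3, where izip and xrange are gone and zip is already lazy, an f-string version is a natural extra candidate (a sketch, not part of the original benchmark, and no timing claim is implied):

```python
a = [str(f) for f in range(300)]
b = [str(f) for f in range(300)]

def func6():
    # in Python 3, the lazy zip() plays the role of itertools.izip
    return ', '.join(f'{x}={y}' for x, y in zip(a, b))

print(func6()[:15])  # 0=0, 1=1, 2=2,
```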
Those two solutions do very different things. The first loops over every pair in a nested way and then calls list.index, which is itself a linear scan, effectively adding a third nested loop and requiring on the order of 500 × 500 × 500 = 125,000,000 operations. The second iterates in lockstep, building 500 pairs directly instead of examining 250,000 candidate pairs. No wonder they're so different!
Are you familiar with Big O notation for describing the complexity of algorithms? If so, the first solution is cubic and the second solution is linear. The cost of choosing the first over the second grows at an alarming rate as a and b get longer, so no one should use an algorithm like that.
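Besides being slow, the index-based pairing is also subtly wrong whenever a list contains duplicates, because list.index always returns the first match. A small sketch to illustrate:

```python
a = ['x', 'x']
b = ['1', '2']
# a.index('x') is 0 for both copies of 'x', so only b[0] ever pairs up
r = ', '.join(aa + '=' + bb for aa in a for bb in b if a.index(aa) == b.index(bb))
print(r)  # x=1, x=1 -- not the intended x=1, x=2
```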
Personally, I would almost certainly use code like
', '.join('%s=%s' % pair for pair in itertools.izip(a, b))
or, if I wasn't too worried about the size of a and b and was just writing quickly, I would use zip instead of itertools.izip. This code has several advantages:
It's linear. Although premature optimization is a huge problem, it's best not to cavalierly use an algorithm with an unnecessarily bad asymptotic performance.
It's simple and idiomatic. I see other people write code like this frequently.
It's memory efficient. By using a generator expression instead of a list comprehension (and itertools.izip rather than zip), I don't build unnecessary lists in memory, turning what could be an O(n) (linear)-memory operation into an O(1) (constant)-memory one.
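Under Python 3 the same advice simplifies, since zip itself is lazy and itertools.izip no longer exists, so the memory-friendly one-liner needs no import:

```python
a = ['a1', 'a2', 'a3']
b = ['b1', 'b2', 'b3']
# zip() returns a lazy iterator in Python 3, so no intermediate list is built
c = ', '.join('%s=%s' % pair for pair in zip(a, b))
print(c)  # a1=b1, a2=b2, a3=b3
```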
As for timing to find the fastest solution, this would almost certainly be an example of premature optimization. To write performant programs, we use theory and experience to write high-quality, maintainable code. Experience shows it is at best futile and at worst counterproductive to stop at random operations, ask "What is the best way to do this particular operation?", and try to answer by guessing or even by ad-hoc testing.
In reality, the programs with the best performance are the ones written with the highest-quality code and very selective optimizations. High-quality code that values readability and simplicity over microbenchmarks ends up being easier to test, less buggy, and nicer to refactor -- and these factors are key for effectively optimizing your program. The time you would otherwise spend fixing unnecessary bugs, understanding complicated code, and fighting with refactoring can be spent optimizing instead.
When it comes time to optimize a program -- after it's tested and probably documented -- this is not done on random snippets, but on ones identified by actual use cases and/or performance tests, with measurements collected by profiling. If a particular piece of code takes only 0.1% of the program's run time, no amount of speeding it up will do any real good.
>>> ', '.join(i + '=' + j for i,j in zip(a,b))
'a1=b1, a2=b2, a3=b3'