smartest way to join two lists into a formatted string
Let's say I have two lists of the same length:
a = ['a1', 'a2', 'a3']
b = ['b1', 'b2', 'b3']
and I want to produce the following string:
c = 'a1=b1, a2=b2, a3=b3'
What is the best way to achieve this?
I have the following implementations:
import timeit
a = [str(f) for f in range(500)]
b = [str(f) for f in range(500)]
def func1():
    return ', '.join([aa + '=' + bb for aa in a for bb in b if a.index(aa) == b.index(bb)])
def func2():
    parts = []
    for i in range(len(a)):
        parts.append('%s=%s' % (a[i], b[i]))
    return ', '.join(parts)
t = timeit.Timer(setup='from __main__ import func1', stmt='func1()')
print 'func1 =', t.timeit(10)
t = timeit.Timer(setup='from __main__ import func2', stmt='func2()')
print 'func2 =', t.timeit(10)
and the output is:
func1 = 32.4704790115
func2 = 0.00529003143311
Is there a better way, or a different trade-off to consider?
This implementation is, on my system, faster than either of your two functions and still more compact.
c = ', '.join('%s=%s' % t for t in zip(a, b))
Thanks to @JBernardo for the suggested improvement.
In more recent syntax, str.format is more appropriate:
c = ', '.join('{}={}'.format(*t) for t in zip(a, b))
This produces largely the same output, though it can accept any object with a __str__ method, so two lists of integers would still work here.
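For instance, a quick sketch with two integer lists (the variable names here are just illustrative):

```python
a = [1, 2, 3]
b = [4, 5, 6]
# str.format stringifies each element, so no explicit str() calls are needed
c = ', '.join('{}={}'.format(x, y) for x, y in zip(a, b))
print(c)  # 1=4, 2=5, 3=6
```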
a = ['a1', 'a2', 'a3']
b = ['b1', 'b2', 'b3']
pat = '%s=%%s, %s=%%s, %s=%%s'
print pat % tuple(a) % tuple(b)
gives a1=b1, a2=b2, a3=b3.
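The two-stage substitution can be unpacked step by step (a sketch of the same idea; the intermediate names are illustrative):

```python
a = ['a1', 'a2', 'a3']
b = ['b1', 'b2', 'b3']
pat = '%s=%%s, %s=%%s, %s=%%s'
# First pass fills in the a-values and turns each escaped %%s into a plain %s.
stage1 = pat % tuple(a)     # 'a1=%s, a2=%s, a3=%s'
# Second pass fills in the b-values.
result = stage1 % tuple(b)  # 'a1=b1, a2=b2, a3=b3'
print(result)
```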
Then:
from timeit import Timer
from itertools import izip
n = 300
a = [str(f) for f in range(n)]
b = [str(f) for f in range(n)]
def func1():
    return ', '.join([aa + '=' + bb for aa in a for bb in b if a.index(aa) == b.index(bb)])
def func2():
    parts = []
    for i in range(len(a)):
        parts.append('%s=%s' % (a[i], b[i]))
    return ', '.join(parts)
def func3():
    return ', '.join('%s=%s' % t for t in zip(a, b))
def func4():
    return ', '.join('%s=%s' % t for t in izip(a, b))
def func5():
    # note: this variant leaves a trailing ', ' on the result
    pat = n * '%s=%%s, '
    return pat % tuple(a) % tuple(b)
d = dict(zip((1, 2, 3, 4, 5), ('heavy', 'append', 'zip', 'izip', '% formatting')))
for i in xrange(1, 6):
    t = Timer(setup='from __main__ import func%d' % i, stmt='func%d()' % i)
    print 'func%d = %s %s' % (i, t.timeit(10), d[i])
Result:
func1 = 16.2272833558 heavy
func2 = 0.00410247671143 append
func3 = 0.00349569568199 zip
func4 = 0.00301686387516 izip
func5 = 0.00157338432678 % formatting
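On Python 3, where izip and xrange are gone and zip is already lazy, an f-string version is a natural extra candidate (a sketch, not part of the original benchmark, and no timing claim is implied):

```python
a = [str(f) for f in range(300)]
b = [str(f) for f in range(300)]

def func6():
    # in Python 3, the lazy zip() plays the role of itertools.izip
    return ', '.join(f'{x}={y}' for x, y in zip(a, b))

print(func6()[:15])  # 0=0, 1=1, 2=2,
```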
Those two solutions do very different things. The first loops over every pair in a nested way and then calls list.index, which is itself a linear scan, effectively adding a third nested loop and requiring on the order of 500 × 500 × 500 = 125,000,000 operations. The second iterates in lockstep, building 500 pairs directly instead of examining 250,000 candidate pairs. No wonder they're so different!
Are you familiar with Big O notation for describing the complexity of algorithms? If so, the first solution is cubic and the second solution is linear. The cost of choosing the first over the second grows at an alarming rate as a and b get longer, so no one should use an algorithm like that.
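Besides being slow, the index-based pairing is also subtly wrong whenever a list contains duplicates, because list.index always returns the first match. A small sketch to illustrate:

```python
a = ['x', 'x']
b = ['1', '2']
# a.index('x') is 0 for both copies of 'x', so only b[0] ever pairs up
r = ', '.join(aa + '=' + bb for aa in a for bb in b if a.index(aa) == b.index(bb))
print(r)  # x=1, x=1 -- not the intended x=1, x=2
```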
Personally, I would almost certainly use code like
', '.join('%s=%s' % pair for pair in itertools.izip(a, b))
or, if I wasn't too worried about the size of a and b and was just writing quickly, I would use zip instead of itertools.izip. This code has several advantages:
It's linear. Although premature optimization is a huge problem, it's best not to cavalierly use an algorithm with an unnecessarily bad asymptotic performance.
It's simple and idiomatic. I see other people write code like this frequently.
It's memory efficient. By using a generator expression instead of a list comprehension (and itertools.izip rather than zip), I don't build unnecessary lists in memory, turning what could be an O(n) (linear)-memory operation into an O(1) (constant)-memory one.
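Under Python 3 the same advice simplifies, since zip itself is lazy and itertools.izip no longer exists, so the memory-friendly one-liner needs no import:

```python
a = ['a1', 'a2', 'a3']
b = ['b1', 'b2', 'b3']
# zip() returns a lazy iterator in Python 3, so no intermediate list is built
c = ', '.join('%s=%s' % pair for pair in zip(a, b))
print(c)  # a1=b1, a2=b2, a3=b3
```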
As for timing to find the fastest solution, this would almost certainly be an example of premature optimization. To write performant programs, we use theory and experience to write high-quality, maintainable code. Experience shows it is at best futile and at worst counterproductive to stop at random operations, ask "What is the best way to do this particular operation?", and try to answer by guessing or even by ad-hoc testing.
In reality, the programs with the best performance are the ones written with the highest-quality code and very selective optimizations. High-quality code that values readability and simplicity over microbenchmarks ends up being easier to test, less buggy, and nicer to refactor -- and these factors are key for effectively optimizing your program. The time you would otherwise spend fixing unnecessary bugs, understanding complicated code, and fighting with refactoring can be spent optimizing instead.
When it comes time to optimize a program -- after it's tested and probably documented -- this is not done on random snippets, but on ones identified by actual use cases and/or performance tests, with measurements collected by profiling. If a particular piece of code takes only 0.1% of the program's run time, no amount of speeding it up will do any real good.
>>> ', '.join(i + '=' + j for i,j in zip(a,b))
'a1=b1, a2=b2, a3=b3'