In Python, what is more efficient: modifying lists or strings?

Regardless of ease of use, which is more computationally efficient: constantly slicing lists and appending to them, or taking substrings and doing the same?

As an example, let's say I have two binary strings, "11011" and "01001". If I represent these as lists, I'll be choosing a random "slice" point. Let's say I get 3. I'll take the first 3 characters of the first string and the remaining characters of the second (so I'd have to slice both) and create a new string out of them.

Would this be more efficiently done by cutting the substrings or by representing it as a list ( [1, 1, 0, 1, 1] ) rather than a string?


>>> a = "11011"
>>> b = "01001"
>>> import timeit
>>> def strslice():
    return a[:3] + b[3:]

>>> def lstslice():
    return list(a)[:3] + list(b)[3:]
>>> c = list(a)
>>> d = list(b)
>>> def lsts():
    return c[:3] + d[3:]

>>> timeit.timeit(strslice)
0.5103488475836432
>>> timeit.timeit(lstslice)
2.4350100538824613
>>> timeit.timeit(lsts)
1.0648406858527295


timeit is a good tool for micro-benchmarking, but it needs to be used with the utmost care when the operations you want to compare may involve in-place alterations -- in this case, you need to include extra operations designed to make the needed copies. Then, first, time just the "extra" overhead:

$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b)'
100000 loops, best of 3: 5.01 usec per loop
$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b)'
100000 loops, best of 3: 5.06 usec per loop

So making the two brand-new lists we need (to avoid alteration) costs a tad more than 5 microseconds (when focused on small differences, run things at least 2-3 times to eyeball the uncertainty range). After which:

$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b);x=a[:3]+b[3:]'
100000 loops, best of 3: 5.5 usec per loop
$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b);x=a[:3]+b[3:]'
100000 loops, best of 3: 5.47 usec per loop

String slicing and concatenation in this case can be seen to cost another 410-490 nanoseconds. And:

$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b);la[3:]=lb[3:]'
100000 loops, best of 3: 5.99 usec per loop
$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b);la[3:]=lb[3:]'
100000 loops, best of 3: 5.99 usec per loop

In-place list splicing can be seen to cost 930-980 nanoseconds. The difference is safely above the noise/uncertainty levels, so you can reliably state that for this use case working with strings is going to take roughly half as much time as working in-place with lists. Of course, it's also crucial to measure a range of use cases that are relevant and representative of your typical bottleneck tasks!
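
For instance, here is a minimal sketch of rerunning the same comparison at a more representative size (the 1000-bit strings and the crossover point of 500 below are assumptions chosen for illustration, not figures from the benchmarks above):

import random
import timeit

a = ''.join(random.choice('01') for _ in range(1000))
b = ''.join(random.choice('01') for _ in range(1000))
la, lb = list(a), list(b)
k = 500  # arbitrary crossover point for this sketch

def str_cross():
    return a[:k] + b[k:]

def lst_cross():
    child = la[:]        # copy first, so the splice doesn't mutate la
    child[k:] = lb[k:]   # in-place splice on the copy
    return child

print(timeit.timeit(str_cross, number=100000))
print(timeit.timeit(lst_cross, number=100000))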


In general, modifying lists is more efficient than modifying strings, because strings are immutable.
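
A minimal illustration of that point (the variable names here are just for the example):

bits = list("11011")
bits[0] = "0"      # lists mutate in place; no new object is created

s = "11011"
s = "0" + s[1:]    # strings can't be changed, so this builds a brand-new string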


It really depends on the actual use case, and as others have said, you should profile it. In general, though, appending to lists will be better, because it can be done in place, whereas "appending to strings" actually creates a new string that concatenates the old strings. This can rapidly eat up memory (which is a different issue from computational efficiency, really).
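
As a sketch of the memory point (the loop size is an arbitrary choice): growing a string by repeated concatenation conceptually copies the whole string on each pass, while appending to a list is amortized O(1) and a single join builds the final string once:

pieces = []
for bit in "11011" * 2000:
    pieces.append(bit)        # cheap, in place
result = "".join(pieces)      # one final allocation

s = ""
for bit in "11011" * 2000:
    s += bit                  # conceptually builds a new string each pass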

Edit: If you want computational efficiency with binary values, don't use strings or lists. Use integers and bitwise operations. With recent versions of Python, you can use binary representations when you need them:

>>> bin(42)
'0b101010'
>>> 0b101010
42
>>> int('101010')
101010
>>> int('101010', 2)
42
>>> int('0b101010')
Traceback (most recent call last):
  ...
ValueError: invalid literal for int() with base 10: '0b101010'
>>> int('0b101010', 2)
42

Edit 2:

def strslice(a, b):
    return a[:3] + b[3:]

might be better written something like:

def binsplice(a, b):
    mask = 0b11100                   # the top 3 of the 5 bits come from a
    return (a & mask) + (b & ~mask)  # the low 2 bits come from b

>>> a = 0b11011
>>> b = 0b01001
>>> bin(binsplice(a, b))
'0b11001'
>>> 

It might need to be modified if your binary numbers are different sizes.
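
If it helps, here is a hedged generalization along those lines (the crossover name and the split/width parameters are my additions, not part of the answer above): compute the mask from a split point and a total bit width instead of hard-coding it:

def crossover(a, b, split, width):
    # keep the top `split` bits of a and the low (width - split) bits of b
    mask = ((1 << split) - 1) << (width - split)   # split=3, width=5 -> 0b11100
    return (a & mask) | (b & ~mask & ((1 << width) - 1))

# e.g. bin(crossover(0b11011, 0b01001, 3, 5)) == '0b11001'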
