What is the most efficient way to concatenate two strings and remove everything before the first ',' in Python?
In Python, I have a string which is a comma-separated list of values, e.g. '5,2,7,8,3,4'.
I need to add a new value onto the end and remove the first value,
e.g. '5,22,7,814,3,4' -> '22,7,814,3,4,1'
Currently, I do this as follows:
mystr = '5,22,7,814,3,4'
latestValue='1'
mylist = mystr.split(',')
mystr = ''
for i in range(len(mylist)-1):
    if i==0:
        mystr += mylist[i+1]
    if i>0:
        mystr += ','+mylist[i+1]
mystr += ','+latestValue
This runs millions of times in my code and I've identified it as a bottleneck, so I'm keen to optimize it to make it run faster.
What is the most efficient way to do this (in terms of runtime)?
Use this:
if mystr == '':
    mystr = latestValue
else:
    mystr = mystr[mystr.find(",")+1:] + "," + latestValue
This should be much faster than any solution which splits the list. It only finds the first occurrence of "," and "removes" the beginning of the string. Also, if the list is empty, then mystr will be just latestValue (insignificant overhead added by this). Thanks to Paulo Scardine for pointing that out.
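For reference, here is a minimal sketch of the same idea wrapped in a function, with the edge cases spelled out. The name `shift_in_find` is hypothetical and not part of the answer. One thing worth noticing: when the string holds a single item with no comma, `find` returns -1, so the slice keeps the whole string and nothing is dropped.

```python
def shift_in_find(mystr, latestValue):
    """Drop everything up to the first comma and append latestValue.

    Hypothetical wrapper around the str.find approach shown above.
    """
    if mystr == '':
        return latestValue
    # find() returns -1 when there is no comma, so the slice then starts
    # at index 0 and the whole string is kept.
    return mystr[mystr.find(',') + 1:] + ',' + latestValue


print(shift_in_find('5,22,7,814,3,4', '1'))  # 22,7,814,3,4,1
print(shift_in_find('', '1'))                # 1
print(shift_in_find('5', '1'))               # 5,1 (no comma: nothing is dropped)
```

If single-item strings can occur and the first item must still be dropped, the partition-based variants handle that case.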
_, sep, rest = mystr.partition(",")
mystr = rest + sep + latestValue
It also works without any changes if mystr is empty or a single item (without a comma after it), because str.partition returns an empty sep if the separator is not found in mystr.
You could use mystr.rstrip(",") before calling partition() if there might be a trailing comma in mystr.
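A short sketch of how the partition approach behaves on the edge cases mentioned above, including the trailing-comma case. The function name `shift_in_partition` is an assumption made for illustration.

```python
def shift_in_partition(mystr, latestValue):
    # Hypothetical helper: strip any trailing comma first, then split once
    # at the first comma. sep is '' when no comma is present.
    _, sep, rest = mystr.rstrip(',').partition(',')
    return rest + sep + latestValue


print(shift_in_partition('5,22,7,814,3,4', '1'))  # 22,7,814,3,4,1
print(shift_in_partition('5,22,', '1'))           # 22,1
print(shift_in_partition('5', '1'))               # 1
print(shift_in_partition('', '1'))                # 1
```

Note that `rstrip(',')` removes every trailing comma, not just one, and adds a little work per call, so it is only worth including if such input can actually occur.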
mystr = mystr.partition(",")[2]+","+latestValue
An improvement suggested by Paulo, which also works if mystr has fewer than 2 elements.
In the case of 0 elements, it extends mystr to hold one element:
_,_,mystr = (mystr+','+latestValue).partition(',')
$ python -m timeit -s "mystr = '5,22,7,814,3,4';latestValue='1'" "mystr[mystr.find(',')+1:]+','+latestValue"
1000000 loops, best of 3: 0.847 usec per loop
$ python -m timeit -s "mystr = '5,22,7,814,3,4';latestValue='1'" "mystr = mystr.partition(',')[2]+','+latestValue"
1000000 loops, best of 3: 0.703 usec per loop
best version: gnibbler's answer
Since you need speed (millions of times is a lot), I profiled. This one is about twice as fast as splitting the list:
i = 0
while 1:
    if mystr[i] == ',': break
    i += 1
mystr = mystr[i+1:] + ', ' + latest_value
It assumes that there is one space after each comma. If that's a problem, you can use:
i = 0
while 1:
    if mystr[i] == ',': break
    i += 1
mystr = mystr[i+1:].strip() + ', ' + latest_value
which is only slightly slower than the original but much more robust. It's really up to you to decide how much speed you need to squeeze out of it. They both assume that there will be a comma in the string and will raise an IndexError if one fails to appear. The safe version is:
i = 0
while 1:
    try:
        if mystr[i] == ',': break
    except IndexError:
        i = -1
        break
    i += 1
mystr = mystr[i+1:].strip() + ', ' + latest_value
Again, this is still significantly faster than splitting the string, but it trades some speed for robustness.
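As a side note, the index computed by the safe loop is the same value that `str.find` returns, including -1 when no comma is present. The sketch below checks that equivalence and shows the corresponding one-line form. The helper names `first_comma_loop` and `shift_in_spaced` are hypothetical, and no timing is claimed for this variant.

```python
def first_comma_loop(mystr):
    # Same scan as the safe version above: index of the first comma, or -1.
    i = 0
    while True:
        try:
            if mystr[i] == ',':
                break
        except IndexError:
            i = -1
            break
        i += 1
    return i


def shift_in_spaced(mystr, latest_value):
    # Hypothetical one-liner built on str.find; assumes ', ' as the separator.
    return mystr[mystr.find(',') + 1:].strip() + ', ' + latest_value


# The explicit loop and str.find agree on these inputs.
for s in ['5, 22, 7', '5', '', ',5']:
    assert first_comma_loop(s) == s.find(',')

print(shift_in_spaced('5, 22, 7', '1'))  # 22, 7, 1
```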
Here are the timeit results. You can see that the fourth method is noticeably faster than the third (most robust) method, but slightly slower than the first two methods. It's the fastest of the two robust solutions, though, so unless you are sure that your strings will have commas in them (i.e. it would already be considered an error if they didn't), I would use it anyway.
$ python -mtimeit -s'from strings import tests, method1' 'method1(tests[0], "10")'
1000000 loops, best of 3: 1.34 usec per loop
$ python -mtimeit -s'from strings import tests, method2' 'method2(tests[0], "10")'
1000000 loops, best of 3: 1.34 usec per loop
$ python -mtimeit -s'from strings import tests, method3' 'method3(tests[0], "10")'
1000000 loops, best of 3: 1.5 usec per loop
$ python -mtimeit -s'from strings import tests, method4' 'method4(tests[0], "10")'
1000000 loops, best of 3: 1.38 usec per loop
$ python -mtimeit -s'from strings import tests, method5' 'method5(tests[0], "10")'
100000 loops, best of 3: 1.18 usec per loop
This is gnibbler's answer
mylist = mystr.split(',')
mylist.append(latestValue)
mystr = ",".join(mylist[1:])
String concatenation in Python isn't very efficient (since strings are immutable). It's easier (and more efficient) to work with them as lists. Basically, your code copies the string over again each time you concatenate to it.
Edited: Not the best, but I love one-liners. :-)
mystr = ','.join(mystr.split(',')[1:]+[latestValue])
Before testing I would bet it would perform better.
> python -m timeit "mystr = '5,22,7,814,3,4'" "latestValue='1'" \
"mylist = mystr.split(',')" "mylist.append(latestValue);" \
"mystr = ','.join(mylist[1:])"
1000000 loops, best of 3: 1.37 usec per loop
> python -m timeit "mystr = '5,22,7,814,3,4'" "latestValue='1'"\
"','.join(mystr.split(',')[1:]+[latestValue])"
1000000 loops, best of 3: 1.5 usec per loop
> python -m timeit "mystr = '5,22,7,814,3,4'" "latestValue='1'"\
'mystr=mystr[mystr.find(",")+1:]+","+latestValue'
1000000 loops, best of 3: 0.625 usec per loop
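To reproduce this kind of comparison on your own machine, a small harness using the `timeit` module might look like the sketch below. The labels in `candidates` and the loop count are arbitrary choices made for illustration, and the numbers will vary by machine and Python version, so none are shown here.

```python
import timeit

# Same setup as the command-line runs above.
setup = "mystr = '5,22,7,814,3,4'; latestValue = '1'"

# Statements under test; the dictionary keys are just display labels.
candidates = {
    'split/join': "','.join(mystr.split(',')[1:] + [latestValue])",
    'find/slice': "mystr[mystr.find(',') + 1:] + ',' + latestValue",
    'partition': "mystr.partition(',')[2] + ',' + latestValue",
}

loops = 100000
results = {}
for name, stmt in candidates.items():
    # repeat() returns the total seconds for each run of `loops` executions;
    # the minimum of the runs is the least noisy figure.
    best = min(timeit.repeat(stmt, setup=setup, number=loops, repeat=3))
    results[name] = best / loops * 1e6  # microseconds per loop
    print('%-10s %.3f usec per loop' % (name, results[name]))
```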