Python - multiple inserts of large text
In Python, what's the fastest way (i.e. a single-pass method) to go about inserting two large sets of text, A and B, into another large set of text, C.
Where C is for example:
.... ....
<<<A goes here>>>
.... ....
<<<B goes here>>>
.... ....
and "...." represents quite a lot of text (e.g. up to 20k).
What's the best way to insert A and B at the appropriate placeholders, where A and B are also not "small" amounts of text (e.g. up to 2k)?
My first thought is to do (str.replace returns a new string, so the result has to be assigned back):
```python
C = C.replace("<<<A goes here>>>", A)
C = C.replace("<<<B goes here>>>", B)
```
However, since the text is passed through twice, I'd hope for a single-pass solution. I've considered a regex, but that seems overkill. string.Template is an option, but its syntax isn't suitable (i.e. $A is a placeholder that could collide with other text, and '$' wouldn't be suitably escaped elsewhere).
While the A and B placeholders each occur only once in C, I would like a solution that scales to a larger number of substitutions, i.e. one whose cost is independent of the number of substitutions: O(n), where n is len(C).
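For reference, the regex option mentioned above can be made a single pass whose matching cost does not depend on how many placeholders there are: match any `<<<...>>>` marker with one generic pattern, then look the marker's body up in a dict. This is a minimal sketch, assuming placeholder bodies never contain `<` or `>`; the names `PLACEHOLDER`, `fill_placeholders` and `mapping` are hypothetical, not from the question.

```python
import re

# Hypothetical marker syntax: "<<<" + body without '<' or '>' + ">>>"
PLACEHOLDER = re.compile(r"<<<([^<>]*)>>>")

def fill_placeholders(template, mapping):
    """Replace every <<<name>>> marker in one left-to-right pass.

    `mapping` maps the text between the angle brackets to its replacement;
    markers with no entry in `mapping` are left untouched.
    """
    def lookup(match):
        # A function replacement is inserted verbatim (no backslash processing)
        return mapping.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(lookup, template)

C = ".... ....\n<<<A goes here>>>\n.... ....\n<<<B goes here>>>\n.... ...."
result = fill_placeholders(C, {"A goes here": "AAA", "B goes here": "BBB"})
print(result)
```

Because a function is passed to `re.sub`, large values of A and B containing backslashes or `\1`-style sequences are inserted as-is.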
Thoughts and suggestions appreciated.
Thank you.
Brian
The documentation for str.find says it returns the index of the first occurrence of the searched-for string, which (to me) implies that it stops at the first match rather than going through the whole string. If you know that "<<<A goes here>>>" always occurs first, I'd do:
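```python
Aflag = "<<<A goes here>>>"
Bflag = "<<<B goes here>>>"
# Find A's placeholder, then resume the search for B's just past it
Aidx = C.find(Aflag)
Bidx = C.find(Bflag, Aidx + len(Aflag))
if Aidx == -1 or Bidx == -1:
    raise ValueError("placeholder not found")
# Slice exactly around each placeholder (a +1 here would drop one character after it)
newC = "".join((C[:Aidx], A, C[Aidx + len(Aflag):Bidx], B, C[Bidx + len(Bflag):]))
```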
If my assumption is right, this minimizes searching through the string.
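The same idea extends to any number of placeholders, provided they appear in C in a known order: each `find` resumes where the previous one stopped, so C is scanned roughly once in total and the result is built with a single `join`. A sketch under that ordering assumption; `fill_in_order` and `pairs` are hypothetical names.

```python
def fill_in_order(text, pairs):
    """Single left-to-right scan using str.find.

    `pairs` is a sequence of (placeholder, replacement) tuples that must
    appear in `text` in this order, each exactly once.
    """
    pieces = []
    pos = 0
    for placeholder, replacement in pairs:
        idx = text.find(placeholder, pos)
        if idx == -1:
            raise ValueError("%r not found after index %d" % (placeholder, pos))
        pieces.append(text[pos:idx])   # text since the previous placeholder
        pieces.append(replacement)
        pos = idx + len(placeholder)   # resume just past this placeholder
    pieces.append(text[pos:])          # tail after the last placeholder
    return "".join(pieces)

C = ".... ....\n<<<A goes here>>>\n.... ....\n<<<B goes here>>>\n.... ...."
newC = fill_in_order(C, [("<<<A goes here>>>", "AAA"), ("<<<B goes here>>>", "BBB")])
print(newC)
```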
Depending on the version of Python you're using, you can use the str.format method or the % operator. % works everywhere, so here's an example:
```python
'...\n%s\n...\n%s\n...' % (a, b)
```
That will put the contents of a in the first %s and b in the second.
So, assuming you can change C's placeholders, turn them into %s placeholders and you're good to go. Any literal % elsewhere in C then has to be written as %%.
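If C can be edited, named fields remove the dependence on argument order, which matters once there are more than two inserts. A small sketch; the values of `a` and `b` and both templates are made up for illustration.

```python
a = "first insert"
b = "second insert"

# %-formatting with named fields: dict keys choose which value goes where.
# A literal % elsewhere in the template must be written as %%.
percent_template = "...\n%(A)s\n...\n%(B)s\n... 100%% literal"
via_percent = percent_template % {"A": a, "B": b}

# str.format equivalent: literal braces elsewhere must be doubled ({{ and }}).
brace_template = "...\n{A}\n...\n{B}\n... 100% literal"
via_format = brace_template.format(A=a, B=b)

print(via_percent == via_format)  # True
```

The escaping rule is the same trade-off the question raises for string.Template: whichever marker syntax is chosen, literal occurrences of it in C need escaping.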
You could apply any of the string-searching algorithms, more specifically KMP, which runs in O(n) time. Since you've assumed you keep A and B separate, you could do this in one pass. However, it would probably end up being O(n+k), where k is the total length of the inserted text, since you have to copy in the replacement once you've found the placeholder.
http://en.wikipedia.org/wiki/String_searching_algorithm
http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
There's probably a MUCH easier way to do this in Python that I'm not familiar with, but if you've never seen these algorithms they're worth taking a look at.
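To make the KMP suggestion concrete, here is a minimal pure-Python sketch for a single placeholder: the failure table lets the scan move through C without ever stepping backwards, so finding all matches is O(len(C) + len(pattern)). `build_failure`, `kmp_find_all` and `kmp_replace` are illustrative names; matches are taken left to right without overlap, mirroring `str.replace`.

```python
def build_failure(pattern):
    # fail[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def kmp_find_all(text, pattern):
    """Start indices of non-overlapping matches, found in one pass over text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    fail = build_failure(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = 0  # restart so matches do not overlap
    return matches

def kmp_replace(text, pattern, repl):
    pieces = []
    prev = 0
    for start in kmp_find_all(text, pattern):
        pieces.append(text[prev:start])
        pieces.append(repl)
        prev = start + len(pattern)
    pieces.append(text[prev:])
    return "".join(pieces)

C = ".... ....\n<<<A goes here>>>\n.... ...."
print(kmp_replace(C, "<<<A goes here>>>", "AAA"))
```

In CPython this interpreted loop will very likely be slower than the built-in str.find and str.replace, which run in C, so it is mostly useful for seeing how the algorithm works.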
The fastest is not the one I thought! replace is the fastest and the simplest:
```python
import cProfile

s = "\r\n" + ("helllo" * 100 + "\r\n") * 100
s2 = (s + "tag1" + s + "tag2" + s + "tag2" + s + "tag1" + s) * 10
t1 = ("t1" * 100 + "\r\n") * 100
t2 = ("t2" * 100 + "\r\n") * 100
print("size =", len(s2))

def f1():
    # Two chained str.replace calls (two passes over the text)
    return s2.replace("tag1", t1).replace("tag2", t2)

def f2():
    # Split into lines, replace within each line, then rejoin
    return "\r\n".join([x.replace("tag1", t1).replace("tag2", t2) for x in s2.split("\r\n")])

m = {"tag1": t1, "tag2": t2}

def f3():
    # One manual pass: find each "tag", look up its 4-character key in m
    p1 = 0
    res = ""
    while p1 >= 0:
        p2 = s2.find("tag", p1)
        if p2 >= 0:
            res += s2[p1:p2] + m[s2[p2:p2 + 4]]
            p1 = p2 + 4
        else:
            res += s2[p1:]
            p1 = -1
    return res

def g1():
    for i in range(100):
        f1()

def g2():
    for i in range(100):
        f2()

def g3():
    for i in range(100):
        f3()

# Sanity check: all three versions must produce the same result
if f1() != f2():
    print("problem")
if f1() != f3():
    print("problem")

cProfile.run('g1()')
cProfile.run('g2()')
cProfile.run('g3()')
```
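The Python documentation notes that the profilers are meant for execution profiles, not benchmarking, and points to `timeit` for that. The sketch below re-runs the comparison between chained `replace` and a single regex pass with `timeit`; the data sizes, the `tag[12]` pattern and the function names are assumptions made for illustration, and no particular result is implied.

```python
import re
import timeit

# Hypothetical test data, shaped like the benchmark above but smaller
filler = ("hello" * 100 + "\r\n") * 100
text = (filler + "tag1" + filler + "tag2" + filler) * 10
t1 = ("t1" * 100 + "\r\n") * 100
t2 = ("t2" * 100 + "\r\n") * 100
mapping = {"tag1": t1, "tag2": t2}

def chained_replace():
    # Two passes over the text, each implemented in C
    return text.replace("tag1", t1).replace("tag2", t2)

TAG = re.compile(r"tag[12]")

def single_regex_pass():
    # One pass; each match is resolved through a Python-level callback
    return TAG.sub(lambda m: mapping[m.group(0)], text)

assert chained_replace() == single_regex_pass()

for fn in (chained_replace, single_regex_pass):
    # Best of 5 repeats of 20 calls each; the minimum is the least noisy figure
    best = min(timeit.repeat(fn, number=20, repeat=5))
    print("%-18s %.4f s per 20 calls" % (fn.__name__, best))
```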