
Python - multiple inserts of large text

In Python, what's the fastest way (i.e. a single-pass method) to insert two large blocks of text, A and B, into another large block of text, C?

Where C is for example:

.... ....
<<<A goes here>>>
.... ....
<<<B goes here>>>
.... ....

and "...." represents quite a lot of text (up to about 20k characters).

What's the best way to insert A and B at the appropriate placeholders, given that A and B are not "small" either (up to about 2k characters each)?

My first thought is to do:

C = C.replace("<<<A goes here>>>", A)
C = C.replace("<<<B goes here>>>", B)

However, since this passes over the text twice, I'd hoped for a single-pass solution. I've considered a regex, but that seems like overkill. string.Template is an option, but its syntax isn't suitable: $A is a placeholder that could collide with other text, and '$' is not escaped elsewhere in C.
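If the only objection to string.Template is the '$' delimiter, a subclass can choose a different one through the delimiter class attribute. A minimal sketch, assuming you can rewrite C's placeholders as @@A and @@B; the subclass name AtTemplate and the @@ marker are hypothetical choices for illustration. Template performs all substitutions in one regex pass over the template.

```python
from string import Template


class AtTemplate(Template):
    # Hypothetical subclass: "@@" replaces "$" as the placeholder marker,
    # so any literal "$" in C is copied through untouched.
    delimiter = "@@"


C = "price: $5\n....\n@@A\n....\n@@B\n....\n"
result = AtTemplate(C).substitute(A="text for A", B="text for B")
print(result)
```

A literal @@ in C would then have to be written as @@@@, the same doubling rule Template normally applies to $$.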

Although the A and B placeholders each occur only once in C, I would like a solution that scales to a larger number of substitutions, i.e. one whose cost is independent of the number of substitutions: O(n), where n is len(C).
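For the scaling requirement, the regex the question considered is probably the most direct single-pass route: re.sub accepts a function as the replacement, so one compiled pattern can match every placeholder and a dict lookup supplies the text for each. A minimal sketch, assuming all placeholders share the form <<<NAME goes here>>>; the names PLACEHOLDER and fill are hypothetical. C is scanned once from left to right, and each match costs one dictionary lookup, so adding names to the mapping does not add passes.

```python
import re

# Assumption: every placeholder looks like "<<<NAME goes here>>>".
PLACEHOLDER = re.compile(r"<<<(\w+) goes here>>>")


def fill(template, values):
    """Substitute all placeholders in one left-to-right scan of template.

    Placeholders whose NAME is missing from values are left unchanged.
    """
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)


C = "....\n<<<A goes here>>>\n....\n<<<B goes here>>>\n....\n<<<Z goes here>>>"
result = fill(C, {"A": "text for A", "B": "text for B"})
print(result)
```

Because the replacement is a function, backslashes inside A or B are inserted literally instead of being read as group references.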

Thoughts and suggestions appreciated.

Thank you.

Brian


The documentation for str.find says it returns the index of the first occurrence of the searched-for string, which (to me) implies that it doesn't go through the whole string. If you know that "<<<A goes here>>>" always occurs first, I'd do:

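Aflag = "<<<A goes here>>>"
Bflag = "<<<B goes here>>>"
Aidx = C.find(Aflag)
# start the B search right where the A placeholder ends
Bidx = C.find(Bflag, Aidx + len(Aflag))

# each slice resumes immediately after its placeholder
newC = "".join((C[:Aidx], A, C[Aidx + len(Aflag):Bidx], B, C[Bidx + len(Bflag):]))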

If my assumption is right, this minimizes searching: the search for B starts where A's placeholder ends, so no part of C is scanned twice.
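The same idea extends to any number of placeholders, as long as you know the order in which they appear in C. A minimal sketch; the function name insert_in_order is hypothetical. Each find() call starts where the previous placeholder ended, so C is scanned once from left to right, and the pieces are joined at the end.

```python
def insert_in_order(text, replacements):
    """Replace placeholders that appear in text in the given order.

    replacements: list of (placeholder, new_text) pairs, ordered by
    where each placeholder occurs in text.
    """
    pieces = []
    pos = 0
    for flag, new in replacements:
        idx = text.find(flag, pos)
        if idx == -1:
            # find() returns -1 on failure; slicing with it would silently corrupt the output
            raise ValueError(f"placeholder {flag!r} not found after position {pos}")
        pieces.append(text[pos:idx])  # untouched text before the placeholder
        pieces.append(new)            # the inserted text
        pos = idx + len(flag)         # resume right after the placeholder
    pieces.append(text[pos:])         # tail after the last placeholder
    return "".join(pieces)


C = "a\n<<<A goes here>>>\nb\n<<<B goes here>>>\nc"
newC = insert_in_order(C, [("<<<A goes here>>>", "AAA"), ("<<<B goes here>>>", "BBB")])
print(newC)
```

Raising on a missing or out-of-order placeholder is a design choice; the two-placeholder version above would instead produce wrong slices when find() returns -1.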


Depending on the version of Python you're using, you can use the str.format method (available since Python 2.6) or the % operator. % works everywhere, so here's an example:

'...\n%s\n...\n%s\n...' % (A, B)

That will put the contents of A in the first %s and B in the second.

So, assuming you can change C's placeholders, turn them into %s and you're good to go. Any literal % elsewhere in C then has to be written as %%.
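A minimal sketch of the same idea with named fields, so values are matched by name rather than by position; the field names A and B are assumptions. It also shows the escaping each style needs for literal text in C: %% for the % operator, doubled braces for str.format. The inserted values themselves are not reinterpreted in either style.

```python
# %-formatting with named fields: a literal percent sign must be written as %%.
C = "100%% of this line is literal text\n%(A)s\n....\n%(B)s\n"
filled = C % {"A": "text for A", "B": "text for B"}
print(filled)

# str.format with named fields: literal braces must be doubled as {{ and }}.
C2 = "a {{literal}} pair of braces\n{A}\n....\n{B}\n"
filled2 = C2.format(A="text for A", B="text for B")
print(filled2)
```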


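You could apply one of the standard string-searching algorithms, in particular Knuth-Morris-Pratt (KMP), which finds a pattern in O(n) time for a text of length n (plus O(m) preprocessing for a pattern of length m). Since you've assumed A and B are kept separate, you could do this in one pass. However, the total cost would probably end up being O(n + k), where k is the total length of the inserted text, since you still have to build the new string once the placeholders are found.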

http://en.wikipedia.org/wiki/String_searching_algorithm

http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

There's probably a MUCH easier way to do this in Python that I'm not familiar with, but if you've never seen these algorithms they're worth taking a look at.
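For reference, a minimal KMP sketch in pure Python: a prefix (failure) table is built once for the placeholder, the text is then scanned left to right without ever moving backwards, and the output is assembled with a single join. It handles one placeholder per call, and the function names are hypothetical. A per-character Python loop like this will almost certainly be much slower than the C-implemented str.find or str.replace, so treat it as illustrative.

```python
def prefix_function(pattern):
    """pi[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_find_all(text, pattern):
    """Yield start indices of non-overlapping matches in one left-to-right scan."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    pi = prefix_function(pattern)
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = pi[k - 1]  # fall back using the table instead of re-reading text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            yield i - k + 1
            k = 0  # reset so matches do not overlap


def kmp_replace(text, pattern, replacement):
    pieces = []
    pos = 0
    for start in kmp_find_all(text, pattern):
        pieces.append(text[pos:start])
        pieces.append(replacement)
        pos = start + len(pattern)
    pieces.append(text[pos:])
    return "".join(pieces)


C = "....\n<<<A goes here>>>\n...."
print(kmp_replace(C, "<<<A goes here>>>", "text for A"))
```

Matches are non-overlapping and taken left to right, the same convention str.replace uses.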


The fastest is not the one I expected: chained str.replace is both the fastest and the simplest.

import cProfile

# Test data: roughly 3 million characters of filler with "tag1"/"tag2" markers
s = "\r\n" + ("helllo" * 100 + "\r\n") * 100
s2 = (s + "tag1" + s + "tag2" + s + "tag2" + s + "tag1" + s) * 10
t1 = ("t1" * 100 + "\r\n") * 100
t2 = ("t2" * 100 + "\r\n") * 100

print("size =", len(s2))

def f1():
    # Two chained str.replace calls (two passes over the text)
    return s2.replace("tag1", t1).replace("tag2", t2)

def f2():
    # Split into lines, replace within each line, then rejoin
    return "\r\n".join([x.replace("tag1", t1).replace("tag2", t2) for x in s2.split("\r\n")])

m = {"tag1": t1, "tag2": t2}

def f3():
    # Hand-written single pass: find each "tag" and look up its 4-character key in m
    p1 = 0
    res = ""
    while p1 >= 0:
        p2 = s2.find("tag", p1)
        if p2 >= 0:
            res += s2[p1:p2] + m[s2[p2:p2 + 4]]
            p1 = p2 + 4
        else:
            res += s2[p1:]
            p1 = -1
    return res

def g1():
    for _ in range(100):
        f1()

def g2():
    for _ in range(100):
        f2()

def g3():
    for _ in range(100):
        f3()

# Sanity check: all three approaches must produce the same text
if f1() != f2():
    print("problem")
if f1() != f3():
    print("problem")

cProfile.run('g1()')
cProfile.run('g2()')
cProfile.run('g3()')
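cProfile adds bookkeeping to every function call it records, which can distort a comparison between approaches that make very different numbers of calls, such as f2's per-line replace() calls versus f1's two. A minimal sketch that times the same data with timeit instead and adds a single-pass re.sub variant for comparison; the function names and the number=10 repetition count are assumptions, not part of the benchmark above.

```python
import re
import timeit

# Same test data as the cProfile benchmark: ~3 million characters with 40 markers.
s = "\r\n" + ("helllo" * 100 + "\r\n") * 100
s2 = (s + "tag1" + s + "tag2" + s + "tag2" + s + "tag1" + s) * 10
t1 = ("t1" * 100 + "\r\n") * 100
t2 = ("t2" * 100 + "\r\n") * 100
m = {"tag1": t1, "tag2": t2}

# Hypothetical single-pass variant: one regex matches either marker.
TAG = re.compile(r"tag[12]")


def chained_replace():
    return s2.replace("tag1", t1).replace("tag2", t2)


def single_pass_sub():
    return TAG.sub(lambda mo: m[mo.group(0)], s2)


# Both approaches must produce identical output before timing them.
assert chained_replace() == single_pass_sub()

for fn in (chained_replace, single_pass_sub):
    seconds = timeit.timeit(fn, number=10)
    print(f"{fn.__name__}: {seconds:.3f} s for 10 calls")
```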