Regular expressions in a Python find-and-replace script? Update

I'm new to Python scripting, so please forgive me in advance if the answer to this question seems inherently obvious.

I'm trying to put together a large-scale find-and-replace script using Python. I'm using code similar to the following:

import sys

infile = sys.argv[1]
charenc = sys.argv[2]
outFile=infile+'.output'

findreplace = [
('term1', 'term2'),
]

inF = open(infile,'rb')
s=unicode(inF.read(),charenc)
inF.close()

for couple in findreplace:
    outtext=s.replace(couple[0],couple[1])
    s=outtext

outF = open(outFile,'wb')
outF.write(outtext.encode('utf-8'))
outF.close()

How would I go about having the script do a find and replace for regular expressions?

Specifically, I want it to find some information (metadata) specified at the top of a text file. E.g.:

Title: This is the title
Author: This is the author
Date: This is the date

and convert it into LaTeX format. E.g.:

\title{This is the title}
\author{This is the author}
\date{This is the date}

Maybe I'm tackling this the wrong way. If there's a better way than regular expressions please let me know!

Thanks!

Update: Thanks for posting some example code in your answers! I can get the regex replacement to work as long as it replaces the findreplace action, but I can't get both to work together. The problem now is that I can't integrate it properly into the code I've got. How would I go about having the script perform multiple actions on 'outtext' in the snippet below?

for couple in findreplace:
    outtext=s.replace(couple[0],couple[1])
    s=outtext


>>> import re
>>> s = """Title: This is the title
... Author: This is the author
... Date: This is the date"""
>>> p = re.compile(r'^(\w+):\s*(.+)$', re.M)
>>> print p.sub(r'\\\1{\2}', s)
\Title{This is the title}
\Author{This is the author}
\Date{This is the date}

To lowercase the command name, pass a function as the replacement argument:

def repl_cb(m):
    return "\\%s{%s}" %(m.group(1).lower(), m.group(2))

p = re.compile(r'^(\w+):\s*(.+)$', re.M)
print p.sub(repl_cb, s)

\title{This is the title}
\author{This is the author}
\date{This is the date}
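
One caveat with the pattern above: `^(\w+):` matches any line that starts with a word followed by a colon, so similar lines in the body of the file would be converted too. Here is a minimal sketch of one way to restrict it, assuming only Title, Author and Date count as metadata. The names KNOWN_KEYS, META_RE and to_latex are made up for this illustration.

```python
import re

# Assumption: only these keys are metadata; every other line is left alone.
KNOWN_KEYS = {"title", "author", "date"}

META_RE = re.compile(r'^(\w+):\s*(.+)$', re.M)

def to_latex(match):
    key = match.group(1).lower()
    if key not in KNOWN_KEYS:
        # Not a metadata line: return the matched text unchanged.
        return match.group(0)
    return "\\%s{%s}" % (key, match.group(2))

text = "Title: This is the title\nNote: keep this line as it is"
result = META_RE.sub(to_latex, text)
print(result)
```

Returning `m.group(0)` from the callback is a simple way to skip a match without needing a more complicated pattern.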


See re.sub()


The regular expression you want would probably be along the lines of this one:

^([^:]+): (.*)

and the replacement expression would be

\\\1{\2}
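
As a rough sketch of how that pattern and replacement could be plugged into `re.sub()` (the sample text and variable names are only for illustration):

```python
import re

text = "Title: This is the title\nAuthor: This is the author"

# re.M makes ^ match at the start of every line, not only at the start
# of the whole string, so each metadata line is converted.
converted = re.sub(r'^([^:]+): (.*)', r'\\\1{\2}', text, flags=re.M)
print(converted)
```

Note that `[^:]+` also matches newlines, so if a line without a colon comes before a metadata line, the match can span both lines. Using `[^:\n]+` instead avoids that.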


>>> import re
>>> m = 'title', 'author', 'date'
>>> s = """Title: This is the title
Author: This is the author
Date: This is the date"""
>>> for i in m:
    s = re.compile(i+': (.*)', re.I).sub(r'\\' + i + r'{\1}', s)


>>> print(s)
\title{This is the title}
\author{This is the author}
\date{This is the date}