Python: Replace tags but preserve inner text?
I have a string.
"This is an [[example]] sentence. It is [[awesome]]
."
I want to replace all instances of [[.]]
with <b>.</b>
preserving the wildcard text matched by .
The result should be:
"This is an <b>example</b> sentence. It is <b>awesome</b>
."
I could go in and manually replace [[
with <b>
and ]]
with </b>
, but it makes more sense to just do it at once and preserve the text between tags.
How do I do this?
Note: This is for taking source from a database and converting it to HTML. It's supposed to mimic wiki-style s开发者_开发百科yntax. In this case, [[x]] results in a bold typeface.
You can just use the replace
method on the string.
>>> s = 'This is an [[example]] sentence. It is [[awesome]].'
>>> s.replace('[[', '<b>').replace(']]', '</b>')
'This is an <b>example</b> sentence. It is <b>awesome</b>.'
Just to get some timeit results in here:
$ python -mtimeit -s'import re' "re.sub(r'\[\[(.*?)\]\]', r'<b>\1</b>', 'This is an [[example]] sentence. It is [[awesome]]')"''
100000 loops, best of 3: 19.7 usec per loop
$ python -mtimeit '"This is an [[example]] sentence. It is [[awesome]]".replace("[[", "<b>").replace("]]", "</b>")'
100000 loops, best of 3: 1.94 usec per loop
If we compile the regex we get slightly better performance:
$ python -mtimeit -s"import re; r = re.compile(r'\[\[(.*?)\]\]')" "r.sub( r'<b>\1</b>', 'This is an [[example]] sentence. It is [[awesome]]')"
100000 loops, best of 3: 16.9 usec per loop
How about the use of re.sub()
and a little regular expression magic:
import re
re.sub(r'\[\[(.*?)\]\]', r'<b>\1</b>', "This is an [[example]] sentence. It is [[awesome]]");
This code allows you to extend the replacements list at will.
import re
_replacements = {
'[[': '<b>',
']]': '</b>',
'{{': '<i>',
'}}': '</i>',
}
def _do_replace(match):
return _replacements.get(match.group(0))
def replace_tags(text, _re=re.compile('|'.join(re.escape(r) for r in _replacements))):
return _re.sub(_do_replace, text)
print replace_tags("This is an [[example]] sentence. It is [[{{awesome}}]].")
This is an <b>example</b> sentence. It is <b><i>awesome</i></b>.
... the advantage of using the regex approach here might be that it prevents doing substitutions when the source text doesn't have matching pairs of [[
and ]]
.
Maybe important, maybe not.
The methods suggested by the other posters will certainly work, however I would like to point out that using Regular Expresions for this task will incur quite a performance hit.
The example you supplied could be solved just as well using native Python string operations, and would perform roughly 3 times faster.
For instance:
>>> import timeit
>>> st = 's = "This is an [[example]] sentence. It is [[awesome]]"'
>>> t = timeit.Timer('s.replace("[[","<b>").replace("]]","</b>")',st)
>>> t.timeit() # Run 1000000 times
1.1733845739904609
>>> tr = timeit.Timer("re.sub(r'\[\[(.*?)\]\]', r'<b>\1</b>',s)",'import re; ' + st)
>>> tr.timeit() # Run 1000000 times
3.7482673050677704
>>>
Hope this helps :)
精彩评论