开发者

Python: Replace tags but preserve inner text?

I have a string.

"This is an [[example]] sentence. It is [[awesome]]."

I want to replace all instances of [[.]] with <b>.</b> preserving the wildcard text matched by .

The result should be: "This is an <b>example</b> sentence. It is <b>awesome</b>."

I could go in and manually replace [[ with <b> and ]] with </b>, but it makes more sense to just do it at once and preserve the text between tags.

How do I do this?

Note: This is for taking source from a database and converting it to HTML. It's supposed to mimic wiki-style s开发者_开发百科yntax. In this case, [[x]] results in a bold typeface.


You can just use the replace method on the string.

>>> s = 'This is an [[example]] sentence. It is [[awesome]].'
>>> s.replace('[[', '<b>').replace(']]', '</b>')

'This is an <b>example</b> sentence. It is <b>awesome</b>.'

Just to get some timeit results in here:

$ python -mtimeit -s'import re' "re.sub(r'\[\[(.*?)\]\]', r'<b>\1</b>', 'This is an [[example]] sentence. It is [[awesome]]')"''
100000 loops, best of 3: 19.7 usec per loop

$ python -mtimeit '"This is an [[example]] sentence. It is [[awesome]]".replace("[[", "<b>").replace("]]", "</b>")'
100000 loops, best of 3: 1.94 usec per loop

If we compile the regex we get slightly better performance:

$ python -mtimeit -s"import re; r = re.compile(r'\[\[(.*?)\]\]')" "r.sub( r'<b>\1</b>', 'This is an [[example]] sentence. It is [[awesome]]')"
100000 loops, best of 3: 16.9 usec per loop


How about the use of re.sub() and a little regular expression magic:

import re
re.sub(r'\[\[(.*?)\]\]', r'<b>\1</b>', "This is an [[example]] sentence. It is [[awesome]]");


This code allows you to extend the replacements list at will.

import re

_replacements = {
    '[[': '<b>',
    ']]': '</b>',
    '{{': '<i>',
    '}}': '</i>',
}

def _do_replace(match):
    return _replacements.get(match.group(0))

def replace_tags(text, _re=re.compile('|'.join(re.escape(r) for r in _replacements))):
    return _re.sub(_do_replace, text)

print replace_tags("This is an [[example]] sentence. It is [[{{awesome}}]].")

This is an <b>example</b> sentence. It is <b><i>awesome</i></b>.


... the advantage of using the regex approach here might be that it prevents doing substitutions when the source text doesn't have matching pairs of [[ and ]].

Maybe important, maybe not.


The methods suggested by the other posters will certainly work, however I would like to point out that using Regular Expresions for this task will incur quite a performance hit.

The example you supplied could be solved just as well using native Python string operations, and would perform roughly 3 times faster.

For instance:

>>> import timeit
>>> st = 's = "This is an [[example]] sentence. It is [[awesome]]"'
>>> t = timeit.Timer('s.replace("[[","<b>").replace("]]","</b>")',st)
>>> t.timeit()  # Run 1000000 times
1.1733845739904609
>>> tr = timeit.Timer("re.sub(r'\[\[(.*?)\]\]', r'<b>\1</b>',s)",'import re; ' + st)
>>> tr.timeit() # Run 1000000 times
3.7482673050677704
>>> 

Hope this helps :)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜