Python: Replace tags but preserve inner text?

Posted 2023-01-26 17:55

I have a string.

"This is an [[example]] sentence. It is [[awesome]]."

I want to replace all instances of [[.*]] with <b>.*</b>, preserving the wildcard text matched by .*

The result should be: "This is an <b>example</b> sentence. It is <b>awesome</b>."

I could go in and manually replace [[ with <b> and ]] with </b>, but it makes more sense to do it all at once and preserve the text between the tags.

How do I do this?

Note: This is for taking source from a database and converting it to HTML. It's supposed to mimic wiki-style syntax. In this case, [[x]] results in a bold typeface.

You can just use the replace method on the string.

>>> s = 'This is an [[example]] sentence. It is [[awesome]].'
>>> s.replace('[[', '<b>').replace(']]', '</b>')

'This is an <b>example</b> sentence. It is <b>awesome</b>.'

Just to get some timeit results in here:

$ python -mtimeit -s'import re' "re.sub(r'\[\[(.*?)\]\]', r'<b>\1</b>', 'This is an [[example]] sentence. It is [[awesome]]')"
100000 loops, best of 3: 19.7 usec per loop

$ python -mtimeit '"This is an [[example]] sentence. It is [[awesome]]".replace("[[", "<b>").replace("]]", "</b>")'
100000 loops, best of 3: 1.94 usec per loop

If we compile the regex we get slightly better performance:

$ python -mtimeit -s"import re; r = re.compile(r'\[\[(.*?)\]\]')" "r.sub( r'<b>\1</b>', 'This is an [[example]] sentence. It is [[awesome]]')"
100000 loops, best of 3: 16.9 usec per loop
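To repeat these measurements on your own machine, a rough sketch along the following lines may help. The function names are invented for illustration, and the absolute numbers will vary with the Python version and hardware, so only the relative ordering is worth reading.

```python
import re
import timeit

TEXT = "This is an [[example]] sentence. It is [[awesome]]"
PATTERN = re.compile(r'\[\[(.*?)\]\]')  # compiled once, reused below

def with_replace():
    # Two plain string replacements, no regex involved.
    return TEXT.replace('[[', '<b>').replace(']]', '</b>')

def with_re_sub():
    # Module-level re.sub; the pattern is looked up in re's cache each call.
    return re.sub(r'\[\[(.*?)\]\]', r'<b>\1</b>', TEXT)

def with_compiled():
    # Precompiled pattern object.
    return PATTERN.sub(r'<b>\1</b>', TEXT)

for func in (with_replace, with_re_sub, with_compiled):
    seconds = timeit.timeit(func, number=100000)
    print('%-14s %.3f s per 100000 calls' % (func.__name__, seconds))
```

All three functions return the same string for this input, so the comparison is like for like.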

How about the use of re.sub() and a little regular expression magic:

import re
re.sub(r'\[\[(.*?)\]\]', r'<b>\1</b>', "This is an [[example]] sentence. It is [[awesome]]")
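To make the "magic" a bit more concrete, here is a minimal sketch of what the call returns and why the non-greedy `.*?` matters. The helper name `bold_tags` is made up for this example.

```python
import re

# Non-greedy group: stop at the first closing ]] rather than the last one.
_TAG = re.compile(r'\[\[(.*?)\]\]')

def bold_tags(text):
    """Wrap the text inside each [[...]] pair in <b>...</b>."""
    return _TAG.sub(r'<b>\1</b>', text)

s = "This is an [[example]] sentence. It is [[awesome]]."
print(bold_tags(s))
# This is an <b>example</b> sentence. It is <b>awesome</b>.

# With a greedy .* the two pairs are swallowed by a single match.
greedy = re.sub(r'\[\[(.*)\]\]', r'<b>\1</b>', s)
print(greedy)
# This is an <b>example]] sentence. It is [[awesome</b>.
```

The `\1` in the replacement refers to whatever the parenthesised group captured, which is how the inner text is preserved.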

This code allows you to extend the replacements list at will.

import re

_replacements = {
    '[[': '<b>',
    ']]': '</b>',
    '{{': '<i>',
    '}}': '</i>',
}

def _do_replace(match):
    return _replacements.get(match.group(0))

def replace_tags(text, _re=re.compile('|'.join(re.escape(r) for r in _replacements))):
    return _re.sub(_do_replace, text)

print(replace_tags("This is an [[example]] sentence. It is [[{{awesome}}]]."))

This is an <b>example</b> sentence. It is <b><i>awesome</i></b>.

One advantage of the regex approach here is that it makes no substitution when the source text doesn't have matching pairs of [[ and ]].

That may or may not matter for your data.
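A small sketch of that difference, using a made-up input string with a stray ]] followed by an unclosed [[:

```python
import re

unbalanced = "Stray ]] marker first, then an [[unclosed tag."

# Chained replace converts every delimiter, paired or not.
chained = unbalanced.replace('[[', '<b>').replace(']]', '</b>')
print(chained)
# Stray </b> marker first, then an <b>unclosed tag.

# The regex only rewrites a [[ that is later followed by a closing ]].
paired = re.sub(r'\[\[(.*?)\]\]', r'<b>\1</b>', unbalanced)
print(paired)
# Stray ]] marker first, then an [[unclosed tag.
```

The chained version emits unbalanced HTML here, while the regex version leaves the text untouched.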

The methods suggested by the other posters will certainly work, but I would like to point out that using regular expressions for this task incurs quite a performance hit.

The example you supplied can be solved just as well with native Python string operations, which run roughly 3 times faster.

For instance:

>>> import timeit
>>> st = 's = "This is an [[example]] sentence. It is [[awesome]]"'
>>> t = timeit.Timer('s.replace("[[","<b>").replace("]]","</b>")',st)
>>> t.timeit()  # Run 1000000 times
1.1733845739904609
>>> tr = timeit.Timer(r"re.sub(r'\[\[(.*?)\]\]', r'<b>\1</b>', s)", 'import re; ' + st)
>>> tr.timeit() # Run 1000000 times
3.7482673050677704
>>>

Hope this helps :)
