python regex problem

s = re.sub(r"<style.*?</style>", "", s)

Isn't this code supposed to remove the style block from the string s? Why doesn't it work? I am trying to remove the following code:

<style type="text/css">
body { ... }
</style>

Any suggestion?


No, what is needed here is the re.DOTALL flag:

re.DOTALL
Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.

http://docs.python.org/library/re.html#re.DOTALL
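
Applied to the pattern from the question, a minimal sketch could look like this. The sample string is made up for illustration; only the pattern and the flag come from the discussion above.

```python
import re

# Hypothetical sample input: the <style> block spans several lines.
s = '''before
<style type="text/css">
body { ... }
</style>
after'''

# Without re.DOTALL, '.' cannot cross a newline, so nothing matches
# and the string comes back unchanged.
unchanged = re.sub(r"<style.*?</style>", "", s)

# With re.DOTALL, '.' also matches newlines and the whole block is removed.
cleaned = re.sub(r"<style.*?</style>", "", s, flags=re.DOTALL)

print(unchanged == s)
print(repr(cleaned))
```

Note that the flag is passed by keyword: the fourth positional argument of re.sub is count, not flags.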

Edit

In some cases you may need the dot to match every character (newlines included) in one part of the pattern, and to match only non-newline characters in another part. The re.DOTALL flag applies to the whole pattern, so it does not allow this.

In that case, it is useful to know the following trick: use [\s\S] to stand for any character.

import re

s = '''alhambra
<style type="text/css">
body { ... }
</style>
toromizuXXXXXXXX
YYYYYYYYYYYYYY'''
print(s, '\n')

# [\s\S] matches any character, newlines included; the final .+ stops at a newline
regx = re.compile(r"<style[\s\S]*?</style>|(?<=ro)mizu.+")

s = regx.sub('AAA', s)
print(s)

result

alhambra
<style type="text/css">
body { ... }
</style>
toromizuXXXXXXXX
YYYYYYYYYYYYYY 

alhambra
AAA
toroAAA
YYYYYYYYYYYYYY
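
On Python 3.6 and later, a scoped inline flag should give the same effect as the [\s\S] trick. The following is only a sketch, reusing the sample string from the example above; the variable name result is introduced here for illustration.

```python
import re

s = '''alhambra
<style type="text/css">
body { ... }
</style>
toromizuXXXXXXXX
YYYYYYYYYYYYYY'''

# (?s:...) switches DOTALL on only inside that group (Python 3.6+);
# the final '.+' is outside it and still stops at a newline.
regx = re.compile(r"<style(?s:.*?)</style>|(?<=ro)mizu.+")

result = regx.sub('AAA', s)
print(result)
```

Here the first alternative can span lines while the second cannot, which is the same split the [\s\S] version achieves.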