Python regex sub

I want to delete all comments. This is my regular expression:

re.sub(re.compile('<!--.*-->', re.DOTALL),'', text)

But if my text is :

bzzzzzz <!-- blabla --> blibli <!-- bloblo --> blublu

the result is :

bzzzzzz blublu

instead of :

bzzzzzz blibli blublu

Thanks for your help


I'd suggest not using regex for this kind of thing. For HTML, a dedicated tool such as lxml.html.clean is usually a more robust solution.

Your example:

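# Note: in lxml 5.2 and later, this module is provided by the separate
# lxml_html_clean package, which has to be installed alongside lxml.
import lxml.html.clean as clean

# comments=True tells the Cleaner to strip HTML comments.
cleaner = clean.Cleaner(comments=True)
cleaner.clean_html("bzzzzzz <!-- blabla --> blibli <!-- bloblo --> blublu")
# 'bzzzzzz  blibli  blublu'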


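* is greedy, while *? is not: .* matches as much as possible, so it runs from the first <!-- to the last -->, whereas .*? stops at the first --> it finds.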

re.sub(re.compile('<!--.*?-->', re.DOTALL), '', text)

or, even shorter:

re.sub('(?s)<!--.*?-->', '', text)
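
To see the difference side by side, here is a small sketch that runs both patterns on the text from the question. The variable names greedy and lazy are only illustrative, and passing flags=re.DOTALL is assumed to be equivalent to the (?s) prefix used above.

```python
import re

text = "bzzzzzz <!-- blabla --> blibli <!-- bloblo --> blublu"

# Greedy: .* extends to the LAST "-->", so a single match swallows
# both comments and the "blibli" between them.
greedy = re.sub(r"<!--.*-->", "", text, flags=re.DOTALL)

# Non-greedy: .*? stops at the FIRST "-->", so each comment is
# matched and removed on its own.
lazy = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)

print(repr(greedy))  # 'bzzzzzz  blublu'
print(repr(lazy))    # 'bzzzzzz  blibli  blublu'
```

Note that the space on each side of a removed comment is left in place, so the result has double spaces where a comment used to be.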