Python regex sub
I want to delete all comments. This is my regular expression:
re.sub(re.compile('<!--.*-->', re.DOTALL),'', text)
But if my text is:
bzzzzzz <!-- blabla --> blibli <!-- bloblo --> blublu
the result is:
bzzzzzz blublu
instead of:
bzzzzzz blibli blublu
Thanks for your help
I'd suggest not using regex for this kind of task. A dedicated HTML tool, such as lxml.html.clean, is usually more robust.
Your example:
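# Note: since lxml 5.2, this import requires the separate lxml_html_clean package to be installed.
import lxml.html.clean as clean
# comments=True tells the cleaner to strip HTML comments.
cleaner = clean.Cleaner(comments=True)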
cleaner.clean_html("bzzzzzz <!-- blabla --> blibli <!-- bloblo --> blublu")
#'bzzzzzz blibli blublu'
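* is greedy while *? is not: .* matches as much as possible, so it runs from the first <!-- to the last -->, whereas .*? stops at the first --> it finds.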
re.sub(re.compile('<!--.*?-->', re.DOTALL), '', text)
or, even shorter:
re.sub('(?s)<!--.*?-->', '', text)
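To see the difference on the text from the question, here is a minimal sketch that runs both patterns side by side. The helper name strip_comments is only an illustration, not something from the re module, and the sketch assumes comments are never nested.

```python
import re

# Compile once; DOTALL lets "." match newlines, so multi-line comments are covered.
COMMENT_RE = re.compile(r'<!--.*?-->', re.DOTALL)


def strip_comments(text):
    """Remove every <!-- ... --> comment (hypothetical helper name)."""
    return COMMENT_RE.sub('', text)


text = "bzzzzzz <!-- blabla --> blibli <!-- bloblo --> blublu"

# Greedy: ".*" runs from the first "<!--" to the last "-->".
greedy = re.sub(r'<!--.*-->', '', text, flags=re.DOTALL)

# Non-greedy: ".*?" stops at the first "-->" after each "<!--".
lazy = strip_comments(text)

print(repr(greedy))  # 'bzzzzzz  blublu'
print(repr(lazy))    # 'bzzzzzz  blibli  blublu'
```

Note that the spaces on either side of each comment are kept, so the result contains double spaces. If that matters, they can be collapsed afterwards, for example with re.sub(' {2,}', ' ', result).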