Python: How to delete an HTML header from a text string? [duplicate]
Possible Duplicate:
using python, Remove HTML tags/formatting from a string
I read in an HTML file:
import re
with open("Tree.html", "r") as fi:
    text = fi.read()
I want to delete the HTML header from the text:
text = re.sub("<head>.*?</head>", "", text)
Why does this not work?
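Your pattern isn't matching across newlines. By default, "." matches any character except a newline, and the <head> element almost certainly spans several lines, so the regex never finds a match and the text comes back unchanged. Pass the re.DOTALL flag so that "." matches newlines too: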
text = re.sub("<head>.*?</head>", "", text, flags=re.DOTALL)
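To see the difference, here is a minimal sketch that runs the same pattern with and without the flag. The sample string is made up for illustration, since the contents of Tree.html are not shown in the question.

```python
import re

# Hypothetical sample document (an assumption; the real Tree.html is not shown).
html = "<html>\n<head>\n<title>Tree</title>\n</head>\n<body>Hello</body>\n</html>"

pattern = "<head>.*?</head>"

# Without DOTALL, "." cannot cross a newline, so nothing matches
# and re.sub returns the input unchanged.
unchanged = re.sub(pattern, "", html)

# With DOTALL, "." also matches "\n", so the whole header is removed.
stripped = re.sub(pattern, "", html, flags=re.DOTALL)

print(unchanged == html)
print(stripped)
```

Note that this pattern only matches a lowercase <head> tag with no attributes. If your files might contain something like <HEAD> or <head lang="en">, the pattern would need adjusting (for example, adding re.IGNORECASE for the former).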