HTMLParser and weird behavior
I need to extract some information from the following web page with Python 3: http://www.homefinance.nl/english/international-interest-rates/libor/libor-interest-rates-gbp.asp
The download with urllib.request seems fine, but, surprisingly, when I parse the HTML with my HTMLParser subclass, parsing seems to stop in the middle of the meta tags without any error or explanation.
This is my code:
```python
import urllib.request
from html.parser import HTMLParser

def downloadLIBOR():
    html_file = urllib.request.urlopen("http://www.homefinance.nl/english/international-interest-rates/libor/libor-interest-rates-gbp.asp")
    return html_file

class tmpHTMLParser(HTMLParser):
    def __init__(self):
        self._libor = "0.81625 %"
        self._stack = []
        self._properStack = []
        super().__init__()

    def handle_starttag(self, tag, attrs):
        print("starttag " + str(tag))
        print(self.get_starttag_text())
        self._stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        print("startendtag")

    def unknown_decl(self, data):
        print("unknown_decl")

    def handle_endtag(self, tag):
        print("endtag " + str(tag))
        self._stack.pop()

def _buildProperStack(webpage):
    """dev tool: return the tag stack leading to the LIBOR rate in the given webpage."""
    parser = tmpHTMLParser()
    parser.feed(webpage)

if __name__ == "__main__":
    webpage = downloadLIBOR()
    print("download done")
    html = str(webpage.read())
    _buildProperStack(html)
    exit(0)
```
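One thing worth checking in the code above: `webpage.read()` returns `bytes`, and calling `str()` on a `bytes` object does not decode it. It returns the object's repr, with a leading `b'`, a trailing quote, and line breaks and non-ASCII bytes spelled out as escape sequences such as `\r\n` or `\xe2\x82\xac`, so the parser sees text that differs from the actual page. A minimal sketch of decoding instead, assuming the server's `Content-Type` header declares a charset and falling back to UTF-8 otherwise (the fallback and the helper names `decode_body` and `fetch_html` are illustrative assumptions, not part of the original code):

```python
import urllib.request
from typing import Optional


def decode_body(raw: bytes, charset: Optional[str]) -> str:
    """Decode raw response bytes into text; the UTF-8 fallback is an assumption."""
    # errors="replace" keeps going even if the declared charset is wrong.
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_html(url: str) -> str:
    """Hypothetical helper: download a page and return it as decoded text."""
    with urllib.request.urlopen(url) as response:
        # get_content_charset() returns the charset parameter of Content-Type, or None.
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)


if __name__ == "__main__":
    raw = b"<p>\r\nhello</p>"
    print(str(raw))                # b'<p>\r\nhello</p>'  (the repr, not decoded text)
    print(decode_body(raw, None))  # the actual markup, with a real line break
```

With a helper like this, the `__main__` block would hand decoded text to `_buildProperStack` rather than `str(webpage.read())`.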
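BTW, I noticed that you forgot to call parser.close() after parser.feed(). The parser may be holding back input it has not processed yet (for example, an incomplete tag), and close() forces it to process all remaining buffered data as if it were followed by an end-of-file mark.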
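The buffering is easy to observe on a tiny input. In this minimal sketch (the `TagRecorder` class is hypothetical, written only for the demonstration), a start tag that is still incomplete at the end of a `feed()` call stays in the parser's buffer and is reported only once more input arrives; `close()` is how you tell the parser that no more input is coming.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Hypothetical parser that records start and end tags in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


parser = TagRecorder()
parser.feed('<p>hi</p><a href="x"')  # the <a ...> tag is not finished yet
print(parser.events)                 # [('start', 'p'), ('end', 'p')]
parser.feed(">")                     # completes the buffered tag
print(parser.events)                 # [('start', 'p'), ('end', 'p'), ('start', 'a')]
parser.close()                       # no more input: process whatever is still buffered
```

Without a final `close()`, anything still sitting in the buffer when the input ends is never processed.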
Not sure what you are actually trying to do, but parsing HTML with BeautifulSoup is much nicer, easier, and less error-prone.
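If you would rather stay with the standard library than switch to BeautifulSoup, here is a minimal sketch of what `_buildProperStack` appears to be aiming for: record the stack of open tags at the moment the target text is seen. It assumes the target is the string `0.81625 %` hard-coded in the question and that it appears within a single text node; the names `PathFinder`, `VOID_ELEMENTS`, and `find_paths` are hypothetical. Void elements such as `<meta>` never get an end tag in HTML, so pushing them would make the stack drift; this sketch skips them, and it ignores end tags that match no open tag instead of popping blindly.

```python
from html.parser import HTMLParser

# HTML void elements never have a closing tag, so they are not pushed.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}


class PathFinder(HTMLParser):
    """Hypothetical parser that records the open-tag stack around a target text."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.stack = []
        self.matches = []  # one copy of the stack per text node containing target

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self.target in data:
            self.matches.append(list(self.stack))


def find_paths(html, target):
    parser = PathFinder(target)
    parser.feed(html)
    parser.close()  # process anything still buffered
    return parser.matches


if __name__ == "__main__":
    page = (
        "<html><head><meta charset='utf-8'></head><body>"
        "<table><tr><td>0.81625 %</td></tr></table></body></html>"
    )
    print(find_paths(page, "0.81625 %"))  # [['html', 'body', 'table', 'tr', 'td']]
```

Keeping only the stack at match time, rather than printing every tag event, keeps the output short enough to read by eye on a full page.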