Python: Check the value of a variable passed as a parameter in another method?
Somewhat related to my earlier question. I'm making a simple HTML parser to play around with in Python 2.7. I would like to have multiple parse types, i.e. it can parse for links, script tags, images, etc. I'm using the HTMLParser module, so my initial thought was to make a separate class for each thing I want to parse, but that seemed rather silly. Is there a way to do this without creating multiple classes? I am more familiar with C#, so I figured I'd just pass a parameter to the __init__ method to specify what exactly to parse for, just like I would in .NET. However, I don't seem to be doing it correctly: it doesn't work, and it just doesn't 'look' right. How would I modify this so that I can have just the one class, with the parameters passed in indicating the type of HTML tags to parse? Here's the current working code:
import urllib2
from HTMLParser import HTMLParser

class LinksParser(HTMLParser):
    def __init__(self, url):
        HTMLParser.__init__(self)
        # Fetch the page and parse it immediately
        req = urllib2.urlopen(url)
        self.feed(req.read())

    def handle_starttag(self, tag, attrs):
        # Only anchor tags are of interest here
        if tag != 'a': return
        for name, value in attrs:
            print("Found Link --> [{0}]{1}".format(name, value))
class TagParser(HTMLParser):
    def __init__(self, url, tag):
        HTMLParser.__init__(self)
        self.tag = tag
        req = urllib2.urlopen(url)
        self.feed(req.read())

    def handle_starttag(self, tag, attrs):
        if tag != self.tag: return
        for name, value in attrs:
            print("Found Tag({2}) --> [{0}]{1}".format(name, value, self.tag))
Something like this:
class MyParser(HTMLParser):
    def __init__(self, url, tags):
        HTMLParser.__init__(self)
        self.tags = tags
        req = urllib2.urlopen(url)
        self.feed(req.read())

    def handle_starttag(self, tag, attrs):
        if tag not in self.tags: return
        for name, value in attrs:
            print("Found Tag --> [{0}]{1}".format(name, value))
Instantiate the class with something like:
p = MyParser("http://www.google.com", [ 'a', 'img' ])
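For trying the idea without hitting the network, here is a minimal sketch of the same approach. It rests on a few assumptions that are not in the code above: the class name TagCollector, the found list, and the sample string are made up for illustration; the parser is fed a string instead of fetching a URL in __init__; and matches are collected rather than printed, so they can be inspected afterwards. The import fallback is only there so the sketch runs on both Python 2.7 and Python 3.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2.7


class TagCollector(HTMLParser):
    """Hypothetical helper: records attributes of the start tags named in `tags`."""

    def __init__(self, tags):
        HTMLParser.__init__(self)
        # A set makes the membership test exact; a bare string such as 'a'
        # would otherwise be matched by substring with `in`.
        self.tags = set(tags)
        self.found = []  # (tag, attribute name, attribute value) tuples

    def handle_starttag(self, tag, attrs):
        if tag not in self.tags:
            return
        for name, value in attrs:
            self.found.append((tag, name, value))


# Made-up sample markup, used instead of a live URL.
sample = ('<p><a href="/home">Home</a>'
          '<img src="logo.png" alt="Logo">'
          '<script src="app.js"></script></p>')

collector = TagCollector(['a', 'img'])
collector.feed(sample)
collector.close()

for tag, name, value in collector.found:
    print("Found Tag({0}) --> [{1}]{2}".format(tag, name, value))
```

Keeping the download out of __init__ is a design choice, not a requirement: the same instance could still be given req.read() from urllib2.urlopen(url), as in the classes above.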