Python: Check the value of a variable passed as a parameter in another method?

Somewhat related to my earlier question. I'm making a simple HTML parser to play around with in Python 2.7. I would like to have multiple parse types, i.e. it can parse for links, script tags, images, etc. I'm using the HTMLParser module, so my initial thought was to just make a separate class for each thing I want to parse. But that seemed rather silly. Is there a way to go about doing this without creating multiple classes?

I am more familiar with C#, so I figured I'd just pass a parameter to the __init__ method to specify what exactly to parse for, just like I would in .NET. However, I don't seem to be doing it correctly: it doesn't work, and it just doesn't 'look' right.

How would I modify this so I can just have the one class, with the parameters that are passed indicating the type of HTML tags to parse? Here's the current working code:

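from HTMLParser import HTMLParser  # Python 2.7 module name
import urllib2

class LinksParser(HTMLParser):
    def __init__(self, url):
        HTMLParser.__init__(self)
        # Fetch the page and parse it immediately
        req = urllib2.urlopen(url)
        self.feed(req.read())

    def handle_starttag(self, tag, attrs):
        # Only <a> tags are of interest
        if tag != 'a':
            return
        for name, value in attrs:
            print("Found Link --> [{0}]{1}".format(name, value))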


class TagParser(HTMLParser):

    def __init__(self, url, tag):
        HTMLParser.__init__(self)
        self.tag = tag
        req = urllib2.urlopen(url)
        self.feed(req.read())

    def handle_starttag(self, tag, attrs):
        if tag != self.tag: return
        for name, value in attrs:
            print("Found Tag({2}) --> [{0}]{1}".format(name, value, self.tag))


Something like this:

class MyParser(HTMLParser):
    def __init__(self, url, tags):
        HTMLParser.__init__(self)
        self.tags = tags
        req = urllib2.urlopen(url)
        self.feed(req.read())

    def handle_starttag(self, tag, attrs):
        if tag not in self.tags: return
        for name, value in attrs:
            print("Found Tag --> [{0}]{1}".format(name, value))

Instantiate the class with something like:

p = MyParser("http://www.google.com", [ 'a', 'img' ])
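If you would rather get the matches back than print them, the same idea can be sketched as below. This is only an illustration under a few assumptions: `CollectingParser`, `parse_html`, and the `sample` string are hypothetical names that are not in the code above, the HTML is passed in as a string instead of being fetched with urllib2 so the example runs without a network connection, and the import is wrapped so it works on both Python 2.7 and Python 3.

```python
try:
    from html.parser import HTMLParser  # Python 3 module name
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7, as in the question


class CollectingParser(HTMLParser):
    """Hypothetical variant of MyParser that stores matches instead of printing."""

    def __init__(self, tags):
        HTMLParser.__init__(self)
        self.tags = set(tags)   # tag names to look for, e.g. ['a', 'img']
        self.found = []         # list of (tag, attribute dict) pairs

    def handle_starttag(self, tag, attrs):
        # Skip any tag that was not requested
        if tag not in self.tags:
            return
        self.found.append((tag, dict(attrs)))


def parse_html(html, tags):
    """Feed an HTML string to the parser and return what it collected."""
    parser = CollectingParser(tags)
    parser.feed(html)
    parser.close()
    return parser.found


# Illustrative input; a real page would come from urllib2.urlopen(url).read()
sample = ('<p><a href="/home">Home</a>'
          '<img src="logo.png" alt="Logo">'
          '<script src="app.js"></script></p>')

results = parse_html(sample, ['a', 'img'])
for tag, attributes in results:
    print("Found Tag({0}) --> {1}".format(tag, attributes))
```

Keeping the download out of `__init__` means the one class can be reused on any HTML string, and the caller decides which tags to look for and what to do with the results.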