What happens to the variable in this case when it is overwritten with self?

Posted 2023-02-13 06:52

I am downloading URLs in Python and need to detect 404s, so after some searching I came up with:

import urllib  # Python 2.7; in Python 3 this class lives in urllib.request
class MyUrlOpener(urllib.FancyURLopener):
    def retrieve(self, url, filename=None, reporthook=None, data=None):
        self.file_was_found = True
        return urllib.FancyURLopener.retrieve(self, url, filename, reporthook, data)

    def http_error_404(url, fp, errcode, errmsg, headers, data):
        url.file_was_found = False


def download_file(url, saveas):
    urlaccess = MyUrlOpener()
    localFile, headers = urlaccess.retrieve(url, saveas)
    return urlaccess.file_was_found

My question: if you look at the Python 2.7 source code for FancyURLopener, you see:

def http_error(self, url, fp, errcode, errmsg, headers, data=None):
    """Handle http errors.
    Derived class can override this, or provide specific handlers
    named http_error_DDD where DDD is the 3-digit error code."""
    # First check if there's a specific handler for this error
    name = 'http_error_%d' % errcode
    if hasattr(self, name):
        method = getattr(self, name)
        if data is None:
            result = method(url, fp, errcode, errmsg, headers)
        else:
            result = method(url, fp, errcode, errmsg, headers, data)
        if result: return result
    return self.http_error_default(url, fp, errcode, errmsg, headers)
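The name-based dispatch in the quoted http_error can be tried without any network access. The sketch below assumes Python 3, and MiniOpener, NotFoundAware and the example.com URLs are hypothetical: it only mirrors the quoted logic (build the name http_error_%d, look it up with getattr, fall back to a default handler) and is not urllib's actual class.

```python
class MiniOpener:
    """Hypothetical stand-in that mirrors the dispatch in FancyURLopener.http_error."""

    def http_error(self, url, fp, errcode, errmsg, headers, data=None):
        # Build the specific handler name, e.g. "http_error_404".
        name = 'http_error_%d' % errcode
        if hasattr(self, name):
            # getattr on the instance returns a bound method: self is already attached.
            method = getattr(self, name)
            if data is None:
                result = method(url, fp, errcode, errmsg, headers)
            else:
                result = method(url, fp, errcode, errmsg, headers, data)
            if result:
                return result
        return self.http_error_default(url, fp, errcode, errmsg, headers)

    def http_error_default(self, url, fp, errcode, errmsg, headers):
        # Simplified placeholder; urllib's real fallback behaves differently.
        return 'default handler for %d' % errcode


class NotFoundAware(MiniOpener):
    def http_error_404(self, url, fp, errcode, errmsg, headers, data=None):
        self.file_was_found = False
        return 'handled 404 for %s' % url


opener = NotFoundAware()
print(opener.http_error('http://example.com/missing', None, 404, 'Not Found', {}))
# handled 404 for http://example.com/missing
print(opener.http_error('http://example.com/', None, 500, 'Server Error', {}))
# default handler for 500
```

Because the question's own handler returns None, the quoted http_error would fall through to http_error_default after the 404 handler has run.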

This passes url as the first argument, not self. I thought the first parameter of a method always received a reference to the class instance (named self by convention), and my code confirms this. So what happens to the url value?

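UPDATE: It turns out that data is None, so http_error was taking the first branch and calling my handler with only five arguments. That is why my attempts to add the self parameter manually failed: the handler then required six arguments besides self. As soon as I gave data a default of None in my http_error_404 signature, all was well, because the missing argument fell back to the default.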

The corrected signature is:

def http_error_404(self, url, fp, errcode, errmsg, headers, data=None):
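A minimal sketch of the failure described in the update, assuming Python 3 (the classes Broken and Fixed are hypothetical): when data is None, http_error calls the handler with five arguments, so a handler that requires six parameters after self raises TypeError, while the version with data=None accepts both call forms.

```python
class Broken:
    # self plus six required parameters: a five-argument call cannot fill them.
    def http_error_404(self, url, fp, errcode, errmsg, headers, data):
        self.file_was_found = False


class Fixed:
    # data defaults to None, so the five-argument call form also works.
    def http_error_404(self, url, fp, errcode, errmsg, headers, data=None):
        self.file_was_found = False


# The same five arguments http_error passes when data is None.
args = ('http://example.com/missing', None, 404, 'Not Found', {})

try:
    Broken().http_error_404(*args)
except TypeError as exc:
    print('Broken raised TypeError:', exc)

fixed = Fixed()
fixed.http_error_404(*args)
print('Fixed:', fixed.file_was_found)  # Fixed: False
```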

In Python, when you call a method on a class instance, the interpreter passes the instance itself as the first argument, and all of the arguments you supplied are shifted down one place automatically.

In other words, Python turns the call:

urlaccess.retrieve(url, saveas)

into something that looks like this:

MyUrlOpener.retrieve(urlaccess, url, saveas)
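The rewrite can be checked directly. In this sketch (Python 3; Recorder is a hypothetical class), calling the method through the instance and calling it through the class with the instance passed explicitly produce the same result, and the bound method keeps a reference to its instance in __self__.

```python
class Recorder:
    def retrieve(self, url, filename=None):
        # Return whatever each parameter received so the binding is visible.
        return (self, url, filename)


r = Recorder()
via_instance = r.retrieve('http://example.com/a', 'a.html')
via_class = Recorder.retrieve(r, 'http://example.com/a', 'a.html')

print(via_instance == via_class)  # True
print(via_instance[0] is r)       # True: self is the instance itself
print(r.retrieve.__self__ is r)   # True: the bound method remembers the instance
```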

So you don't have to do it yourself. However, since

explicit is better than implicit

any instance method you define on a Python class must explicitly declare a first parameter to receive the instance, even though Python passes that argument without any action on the programmer's part.

The first parameter does not have to be called self; that name is only a convention.
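A small sketch of both points, assuming Python 3 (ThisOpener and its methods are hypothetical): a first parameter named this works exactly like self, while a method that declares no parameter for the instance raises TypeError when called through an instance, because Python still passes the instance.

```python
class ThisOpener:
    # The first parameter is called "this" rather than "self"; it still
    # receives the instance, because only its position matters.
    def mark_missing(this):
        this.file_was_found = False

    # No parameter is declared to receive the instance.
    def forgot_self():
        return 'unreachable through an instance'


o = ThisOpener()
o.mark_missing()
print(o.file_was_found)  # False

try:
    o.forgot_self()  # Python still passes o, but nothing can accept it
except TypeError as exc:
    print('TypeError:', exc)
```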

So, to actually answer your question (as mluebke did): you need to declare the self parameter.

def http_error_404(url, fp, errcode, errmsg, headers, data):
    url.file_was_found = False
    # Python is treating `url` as `self`
    # Therefore the URL is being saved in `fp`, `fp` in `errcode`, etc.
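To make the shift visible, the sketch below (Python 3; ShiftedOpener is a hypothetical class) defines the handler exactly as in the question and records what each parameter name receives when it is called with the five arguments that http_error passes when data is None.

```python
class ShiftedOpener:
    # Written like the question's handler: no explicit self parameter.
    def http_error_404(url, fp, errcode, errmsg, headers, data):
        # Record what each parameter name actually received.
        url.received = {'url': url, 'fp': fp, 'errcode': errcode,
                        'errmsg': errmsg, 'headers': headers, 'data': data}


o = ShiftedOpener()
# The five-argument call FancyURLopener.http_error makes when data is None.
o.http_error_404('http://example.com/missing', None, 404, 'Not Found', {})

print(o.received['url'] is o)  # True: "url" is really the instance
print(o.received['fp'])        # http://example.com/missing
print(o.received['errcode'])   # None  (the real fp)
print(o.received['errmsg'])    # 404   (the real errcode)
print(o.received['headers'])   # Not Found  (the real errmsg)
print(o.received['data'])      # {}    (the real headers)
```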

To fix this problem, add a first parameter to receive the instance.

def http_error_404(self, url, fp, errcode, errmsg, headers, data=None):
    # data needs a default: http_error omits it when data is None
    self.file_was_found = False
    # Now the parameters line up with the arguments

self is explicitly listed in the method definition, but implicitly passed when the method is called. Change your function to look like this and all your variables will start to line up again.

def http_error_404(self, url, fp, errcode, errmsg, headers, data=None):
    self.file_was_found = False
