Why doesn't my Django site return a 404 when checked with this URL parser?

Here's a simple Python function that checks if a given URL is valid:

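# Python 2 only: httplib and urlparse were renamed in Python 3.
from httplib import HTTP
from urlparse import urlparse

def checkURL(url):
    p = urlparse(url)           # p[1] is the host, p[2] is the path
    h = HTTP(p[1])
    h.putrequest('HEAD', p[2])  # request only the headers for the path
    h.endheaders()
    # getreply() returns (status, reason, headers)
    if h.getreply()[0] == 200:
        return 1
    else:
        return 0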

This works for most sites, but with my Django-based site I get a 200 status code even when I enter a URL that is clearly wrong. If I view the same page in a browser, I get a 404. For example, the following page gives a 404 in a browser: http://wefoundland.com/GooseBumper

But it gives a 200 when checked with this script. Why?

Edit: While mopoke's answer solved the issue from the Django side of things, there was also a bug in the script above. Instead of parsing the URL and then using

    h.putrequest('HEAD', p[2])

I actually needed to pass the full URL in the request, like so:

    h.putrequest('HEAD', url)

That solved the issue.
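
For anyone on Python 3, here is a rough sketch of the same check using http.client and urllib.parse (the renamed successors of httplib and urlparse). The name check_url and the timeout parameter are my own, not from the original script. Unlike the legacy HTTP class, which speaks HTTP/1.0 and most likely sends no Host header, http.client.HTTPConnection adds a Host header automatically, so sending only the path should be enough here. Treat it as an illustration under those assumptions rather than a drop-in replacement.

```python
import http.client
from urllib.parse import urlsplit


def check_url(url, timeout=10):
    """Return True if a HEAD request for `url` yields status 200."""
    parts = urlsplit(url)
    # Pick the connection class from the scheme (plain HTTP by default).
    if parts.scheme == "https":
        conn_cls = http.client.HTTPSConnection
    else:
        conn_cls = http.client.HTTPConnection
    # An empty path is not a valid request target, so fall back to "/".
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        # HEAD asks for the status line and headers only, no body.
        conn.request("HEAD", target)
        return conn.getresponse().status == 200
    finally:
        conn.close()
```

Calling check_url("http://wefoundland.com/GooseBumper") would then report whatever status the server actually sends in its headers, regardless of what the page body says.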


Although the content says 404, the site is returning 200 OK in the headers:

HTTP/1.1 200 OK
Server: nginx
Date: Wed, 30 Dec 2009 01:38:24 GMT
Content-Type: text/html; charset=utf-8
Connection: close

Make sure your response uses HttpResponseNotFound, e.g.:

    return HttpResponseNotFound('<h1>Page not found</h1>')


Your page isn't actually returning a 404 status code:

alex@alex-laptop:~$ curl -I http://wefoundland.com/GooseBumper
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 30 Dec 2009 01:37:41 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive


To make your Django view return a 404, use HttpResponseNotFound instead of HttpResponse, or pass status=404 to the HttpResponse constructor.
