Why doesn't my Django site return a 404 when checked with this URL parser?
Here's a simple Python function that checks if a given URL is valid:
from httplib import HTTP
from urlparse import urlparse

def checkURL(url):
    p = urlparse(url)
    h = HTTP(p[1])
    h.putrequest('HEAD', p[2])
    h.endheaders()
    if h.getreply()[0] == 200:
        return 1
    else:
        return 0
This works for most sites, but with my Django-based site I get a 200 status code even when I enter a URL that is clearly wrong. If I view the same page in a browser, I get a 404. For example, the following page gives a 404 in a browser: http://wefoundland.com/GooseBumper
But it gives a 200 when checked with this script. Why?
Edit: While mopoke's answer solved the issue from the Django side of things, there was also a bug in the script above. Instead of parsing the URL and then using
h.putrequest('HEAD', p[2])
I actually needed to use the URL itself in the request, like so:
h.putrequest('HEAD', url)
That solved the issue.
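For anyone on Python 3, where httplib and urlparse no longer exist under those names, here is a rough sketch of the same check using http.client and urllib.parse. The name check_url, the timeout parameter, and the fallback to "/" for an empty path are my own assumptions, not part of the original script:

```python
from http.client import HTTPConnection, HTTPException, HTTPSConnection
from urllib.parse import urlsplit


def check_url(url, timeout=10):
    """Return True if a HEAD request to `url` is answered with status 200."""
    parts = urlsplit(url)
    # Pick the connection class from the scheme (assumption: http or https only).
    conn_cls = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        # Use "/" when the URL has no path, and keep any query string.
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        # Only the status line counts here; the body is never read.
        return conn.getresponse().status == 200
    except (OSError, HTTPException):
        # Connection errors or malformed replies are treated as "not valid".
        return False
    finally:
        conn.close()
```

Calling check_url("http://wefoundland.com/GooseBumper") would then return True or False depending only on the status code the server sends, regardless of what the page body says.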
Although the content says 404, the site is returning 200 OK in the headers:
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 30 Dec 2009 01:38:24 GMT
Content-Type: text/html; charset=utf-8
Connection: close
Make sure your response is using HttpResponseNotFound, e.g.:
return HttpResponseNotFound('<h1>Page not found</h1>')
Your page isn't actually returning a 404 status code:
alex@alex-laptop:~$ curl -I http://wefoundland.com/GooseBumper
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 30 Dec 2009 01:37:41 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
To get a 404 to be returned by your Django view, use HttpResponseNotFound instead of HttpResponse, or pass in 'status=404' to the HttpResponse constructor.