Why does http://google.com/a/bogus/url not redirect to a 404 URL, and why is this preferred?

  1. Go to http://google.com/a/bogus/url
  2. You will see a 404 page (the HTTP status code is 404)
  3. But the URL in your browser remains http://google.com/a/bogus/url

Why?

Why is this behavior preferred over redirecting to a single 404 page URL such as http://google.com/pagenotfound or something like that?

Background

I first noticed this behavior on a Drupal site that we implemented. If you visit a nonexistent page, e.g. http://mysite.com/a/bogus/url/, it displays a "page not found" message. If you fetch the page with telnet, you see that the correct HTTP 404 status code is returned, but the URL is not rewritten. I was shocked by this, so I started writing a bug report at Drupal.org. While writing the bug report, I wanted to use Google as an example of what (I thought) should happen. To my dismay, Google does the same thing!

Why do I care, you might ask? Well, let's say that I have Google Analytics installed on my website. If the 404 page URL were rewritten as I was expecting, then I would be able to run a report and see how many times my visitors have seen my one and only 404 page. I could then see where they are coming from and hopefully find the offending link.

As it stands right now, the Google Analytics script will execute from http://mysite.com/a/bogus/url/ and will happily report that someone just saw this page. How then am I supposed to know when someone has seen a 404 page? I'm not really looking for an answer to my particular programming question, but rather an insight into why redirection is not a common practice.

Any thoughts would be greatly appreciated.


Returning a redirect to a page with an error message is incorrect. You're telling the client that the page does exist, at a different address, and then telling them it doesn't after all. Or, even worse, and very commonly, your error page is returned as a 200 OK response, so you're claiming a page exists when it doesn't.

This slows down browsers, forcing them to make an extra completely unnecessary request, and can confuse automated tools. It also means that if you subsequently put a file at the address a/bogus/url, the user won't be able to hit reload to get it, as they'll have ended up at an address that only ever shows an error. This can also play badly with caches as the redirect response may be cacheable.
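As a rough illustration of the difference, here is a small Python simulation (no real HTTP involved). The PAGES table, the /pagenotfound address, and the function names are all made up for this sketch; it only models how a browser that follows redirects ends up with a different final URL and an extra request.

```python
# Hypothetical site content; every other path is treated as missing.
PAGES = {"/": "home"}
ERROR_URL = "/pagenotfound"  # made-up single error page address


def respond_in_place(path):
    """Answer 404 at the requested address; no Location header is sent."""
    if path in PAGES:
        return 200, {}, PAGES[path]
    return 404, {}, "Sorry, page not found: " + path


def respond_with_redirect(path):
    """Send missing pages to one error URL (the pattern described as incorrect)."""
    if path in PAGES:
        return 200, {}, PAGES[path]
    if path == ERROR_URL:
        # Commonly served as 200 OK, which claims the page exists.
        return 200, {}, "Sorry, page not found"
    return 302, {"Location": ERROR_URL}, ""


def browse(respond, path, max_hops=5):
    """Follow redirects like a browser; return (final_url, statuses_seen)."""
    statuses = []
    for _ in range(max_hops):
        status, headers, _body = respond(path)
        statuses.append(status)
        if status not in (301, 302, 303, 307, 308):
            break
        path = headers["Location"]
    return path, statuses


print(browse(respond_in_place, "/a/bogus/url"))       # ('/a/bogus/url', [404])
print(browse(respond_with_redirect, "/a/bogus/url"))  # ('/pagenotfound', [302, 200])
```

In the redirect variant the client makes two requests, never sees a 404 status, and is left at an address where hitting reload can only ever show the error page.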

That said, redirecting 404s is not nearly as bad as the equally common mistake of redirecting all server-side errors (500) to a separate error page address.

Using redirects instead of just returning a different page in response is typically an artefact of server-side scripting languages that tie the incoming URL to the view, like .asp(*). Google aren't using a framework that requires them to specify what view will be returned in the URL, so they don't need to issue a redirect. They can do it the more efficient and correct way.

(*: though even in ASP[.NET] you can Server.Transfer to push to a different page without issuing a redirect. It's not such a common practice amongst ASP coders, unfortunately; there is a cultural preference—I would call it a disease—for redirects, that often ends up causing horrible redirect loops and debugging woe.)

"How then am I supposed to know when someone has seen a 404 page?"

Any decent web log analyser will allow you to search based on HTTP response. In fact you will get better, more accurate results that way, and you would be given the exact incorrect link in every case, which wouldn't happen with the redirect.
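If you don't have a log analyser handy, the same idea can be sketched in a few lines of Python. This assumes the server writes the common "combined" access log format (your server may be configured differently), and the sample entries below are invented purely for illustration.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)"'
)


def count_404s(lines):
    """Count (requested path, referer) pairs for responses with status 404."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") == "404":
            hits[(match.group("path"), match.group("referer"))] += 1
    return hits


# Made-up sample entries, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2010:13:55:36 +0000] "GET /a/bogus/url/ HTTP/1.1" '
    '404 512 "http://example.org/links.html" "Mozilla/5.0"',
    '203.0.113.9 - - [10/Oct/2010:13:56:01 +0000] "GET / HTTP/1.1" '
    '200 2326 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2010:13:57:12 +0000] "GET /a/bogus/url/ HTTP/1.1" '
    '404 512 "http://example.org/links.html" "Mozilla/5.0"',
]

for (path, referer), count in count_404s(SAMPLE_LOG).most_common():
    print(count, path, "linked from", referer)
```

Because the 404 is logged against the address that was actually requested, each entry gives you both the broken URL and the referring page that linked to it.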

"...why redirection is not a common practice."

It is a common practice. It is a common wrong practice, to be avoided.


Why is this behavior preferred over redirecting to a single 404 page URL such as http://google.com/pagenotfound or something like that?

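Someone may have made a typo, and it is useful for them to be able to see whether that was the case. If the URL they requested stays in the address bar, they can spot the mistake and correct it.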

Also:

"If the 404 page URL was rewritten as I was expecting, then I should be able to run a report and see how many times my visitors have seen my one and only 404 page."

Aren't the 404s in the web server's logs?
