When to show 404 versus 301?
I've been noticing some strange results in how Google crawls our site. One issue is that a URL such as this:
http://example.com/randomstring
is showing up on Google with all of the data of
http://example.com/
So in my mind there are two solutions. One is to add a 301 redirect whenever someone visits a sub-url of the main one, and redirect them to the parent URL, or just give a 404, with a nice message saying, "Maybe you meant parent-url".
Thoughts? I'm pretty sure I know where I want to send them, but what is the proper web etiquette? 404 or 301?
The correct HTTP response is a 404 whenever a request is made to something that doesn't exist.
301 is for something that is moved, which is not the case here.
However, 100% correct HTTP convention is rarely followed today. Depending on the context, it could be useful to redirect the user to the home page with a notification that the page wasn't found and that they were redirected. In that case, though, you should use a 303 See Other code.
You should never redirect without letting the user know that a redirect happened, though. Otherwise the user may think that something is wrong.
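To make the 303 idea concrete, here is a minimal sketch in Python. It is not tied to any framework: `respond`, `KNOWN_PATHS` and the `notfound` query parameter are all names I made up for illustration. The point is only that the redirect carries the missing path along, so the home page can tell the visitor what happened.

```python
from http import HTTPStatus
from urllib.parse import urlencode

# Hypothetical set of pages that really exist on the site.
KNOWN_PATHS = {"/", "/about"}


def respond(path: str) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a request path."""
    if path in KNOWN_PATHS:
        return HTTPStatus.OK, {}
    # Unknown page: send the visitor home with 303 See Other, and pass the
    # missing path in the query string (an assumed convention) so the home
    # page can show a "we could not find ..." notice.
    query = urlencode({"notfound": path})
    return HTTPStatus.SEE_OTHER, {"Location": f"/?{query}"}
```

The home page would then read the `notfound` parameter and display the notice, which covers the "let the user know" part.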
The already posted answers cover your question nicely, but I thought there may be some value in going to the source: RFC 2616
10.3.2 301 Moved Permanently
The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible. This response is cacheable unless indicated otherwise.
The new permanent URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s).
If the 301 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.
Note: When automatically redirecting a POST request after receiving a 301 status code, some existing HTTP/1.0 user agents will erroneously change it into a GET request.
10.4.5 404 Not Found
The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.
Of course, with these things it tends to be that the common usage takes precedence over the actual text of the RFC. If the entire world is doing it one way, pointing at a document doesn't help much.
I'd say a 404 is the right thing to do, as there never was a meaningful resource at the location, so nothing has "moved permanently" (which is the meaning of 301) and the client needs to know their URL was faulty and has not just changed in the meantime.
But I don't quite understand yet what the issue is. Is Google hitting your site with random URL requests? That would be odd. Or is it that your site is showing the same results for domain.com/randomstring as for domain.com/index.html? That you should change, methinks with a 404.
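As a rough sketch of what that change could look like (Python, no particular framework; `PAGES` and `handle` are hypothetical names, not anything from your site): unknown paths get a real 404 status, and the body still carries the friendly "maybe you meant" hint.

```python
import html
from http import HTTPStatus

# Hypothetical table of real pages and their content.
PAGES = {"/": "<h1>Home</h1>"}


def handle(path: str) -> tuple[int, str]:
    """Return (status, body) for a request path."""
    if path in PAGES:
        return HTTPStatus.OK, PAGES[path]
    # Escape the path before echoing it back, since it is user-controlled.
    safe_path = html.escape(path)
    body = (
        "<h1>Page not found</h1>"
        f"<p>Nothing lives at {safe_path}. "
        'Maybe you meant <a href="/">the home page</a>?</p>'
    )
    # The status is what crawlers act on; the body is for humans.
    return HTTPStatus.NOT_FOUND, body
```

With this, /randomstring no longer answers 200 with the home page's content, so a crawler has no reason to index it as a copy of the home page.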
If you know what URL they should go to, that's exactly what 301 is for.
So are you saying that your site is doing redirects without your control?
You want a 301 (permanent redirect) when the page originally existed but has moved somewhere else. It's a "Change of Address Card". Huge lifesaver when restructuring a site. If the page is just some wacky random URL, then passing a 404 tells spiders (and humans too, but people do this less) that this page never existed, so don't keep coming back and wasting my web server's time. Some people disagree with this because they never want their users to see a 404 page. I think these codes were developed for good reason and are used pretty well by search engines.
Passing either of these status codes does not prevent you from serving "friendly pages" (although a 301 will typically just redirect you if the browser allows).
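Here is one way the "change of address" rule could be sketched in Python. The `MOVED` and `EXISTING` tables and the `resolve` function are hypothetical; in practice the mapping would come from whatever you know about your old URL structure.

```python
from http import HTTPStatus

# Hypothetical data: pages that exist now, and old URLs that used to exist.
EXISTING = {"/", "/pricing"}
MOVED = {"/old-pricing": "/pricing"}


def resolve(path: str) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a request path."""
    if path in EXISTING:
        return HTTPStatus.OK, {}
    if path in MOVED:
        # The page really lived here once: hand out its new address.
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": MOVED[path]}
    # Never existed: say so, rather than pretending it moved.
    return HTTPStatus.NOT_FOUND, {}
```

Only URLs listed in `MOVED` get a 301; a random string falls through to the 404 branch.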
The thing to remember is that Google doesn't like duplicate content, so you want to make sure that your site does not appear to be serving the same content under different URLs.
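If you want to check whether your site has this problem, one rough approach is to request a path that cannot possibly exist and look at the status that comes back. A small Python sketch follows; `serves_soft_404` is a made-up helper, and `fetch_status` stands for whatever function you use to request a path on your site and return its status code.

```python
import uuid
from typing import Callable


def serves_soft_404(fetch_status: Callable[[str], int]) -> bool:
    """Return True if a nonsense path is answered with 200 OK."""
    # A random hex string is practically guaranteed not to be a real page.
    probe = f"/{uuid.uuid4().hex}"
    return fetch_status(probe) == 200
```

A `True` result means random URLs are being served as if they were real pages, which is the situation described in the question.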