Cleaning up 404 errors with a 301 mod_rewrite redirect, or a prettier solution
I manage multiple web sites for my clients, and each client has a directory labelled articles. I just inherited this system, and until I can fix the issue I found, I am looking for a stopgap solution, one that will eliminate the 404 errors after a file has been deleted.
All these directories have static pages for the articles, as well as an index page that lists all the articles.
Based on the logs, this has generated many errors over the years. I can just imagine it is causing havoc with the search engines as well. With the little knowledge of mod_rewrite that I have, I managed to put this together, and I plan to put it into place within the Apache configuration. Before I do, is this a good solution, or is there something else I should do?
<Directory "/home/www/public_html/clients">
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{HTTP_HOST} ^(.+)$ [NC]
RewriteRule ^(.*)$ http://%1/articles/index.html [R=301,L]
</Directory>
RewriteCond %{REQUEST_FILENAME} !-f
checks whether the requested file exists; if it does, the file is served and the rest of the rewrite is skipped.
RewriteCond %{REQUEST_FILENAME} !-d
checks whether the requested directory exists; if it does, the directory is served and the rest of the rewrite is skipped.
RewriteCond %{HTTP_HOST} ^(.+)$ [NC]
grabs the domain and passes it to the final RewriteRule, which issues the 301 redirect.
I have it working locally and would like a few opinions before making it live.
No need for the
RewriteCond %{HTTP_HOST} ^(.+)$ [NC]
line: just replace %1 in your RewriteRule with %{HTTP_HOST}.
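Applied to the configuration from the question, that suggestion might look like the sketch below. It only drops the HTTP_HOST condition and changes the substitution; the directory path and target page are taken from the question as-is.

```apache
<Directory "/home/www/public_html/clients">
    Options +FollowSymlinks
    RewriteEngine On
    # Only act when the request matches neither an existing file nor a directory
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    # %{HTTP_HOST} is expanded directly in the substitution, so no capturing condition is needed
    RewriteRule ^(.*)$ http://%{HTTP_HOST}/articles/index.html [R=301,L]
</Directory>
```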
From a rewriting point of view, the solution is OK.
From an SEO point of view, I am not so sure: it is better to return 404 or 410 instead (since the article is no longer there). I think it is better to display a custom page to the visitor while sending 404 or 410 to the browser:
- The browser/search engine will see the error code.
- The user will see an explanation instead, and will be offered pages related to the requested URL or a short index.
From a user point of view, it is not good: I would like to know straight away that the URL/article is no longer available (see the SEO point above) and then browse around your site if I find it useful, rather than seeing some (at first) irrelevant index page, telling myself "I do not remember clicking this link", and going back to the search engine/referrer to click again. If I see the same index page again, I will (most likely) conclude that something is wrong with that page and just turn away (unless I am really interested in that page's content).
UPDATE:
I would do it this way:
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} ^/articles/ [NC]
RewriteRule .* /articles/notfound.php?url=%{REQUEST_URI} [L]
Rewrite non-existing URLs to notfound.php (or whatever name you give it) ONLY if the requested URL has anything to do with articles (the URL starts with /articles/).
On that page (it has to be dynamic, PHP or similar, not static HTML), respond with a 410 Gone status code (this is for the browser/spider) and display a page explaining that this URL is no longer here, together with some useful links (a mini-index, recent articles, etc.); this is for the user.
Seems to be valid, at least it should work :-)
About the 301 redirect: you make a permanent redirect from an article page to a page that, if I understand correctly, lists the available articles. A better HTTP code exists for vanished resources, 410 Gone:
The requested resource is no longer available at the server and no forwarding address is known. This condition is expected to be considered permanent. Clients with link editing capabilities SHOULD delete references to the Request-URI after user approval. If the server does not know, or has no facility to determine, whether or not the condition is permanent, the status code 404 (Not Found) SHOULD be used instead. This response is cacheable unless indicated otherwise.
The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed. Such an event is common for limited-time, promotional services and for resources belonging to individuals no longer working at the server's site. It is not necessary to mark all permanently unavailable resources as "gone" or to keep the mark for any length of time -- that is left to the discretion of the server owner.
So it would be a 'better 404' that crawlers may prefer; at least it's the HTTP way. If you really want redirect behaviour (maybe better for humans), then the 301 is the right choice, but crawlers may detect a lot of old resource links pointing to the same new content. On the other hand, it's something often done, so I'm quite sure you won't have any problems.
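If the 410 route is chosen, it can probably be done from the Apache configuration alone. The following is a minimal sketch, assuming mod_rewrite's G (gone) flag together with an ErrorDocument directive. The page /articles/gone.html is a hypothetical name for a static explanation page, not something from the question, and the /articles/ prefix check assumes the articles directory sits at the root of each site.

```apache
<Directory "/home/www/public_html/clients">
    Options +FollowSymlinks
    RewriteEngine On
    # Only act when the request matches neither an existing file nor a directory
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    # Limit the rule to URLs under /articles/ so other missing URLs still return a normal 404
    RewriteCond %{REQUEST_URI} ^/articles/ [NC]
    # "-" means no substitution; G answers with 410 Gone and stops rule processing
    RewriteRule ^ - [G]
    # Hypothetical page shown as the body of the 410 response; it must exist on disk
    ErrorDocument 410 /articles/gone.html
</Directory>
```

Because the hypothetical gone.html exists as a real file, the two file and directory conditions do not match it, so serving the error page should not trigger the rule again.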
301 vs 410 is actually hard to decide. There's also the 303 See Other redirect, which is a... redirect, but sadly the main goal of 303 is more a redirect after POST than a 410-with-redirect. With a 303, the old URL is not removed from search indexes.
Last problem: every random URL (a legitimate 404) will get a response (301+200 or 410), which can lead to false positives for some fuzzing attackers in the redirect case... but false positives are maybe a good thing for these scripts; they'll lose time on them.